Abstract
Microbial competition for trace metals shapes their communities and interactions with humans and plants. Many bacteria scavenge trace metals with metallophores, small molecules that chelate environmental metal ions. Metallophore production may be predicted by genome mining, where genomes are scanned for homologs of known biosynthetic gene clusters (BGCs). However, accurately detecting non-ribosomal peptide (NRP) metallophore biosynthesis requires expert manual inspection, stymieing large-scale investigations. Here, we introduce automated identification of NRP metallophore BGCs through a comprehensive algorithm, implemented in antiSMASH, that detects chelator biosynthesis genes with 97% precision and 78% recall against manual curation. We showcase the utility of the detection algorithm by experimentally characterizing metallophores from several taxa. High-throughput NRP metallophore BGC detection enabled metallophore detection across 69,929 genomes spanning the bacterial kingdom. We predict that 25% of all bacterial non-ribosomal peptide synthetases encode metallophore production and that significant chemical diversity remains undiscovered. A reconstructed evolutionary history of NRP metallophores supports that some chelating groups may predate the Great Oxygenation Event. The inclusion of NRP metallophore detection in antiSMASH will aid non-expert researchers and continue to facilitate large-scale investigations into metallophore biology.
Introduction
Across environments, microbes compete for a scarce pool of trace metals. Many microbes scavenge metal ions with small-molecule chelators called metallophores, which diffuse through the environment and chelate metal ions with high affinity.1,2 A microbe possessing the right membrane transporters will be able to recognize and import a metallophore–metal complex, while other strains are unable to access the chelated metal ions. Thus, the metallophore excreted by one microbe can either support or inhibit growth of a neighboring strain, driving complex community dynamics in marine, freshwater, soil, and host environments.3 The most well studied metallophores are the Fe(III)-binding siderophores, which have found applications in biocontrol, bioremediation, and medicine.4 Two recent studies demonstrated that the disease suppression ability of a rhizosphere microbiome is strongly determined by whether or not the pathogen can use siderophores produced by the community; a microbiome can even encourage pathogen growth when a compatible siderophore is produced.5,6 Compared to siderophores, other metallophore classes are relatively understudied, but they likely play equally important biological roles, as exemplified by recent reports of both commensal and pathogenic bacteria relying on zincophores to effectively colonize human hosts.7,8
Hundreds of unique metallophore structures have been characterized, each with specific chemical properties (e.g., effective pH range, hydrophobicity, and metal selectivity) and biological effects on other microbes (based on membrane transporter compatibility). Experimentally characterizing metallophores can be time-consuming and costly, and thus researchers often use genome mining to predict metallophore production in silico.9 Taxonomy alone is not sufficient to predict what metallophores will be produced by a microbe, as production can vary significantly even within a single species.10 Instead, metallophores must be predicted from each genome based on the presence of biosynthetic gene clusters (BGCs) that encode their biosynthesis. The majority of known metallophores are non-ribosomal peptides (NRPs), a broad class of natural products that also includes many antibiotics, antitumor compounds, and toxins. Specialized chelating moieties bind directly to the metal ion (in the case of siderophores, Fe3+), while other amino acids in the peptide chain give the metallophore the required flexibility for chelation. Nearly all NRP metallophores contain one or more of the substructures shown in Fig. 1A: 2,3-dihydroxybenzoate (catechol, 2,3-DHB), hydroxamates, salicylate, β-hydroxyaspartate (β-OHAsp), β-hydroxyhistidine (β-OHHis), graminine, Dmaq (1,1-dimethyl-3-amino-1,2,3,4-tetrahydro-7,8-dihydroxy-quinoline), and the pyoverdine chromophore. Biosynthetic pathways are known for each of the chelating groups (Fig. 1B), and the presence of a chelator pathway may be used as a marker for metallophore production.

Chelating substructures found in bacterial NRP metallophores and their biosynthetic pathways.
(A) Representative NRP metallophore structures. Nearly all known NRP metallophores contain one or more of the eight labeled chelating groups. Most chelating groups provide bidentate metal chelation, as shown for ferric pyoverdine L48. (B) Chelator biosynthesis pathways that form the basis for the new antiSMASH detection algorithm, as described in the text. The same chelator colors are used in each figure.
Mining genomes for metallophore BGCs has facilitated the discovery of chemically and biologically diverse metallophore systems; however, automated detection tools are still severely lacking.9 The peptidic backbones of NRP metallophores are produced by non-ribosomal peptide synthetases (NRPSs), large multi-domain enzymes that activate and condense amino acids and other substrates in an assembly-line manner.11 In the past two decades, a variety of bioinformatic tools have been developed to identify NRPS BGCs in a genome. One of the most popular is the secondary metabolite prediction platform antiSMASH, which uses a library of profile hidden Markov models (pHMMs) to identify (combinations of) enzyme-coding genes that are indicative of certain classes of specialized metabolite biosynthetic pathways.12,13 For example, antiSMASH identifies an NRPS BGC region by the minimum requirement of a gene containing at least one condensation and one adenylation domain. NRP metallophore BGCs are technically detected by this rule as well; however, NRPSs also produce many other families of compounds, and additional manual annotation has still been required to identify NRP metallophore BGCs specifically. Accordingly, accurate prediction of BGCs encoding siderophores and other metallophores was limited to experts in natural product biosynthesis, and even experts cannot manually curate the thousands of BGCs produced by high-throughput metagenomic or comparative genomic analyses. To date, no global analysis of NRP metallophores has been performed, and thus the prevalence, combinatorics, and taxonomic distribution of different chelating groups are unknown.
Here, we describe the development and application of a high-accuracy antiSMASH-integrated method for the automated detection of NRP metallophore BGCs, using the presence of chelator biosynthesis genes within NRPS BGCs as key markers for predicting metallophore production. The new detection rules were applied to 15,562 representative bacterial genomes, allowing us to take the first census of NRP metallophore production across bacteria. At least 25% of all NRPS clusters in these representative genomes code for the production of metallophores, and significant biosynthetic diversity remains undiscovered. We then leveraged our computational analyses to guide characterization of siderophores from multiple bacterial taxa, finding structures that matched our genome-based predictions. By mapping NRP metallophore BGCs from 59,851 genomes to the Genome Taxonomy Database (GTDB) phylogeny, we identified myxobacterial and cyanobacterial metallophores as understudied and reconstructed a possible evolutionary history of the chelating groups.
Results
A chelator-based strategy for detection of NRP metallophore biosynthetic gene clusters
The specialized chelating moieties found in NRP metallophores are rarely found in other natural products, and thus we sought to automate metallophore BGC prediction by searching for genes encoding their biosynthesis. An extensive review of published NRP metallophore structures revealed that nearly all contain one or more of just eight chelator substructures (Fig. 1A). Protein domains responsible for their biosyntheses have been reported (Fig. 1B), and thus pHMMs could be constructed to allow detection of putative chelator biosynthesis genes. Generally, draft pHMMs were built from alignments of known and predicted NRP metallophore biosynthesis genes collected from literature, and cutoffs were manually determined (see Supplemental Discussion 1). The final multiple sequence alignments, pHMMs, and cutoffs are provided in the Supplemental Dataset.
A full description of each biosynthetic pathway detection strategy, including caveats and known limitations, is provided in Supplemental Discussion 1 and briefly summarized here. The profile HMMs implemented within antiSMASH are given in monospaced bold font. The biosynthetic cassette for 2,3-DHB is detected by an isochorismate synthase (EntC) and 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EntA).14 Two salicylate biosynthesis pathways are detected by the presence of either an isochorismate pyruvate-lyase (IPL)15 or a bifunctional salicylate synthase (SalSyn).16 We also included detection of two condensation domain subtypes specific to catecholic and phenolic metallophores: VibH-like enzymes (VibH)17,18 and tandem heterocyclization domains (Cy_tandem).19 Peptidic hydroxamate pathways are detected by an ornithine (Orn) or Lys N-monooxygenase (Orn_monoox or Lys_monoox, respectively).20 We could not accurately detect the vicibactin hydroxylase VbsO using a pHMM,21 and so the characteristic acyl-hydroxyornithine epimerase VbsL is used to detect vicibactin biosynthesis.21 We previously identified three families of siderophore-specific Fe(II)/α-ketoglutarate-dependent enzymes responsible for β-OHAsp (TBH_Asp and IBH_Asp) or β-OHHis (IBH_His).22 Based on the recent discovery of β-OHAsp-containing cyanochelins from cyanobacteria,23 we have now identified two new clades that are putatively metallophore-specific and tentatively named CyanoBH_Asp1 and CyanoBH_Asp2. The diazeniumdiolate-containing graminine may be detected by the presence of the cryptic necessary enzymes GrbD and GrbE.24,25 The quinoline chelator Dmaq is detected by FbnL and FbnM, which initiate Dmaq biosynthesis.26 The chromophore of pyoverdines is detected by the presence of a tyrosinase PvdP and/or an oxidoreductase PvdO.27,28
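The marker logic summarized above can be sketched as a small lookup, shown here only as an illustration and not as the antiSMASH rule syntax. The profile names are taken from the text; the dictionary layout, the function names, the grouping of VbsL under hydroxamates, and the reading of "GrbD and GrbE" and "FbnL and FbnM" as requiring both hits are assumptions made for this sketch. Score cutoffs and colocalization with NRPS genes, which the real rules depend on, are omitted.

```python
# Hypothetical simplification of the chelator marker logic described in the text.
# Each chelator maps to alternative marker sets; a pathway is called when every
# profile in at least one alternative has a hit in the BGC region.
CHELATOR_MARKERS = {
    "2,3-DHB": [{"EntC", "EntA"}],
    "salicylate": [{"IPL"}, {"SalSyn"}],
    "hydroxamate": [{"Orn_monoox"}, {"Lys_monoox"}, {"VbsL"}],  # VbsL grouping is an assumption
    "beta-OHAsp": [{"TBH_Asp"}, {"IBH_Asp"}, {"CyanoBH_Asp1"}, {"CyanoBH_Asp2"}],
    "beta-OHHis": [{"IBH_His"}],
    "graminine": [{"GrbD", "GrbE"}],   # assumed: both hits required
    "Dmaq": [{"FbnL", "FbnM"}],        # assumed: both hits required
    "pyoverdine chromophore": [{"PvdP"}, {"PvdO"}],
}

# Metallophore-specific NRPS condensation domain subtypes named in the text.
NRPS_DOMAIN_MARKERS = {"VibH", "Cy_tandem"}


def detect_chelators(hits):
    """Return the sorted chelator pathways supported by a set of pHMM hit names."""
    hits = set(hits)
    return sorted(
        name
        for name, alternatives in CHELATOR_MARKERS.items()
        if any(alt <= hits for alt in alternatives)
    )


def is_putative_metallophore(hits):
    """True if any chelator pathway or metallophore-specific NRPS domain is hit."""
    hits = set(hits)
    return bool(detect_chelators(hits)) or bool(NRPS_DOMAIN_MARKERS & hits)


if __name__ == "__main__":
    print(detect_chelators({"EntC", "EntA", "IPL"}))
    print(is_putative_metallophore({"EntC"}))
```

In this sketch a lone EntC hit does not call the 2,3-DHB pathway, mirroring the requirement in the text that both EntC and EntA be present.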
Several known chelating group pathways are not currently detected. Our detection strategy is limited to clades or combinations of biosynthetic enzymes that are distinct to NRP metallophore pathways. Several chelators are synthesized by the core NRPS and/or polyketide synthase (PKS) machinery and could not be detected without also retrieving many false positives, including NRPS-derived thiazol(id)ine and oxazol(id)ine heterocycles (see pyochelin, Fig. 1A) and PKS-derived 5-alkylsalicylate (e.g. in micacocidin29). We also did not include detection of a pathway currently only reported in fabrubactins that produces two α-hydroxycarboxylate chelating moieties (Fig. 1A, bolded atoms).26 Finally, we have not yet designed detection rules for the recently discovered chelating groups 5-aminosalicylate of pseudonochelin30 or 2-naphthoate of ecteinamines;31 however, we expect that their biosyntheses will be amenable to detection by the method used herein (Fig. S1). The NRP metallophore detection algorithm is publicly available in the antiSMASH web server and command line tool (https://antismash.secondarymetabolites.org, version 7 and upwards).
Validating antiSMASH NRP metallophore detection against manually curated BGCs
In order to assess the performance of our NRP metallophore BGC detection strategy, we manually predicted metallophore production among a large set of BGCs. A total of 758 NRPS BGC regions from 330 genera were annotated with default antiSMASH v6.1 and inspected for known markers of metallophore production, including genes encoding transporters, iron reductases, chelator biosynthesis, and known metallophore NRPS domain motifs. We thus manually classified 176 BGC regions (23%) as metallophore BGCs (Supplemental Table 2). The new antiSMASH detection rules were applied to the same BGC regions, resulting in 145 putative metallophore BGC regions (F1 = 0.86; Table 1 and Supplemental Table 2). Nine metallophore BGC regions were only detected by antiSMASH. Upon reinvestigation, four were determined to likely represent genuine metallophore BGC regions missed during manual analysis, leaving only five putative false positives in which seemingly unrelated genes matched the pHMMs (97% precision). Conversely, a total of 40 metallophore BGC regions could only be detected manually (78% recall). In the majority of false negatives, NRP metallophore BGCs were missed because chelator biosynthesis genes, on which the detection strategy is based, were not present in the cluster. In 21 cases, genes encoding catechol, salicylate, or hydroxamate biosynthesis were located elsewhere in the genome. In ten cases, chelator biosynthesis pathways were not found anywhere in the genome; these clusters may be non-functional fragments, rely on exogenous precursors (as seen in equibactin biosynthesis32), or have evolved to use novel chelator biosyntheses. Two of the false negatives encoded the 5-alkylsalicylate PKS that is currently undetectable, as described above. Finally, seven manually assigned NRP metallophore BGC regions had no genes corresponding to known chelator pathways (Supplemental Table 2); if correctly annotated, they may represent novel structural classes. 
In one particularly promising case, a salicylate pathway appears to have been replaced with a partial menaquinone pathway to produce a putative 1,4-dihydroxy-2-naphthoate chelating group (Supplemental Discussion 2).
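The validation statistics reported above follow directly from the stated counts: 145 regions were predicted by the antiSMASH rules, five of which were putative false positives, and 40 metallophore regions were missed. A short worked example, using a helper function named here only for illustration, reproduces the reported values.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the text: 145 predicted regions, 5 false positives, 40 false negatives.
p, r, f1 = precision_recall_f1(tp=145 - 5, fp=5, fn=40)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# prints: precision=0.97 recall=0.78 F1=0.86
```

The 140 true positives plus 40 false negatives give the 180 metallophore regions in the test set, that is, the 176 regions classified manually plus the four recovered on reinvestigation.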

Summary of NRP metallophore BGC detection, comparing the chelator-based rules newly implemented in antiSMASH, the transporter-based method of Crits-Christoph et al.,33 and a combined either/or ensemble.
a Detection methods were each tested on a set of 758 manually annotated NRPS BGC regions (180 true positives). Full results are given in Supplemental Table 2. b Detection methods were applied to 15,562 NCBI RefSeq representative bacterial genomes. The full results are given in Supplemental Table 3. A region is “complete” if it is not on a contig edge, as determined by antiSMASH. c F1 score is equal to 2×(Precision×Recall)/(Precision+Recall). d Percentages indicate the fraction of NRPS regions that were predicted to encode NRP metallophores.
AntiSMASH outperforms transporter-based detection, although both techniques are complementary
Crits-Christoph et al. found that the presence of transporters could be used to predict siderophore BGCs among other NRPS clusters.33 Specifically, the Pfam families for TonB-dependent receptors, FecCD domains, and periplasmic binding proteins (PF00593, PF01032, and PF01497, respectively) were determined to be highly siderophore-specific, and the authors used the presence of two of the three domains to predict a “siderophore-like” BGC region (metallophores that transport other metals were also coded as siderophores in their dataset). We used a modified version of antiSMASH to detect the three transporter families among the 758 manually annotated NRPS BGC regions (Table 1 and Supplemental Table 2). In total, the transporter-based method detected 108 metallophore clusters (F1 = 0.69), including eight putative false positives (93% precision), and had 80 false negatives (56% recall). One false positive was noted in the manual annotation as a likely “cheater”: while several Bordetella genomes encode the synthesis of a putative graminine-containing metallophore, B. petrii DSM 12804 has retained only the transporter genes alongside a small fragment of the NRPS. In the seven other false positives, BGC regions appeared to coincidentally contain transporter genes in their periphery, as they were not conserved in homologous NRPS clusters. In one case, the triggering genes were part of a putative vitamin B12 import and biosynthesis locus. Combining the two methods in an either/or ensemble approach slightly improved overall performance versus the antiSMASH rules alone, achieving 92% precision, 88% recall, and an F1 score of 0.90 (Table 1).
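The transporter-based rule and the either/or ensemble described above can be sketched as follows. The three Pfam accessions and the two-of-three threshold come from the text; the function names and the representation of a region as a set of Pfam accessions are assumptions for illustration and do not reflect the modified antiSMASH version used in the study.

```python
# Pfam families reported as siderophore-specific: TonB-dependent receptors,
# FecCD domains, and periplasmic binding proteins.
TRANSPORTER_PFAMS = {"PF00593", "PF01032", "PF01497"}


def transporter_rule(pfam_hits, min_families=2):
    """Call a region 'siderophore-like' if at least two of the three families are present."""
    return len(TRANSPORTER_PFAMS & set(pfam_hits)) >= min_families


def ensemble_call(chelator_call, pfam_hits):
    """Either/or ensemble: positive if the chelator-based or the transporter-based rule fires."""
    return bool(chelator_call) or transporter_rule(pfam_hits)


if __name__ == "__main__":
    # Hypothetical region with two transporter families but no chelator genes.
    print(ensemble_call(False, {"PF00593", "PF01497"}))
```

Because the ensemble is a logical OR, it can only add positives relative to either method alone, which is consistent with the higher recall and slightly lower precision reported for the combined approach.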
Charting NRP metallophore biosynthesis across bacteria
The implementation of NRP metallophore BGC detection into antiSMASH allowed us to take the first bacterial census of NRP metallophore biosynthesis. The finalized detection rules were applied to 15,562 representative bacterial genomes from NCBI RefSeq (25 June 2022). In total, 3,264 NRP metallophore BGC regions were detected (Table 1 and Supplemental Table 3), including 54 Type II (non- or semi-modular34) NRPS regions that would otherwise not be detected by antiSMASH, such as BGCs for acinetobactin and brucebactin.35,36 NRP metallophores comprised 16% of all NRPS BGC regions in the genomes. Among complete regions (not located on a contig edge), 21% of NRPS BGC regions were classified as NRP metallophores, compared to just 8.6% of incomplete NRPS regions. This is consistent with previous reports that low-quality, fragmented genomes result in low-quality BGC annotations in antiSMASH.37 The transporter-based approach predicted siderophore activity for 15% of complete NRPS regions, including 463 BGC regions without detectable chelator genes; when the two methods are combined, over 25% of NRPS BGCs are predicted to produce NRP metallophores (Table 1). Only complete NRP metallophore BGC regions detected by antiSMASH were used for downstream analyses.
Frequency and hybridization of NRP metallophore chelating groups
Complete NRP metallophore regions from the representative genomes were categorized by the type(s) of chelator biosynthesis genes detected within (Fig. 2). Hydroxamates and catechols were the most common pathways, present in 44% and 36% of BGC regions, respectively. In contrast, β-OHHis, graminine, and Dmaq biosyntheses were rare in representative genomes, each present in less than 2% of detected regions. About 20% of regions contained genes for at least two pathways and putatively produce a hybrid metallophore. Only 42 BGC regions (1.7%) contained three different chelating groups: each encoded genes for the pyoverdine chromophore, a hydroxamate, and either β-OHAsp or β-OHHis. The proportion of hybrid metallophores is likely higher than estimated here. As described above, some chelating moieties could not be captured by the pHMM-based rules. Furthermore, metallophore biosynthesis may require genes from multiple BGCs. Pyoverdine genes may be located in up to five different loci,38 and all 56 regions with only the pyoverdine chromophore pathway are expected to produce hybrid siderophores. Representative characterized siderophores that contain the chelator combinations in our dataset are shown in Fig. 3.
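The categorization described above amounts to tallying the set of chelator pathways detected in each region, which is also the input an UpSet-style plot requires. A minimal sketch on invented toy data (the regions below are hypothetical and are not taken from the dataset) is:

```python
from collections import Counter


def tally_combinations(regions):
    """Count how many regions carry each exact combination of chelator pathways."""
    return Counter(frozenset(chelators) for chelators in regions)


def hybrid_fraction(regions):
    """Fraction of regions with genes for at least two chelator pathways."""
    region_sets = [set(chelators) for chelators in regions]
    return sum(len(r) >= 2 for r in region_sets) / len(region_sets)


# Hypothetical toy input: one set of detected chelator pathways per BGC region.
toy_regions = [
    {"hydroxamate"},
    {"catechol"},
    {"catechol", "salicylate"},
    {"hydroxamate"},
    {"pyoverdine chromophore", "hydroxamate", "beta-OHAsp"},
]

combos = tally_combinations(toy_regions)
print(combos[frozenset({"hydroxamate"})])  # regions with hydroxamate only
print(hybrid_fraction(toy_regions))        # share of putative hybrids
```

Using frozensets as keys keeps each combination distinct from its subsets, so a hydroxamate-only region is not counted together with a pyoverdine/hydroxamate hybrid.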

An UpSet plot of chelator frequency among 2,489 complete NRP metallophore BGC regions from RefSeq representative genomes.
An additional 38 BGC regions were detected by metallophore-specific NRPS domains (VibH-like or tandem Cy) rather than chelator biosynthesis, and may produce catechol and/or salicylate metallophores using biosyntheses encoded elsewhere in the genome.

BiG-SCAPE similarity network of complete NRP metallophore BGC regions from RefSeq representative genomes.
Numbered square nodes indicate published BGCs, as given in Supplemental Table 1. Select hybrid metallophore BGC nodes are highlighted yellow, and their corresponding structures are shown. Nodes are colored by the type(s) of chelator biosynthesis detected therein. BGC regions colored light gray contain only metallophore-specific NRPS domains (VibH-like or tandem Cy) and may produce catechol and/or salicylate metallophores using biosyntheses encoded elsewhere in the genome. The network was constructed in BiG-SCAPE v1.1.2 using 2,596 BGC regions as input, including 78 reference BGCs, and a distance cutoff of 0.5.
The most widespread NRP metallophore families have likely been found, yet significant diversity remains unexplored
Different species of bacteria can contain highly similar metallophore BGCs. To gauge the biosynthetic diversity of the putative NRP metallophores (and thereby the structural diversity), the complete BGC regions were organized into a sequence similarity network using BiG-SCAPE, which clusters BGCs based on their shared gene content and identity. An additional 75 reported NRP metallophore BGCs were included as reference nodes (Supplemental Table 1), and a distance cutoff of 0.5 was chosen to allow highly similar reference BGCs to form connected components (gene cluster families; GCFs) in the network. The final network, colored and organized by chelator type, is presented in Fig. 3. The majority of BGC regions (57%) clustered with the reference BGCs in just 45 GCFs, suggesting that many of the most widespread NRP metallophore families with known chelating groups already have characterized representatives (Fig. S4). However, 1,093 BGC regions did not cluster with any reference BGC, forming 619 separate GCFs in the network (93% of all GCFs). Some of these may encode orphan metallophores previously isolated from unsequenced strains, or be similar to known BGCs that were not included in our non-exhaustive literature search. Nevertheless, significant NRP metallophore structural diversity remains undiscovered, particularly among the 484 BGC regions distinct enough to form isolated nodes in the network.
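The grouping of BGC regions into GCFs as connected components under a distance cutoff can be illustrated with a small union-find sketch. This is not BiG-SCAPE code: the pairwise distances below are invented, the function name is hypothetical, and treating the cutoff as inclusive is an assumption.

```python
def gene_cluster_families(nodes, distances, cutoff=0.5):
    """Group nodes into connected components, linking pairs with distance <= cutoff.

    nodes: iterable of BGC identifiers.
    distances: dict mapping (node_a, node_b) pairs to a pairwise distance.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Path-halving find: walk to the root while shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        if dist <= cutoff:  # inclusive cutoff is an assumption of this sketch
            parent[find(a)] = find(b)

    families = {}
    for n in parent:
        families.setdefault(find(n), []).append(n)
    # Largest families first; members sorted for stable output.
    return sorted((sorted(f) for f in families.values()), key=lambda f: (-len(f), f))


# Hypothetical toy network: two query regions join a reference BGC; two stay isolated.
toy_nodes = ["A", "B", "C", "D", "ref1"]
toy_distances = {("A", "B"): 0.30, ("B", "ref1"): 0.45, ("C", "D"): 0.80}
print(gene_cluster_families(toy_nodes, toy_distances))
```

In the toy network, regions A and B fall into the same family as the reference BGC even though A is not directly linked to it, while C and D remain singletons, analogous to the isolated nodes discussed above.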
Chemical identification of genome-predicted siderophores across taxa
To showcase how our large-scale automated genome mining methodology can be used to effectively predict functional metallophore biosynthetic pathways across taxa, we characterized the siderophores of three bacterial strains with genomes containing BGCs that were closely connected to reference BGCs in the BiG-SCAPE network (Fig. 3). Two strains belong to genera that have no reported metallophores: Buttiauxella brennerae DSM 9396 was predicted to produce enterobactin (Fig. 4A), and Terasakiispira papahanaumokuakeensis DSM 29361 was predicted to produce both marinobactin(s) (Fig. 4A) and enantio-pyochelin (Fig. 1A). The third strain, Pseudomonas brassicacearum DSM 13227, was selected because its genome contains a BGC that clustered with the histicorrugatin reference BGC. We predicted that the BGC may encode the biosynthesis of ornicorrugatin (Fig. 4A),39 a previously reported siderophore with no known BGC. A fragmented pyoverdine BGC was also present in the strain’s genome, which was predicted to produce the known siderophore pyoverdine A214 (Fig. 4A).40,41

Identification of siderophores predicted from genome mining.
(A) Chemical structures of marinobactins A-E,42 produced by Terasakiispira papahanaumokuakeensis DSM 29361; enterobactin,43 produced by Buttiauxella brennerae DSM 9396; and pyoverdine A21440 and ornicorrugatin,39 both produced by Pseudomonas brassicacearum DSM 13227. The position and orientation of the fatty acid desaturation in marinobactins B and D was not determined in this work. (B-D) High pressure liquid chromatography / high-resolution mass spectrometry (HPLC-HRMS) total ion chromatograms of culture supernatant extracts, overlaid with extracted ion chromatograms for siderophore features. Additional details and spectra are provided in the Supplemental Methods and Results.
Each strain was grown in low-iron conditions to induce siderophore production, then organic compounds were extracted from the culture supernatants using adsorbent resin prior to spectral analyses by electrospray ionization mass spectrometry (ESI-MS) and ESI-MS/MS; full details are provided in the Supplemental Methods and Results. From B. brennerae, we identified four catecholic compounds (Fig. 4B): the predicted enterobactin (Fig. 4A), as well as the enterobactin fragments 2,3-DHB–Ser (DHBS), (DHBS)2 and linear (DHBS)3. The crude extract of T. papahanaumokuakeensis indeed contained molecular ions consistent with marinobactins A-E (Fig. 4A and C). Tandem ESI-MS/MS yielded expected fragmentation patterns for marinobactins A-D, while the peak at m/z 988.5421, putatively marinobactin E, was low abundance and did not give a clear spectrum. No peaks consistent with enantio-pyochelin (m/z 324.4; Fig. 1A) could be observed. From P. brassicacearum, we identified both siderophores predicted from the BGC analyses: ornicorrugatin and pyoverdine A214 (Fig. 4A and D). Fragmentation patterns closely matched those previously reported.39,40
Thus, our method was able to successfully identify the putative BGC for the orphan siderophore ornicorrugatin and also correctly predict the potential to produce known siderophores by taxa that were not yet studied for their metallophore biosynthetic capacities.
Taxonomic distribution of NRP metallophores
We investigated the taxonomic distribution and evolution of NRP metallophore biosynthesis within the bacterial kingdom by applying our antiSMASH detection rules to 59,851 representative bacterial genomes from the Genome Taxonomy Database (GTDB).44 Among these, 4,098 genomes (6.8%) were predicted to contain at least one NRP metallophore BGC. A total of 5,366 BGC regions were detected, representing 14% of all detected NRPS regions. Taxonomic distribution analysis using the GTDB phylogeny highlighted the uneven prevalence of NRP metallophores across bacterial phyla (Table 2). Proteobacteria and Actinobacteriota were overrepresented in the GTDB representatives, together accounting for 89% of all detected NRP metallophore regions. After correcting for the number of representative genomes in each phylum, NRP metallophore BGCs were most abundant in Actinobacteriota, with 23% of genomes containing at least one detectable region. Proteobacteria, Cyanobacteria, and Myxococcota each had similar proportions of genomes with NRP metallophore BGCs; however, due to biased coverage in the GTDB, 49% of the detected BGC regions were from Proteobacteria, compared to only 4% and 1.1% found in Cyanobacteria and Myxococcota, respectively. Thus, we expect that further sequencing efforts directed at these two phyla will yield many new NRP metallophore BGCs.

Taxonomic distribution of 4,953 NRP metallophore BGC regions detected in 59,851 GTDB representative bacterial genomes.
Phylum nomenclature is preserved from GTDB r207. An additional 413 BGC regions with “unknown” taxonomy are not included here. Phyla not listed had zero detected regions.
To map the distribution of NRP metallophore producers across the bacterial kingdom, we employed Relative Evolutionary Divergence (RED) values, a framework proposed by Parks et al. and utilized within the GTDB.45 Building on this, Gavriilidou et al. defined REDgroups (phylogenetically consistent clusters based on RED values) that provide a standardized framework analogous to genera.46 Unlike traditional genera, which can vary significantly in their evolutionary distances, REDgroups offer greater consistency in evolutionary relationships among their members. This framework allowed us to summarize the data as the average number of NRP metallophore BGC regions per genome within each group, enabling effective visualization and more equitable comparative analyses of biosynthetic potential across bacterial lineages. By collapsing the GTDB tree to the REDgroup level, we annotated each group with the average number of putative NRP metallophore clusters (Fig. 5). The analysis revealed that 16% of REDgroups encoded detected NRP metallophores, and within each REDgroup, the number of NRP metallophores was relatively consistent (standard deviation: 0.3425). This observation aligns with the findings of Gavriilidou et al., who demonstrated that BGC diversity is consistent at the genus level.46 While most REDgroups with NRP metallophores averaged one per genome, several REDgroups, particularly within Actinobacteriota, Proteobacteria, and Cyanobacteria, exhibited higher averages, with some exceeding two per genome. These results reveal lineage-specific patterns in metallophore biosynthesis and highlight the utility of REDgroups as an alternative to traditional taxonomic units.
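The per-REDgroup summary described above can be sketched as a simple grouped aggregation. The records below are invented, the function name is hypothetical, and the use of the population standard deviation is an assumption, as the text does not state how the reported value was computed.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_by_group(records):
    """Return {group: (mean regions per genome, population standard deviation)}.

    records: iterable of (group_id, number_of_metallophore_regions_in_genome).
    """
    by_group = defaultdict(list)
    for group_id, n_regions in records:
        by_group[group_id].append(n_regions)
    return {g: (mean(counts), pstdev(counts)) for g, counts in by_group.items()}


# Hypothetical toy input: one record per genome.
toy_records = [
    ("REDgroup_1", 1), ("REDgroup_1", 1),
    ("REDgroup_2", 0), ("REDgroup_2", 2),
    ("REDgroup_3", 0),
]
for group, (avg, sd) in summarize_by_group(toy_records).items():
    print(group, avg, sd)
```

Two groups with the same average, such as REDgroup_1 and REDgroup_2 in the toy data, can differ in within-group spread, which is why a dispersion measure accompanies the per-group averages.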

NRP metallophore biosynthesis across the bacterial kingdom.
Center: The Genome Taxonomy Database (GTDB) phylogenetic tree (version r207), with strains collapsed to the REDgroup level.46 Numbered circles indicate the most parsimonious origins of chelator pathways, as determined by reconciliation with eMPRess.47 The bottom-right legend lists the specific profile hidden Markov models (pHMMs) associated with each estimated origin. Arrows indicate ancient horizontal gene transfers predicted by eMPRess. Ring A: Phylum-level divisions. Phyla with detected chelating groups are labeled using nomenclature from GTDB r207. Ring B: Chelator biosynthetic pathways detected in at least one member of each REDgroup. Ring C: Average number of detected NRP metallophore BGC regions per genome for each REDgroup. Annotations were mapped to the phylogenetic tree using iTOL v6.49
Evolution of gene families and phylogenetic reconciliation to uncover the evolutionary history of NRP metallophores
To investigate the evolution and origins of NRP metallophores, we conducted a detailed phylogenetic analysis of each chelator group. Elucidating the evolutionary history of bacterial gene families is complicated by gene duplications, horizontal gene transfers (HGTs), and deletions that cause discordance between the bacterial species phylogeny and each chelator gene phylogeny. To reconcile the trees, we used the software package eMPRess, which infers the most parsimonious series of duplication, HGT, and deletion events (maximum parsimony reconciliation) to reconstruct the evolutionary history of the gene family.47 We first extracted non-fragmented BGC regions from the GTDB representative genomes, then clustered them with BiG-SCAPE to yield 1,108 representative BGCs. From these BGCs, we extracted chelator biosynthesis genes and reconstructed gene trees, which were then compared to the GTDB species tree with eMPRess.47
Estimates for the origins and early HGTs of the chelating groups are presented in the center of Fig. 5. Reconciliation indicates that the most widespread chelating groups (catechols, hydroxamates, and salicylates) are among the most ancient. Genes for producing 2,3-DHB may have originated in a common ancestor of Actinobacteriota (ca. 2.7 Ga, according to rough estimates from TimeTree48) and were then transferred stepwise to Proteobacteria and to Firmicutes. Salicylate biosynthesis genes have an estimated origin in a common ancestor of Gammaproteobacteria (ca. 1.9 Ga 48), with early HGT to Cyanobacteria and Actinobacteriota. Hydroxamate NRP metallophores appear to have originated in the common ancestor of Alpha- and Gammaproteobacteria (ca. 2.3 Ga 48) and were transferred into Actinobacteriota, while Lys-based hydroxamates evolved in Actinobacteriota. The other chelator groups display a more phylum-specific distribution, with HGT predominantly occurring within the same phylum (see Supplemental Dataset, empress_reconciliations). Dmaq is predicted to be among the oldest chelating groups and may have been produced by the common ancestor of Cyanobacteria (ca. 2.7 Ga 48), while the pyoverdine chromophore, exclusively observed within the order Pseudomonadales, likely represents one of the most recent siderophore biosynthetic pathways (ca. 1.2 Ga 48).
Discussion
Trace metal starvation shapes interactions within microbial communities and between bacteria and the host; therefore, natural and synthetic microbiomes cannot be understood without knowing the metallophore biosynthetic potential of the community. High-throughput biotechnological applications will benefit from in silico metallophore prediction due to the prohibitively high cost of isolation and characterization. To date, distinguishing peptidic metallophore BGCs from other NRPS BGCs has been largely limited to manual expert analyses, leading to blind spots in our understanding of microbes and their communities. We have now automated bacterial NRP metallophore prediction by extending the secondary metabolite prediction platform antiSMASH to detect genes involved in the biosynthesis of metal chelating moieties, enabling the first global analysis of bacterial metallophore biosynthetic diversity.
The presence of genes encoding catechols, hydroxamates, and other chelating groups is one of the most frequently used markers of a metallophore BGC.9 We have formalized and automated the identification of eight chelator pathways, allowing us to detect 78% of NRP metallophore BGCs with 97% precision against a manually annotated set of NRPS clusters. Biosynthetic genes are detected with custom pHMMs and significance score cutoffs calibrated for accurate metallophore discovery, diminishing the ambiguity of interpreting gene annotations, protein families, and BLAST results. We acknowledge that human biases may have influenced which clusters were coded as putative metallophores during both algorithm development and testing; however, expert manual curation remains the most reliable method for NRP metallophore BGC detection. Unfortunately, 22% of manually identifiable metallophore BGCs could not be automatically distinguished from other NRPS clusters, as the algorithm, designed for easy integration into antiSMASH, relies on the presence of one or more known chelator biosynthesis genes colocalized with the NRPS genes.
Recently, Crits-Christoph et al. demonstrated the use of transporter families to predict whether a BGC encodes siderophore (or metallophore) biosynthesis.33 On our test dataset, the biosynthesis-based antiSMASH rules outperformed the transporter-based approach (F1 = 0.86 versus F1 = 0.69). However, some putative metallophore BGCs were only found using the transporter-based approach, and a combined either/or ensemble approach slightly outperformed the antiSMASH rules alone (F1 = 0.90). Biosynthetic- and transporter-based techniques are thus complementary, and future work could incorporate transporter genes into antiSMASH metallophore prediction. We note that the reported transporter-based approach uses just three pHMMs from Pfam, while our biosynthetic detection requires many custom pHMMs. An extended set of metallophore-specific transporter pHMMs designed according to the same principles as those followed for the biosynthesis-related pHMMs could significantly improve detection by reducing false positives and capturing other families of transporters. The NRP metallophore BGCs discovered in this study could serve as a dataset for developing a more comprehensive model for metallophore transporter detection.
The diverse enzyme families responsible for the biosynthesis of NRP metallophore chelating groups (Figure 1B) evince that metal chelation has evolved multiple times, and we expect that more NRPS chelator substructures remain undiscovered. In fact, during manuscript preparation, the novel chelator 5-aminosalicylate was reported in the structure of the Pseudonocardia NRP siderophore pseudonochelin,30 and we found several unexplored clades of Fe(II)/α-ketoglutarate-dependent amino acid β-hydroxylases that are likely involved in metallophore biosyntheses (Figure S2). Additionally, we have likely identified a new biosynthetic pathway in the genome of Sporomusa termitida DSM 4440, which encodes a partial menaquinone pathway in place of a salicylate synthase to putatively produce a novel karamomycin-like metallophore (Figure S3).50 The modular nature of the pHMM-based detection rules will allow for new chelating groups to be added as their biosyntheses are experimentally characterized.
Metallophore BGC regions from representative genomes were compared to reference BGCs and organized into gene cluster families (GCFs) with BiG-SCAPE (Figure 3). We found 1093 metallophore BGC regions that were dissimilar from any reference BGC, and almost 500 distinct BGC regions were found in only a single strain. Although significant biosynthetic diversity remains undiscovered, cluster de-replication will become increasingly important to avoid re-isolating known compounds. We also assessed the taxonomic distribution of NRP-metallophore BGC regions by mapping their presence onto a GTDB REDgroup phylogeny. We found that Cyanobacteria and Myxococcota were underrepresented in our analyses due to a relatively low number of published genomes. Considering that only a handful of NRP metallophores have been isolated from these two phyla, we suggest that Cyanobacteria and Myxococcota deserve coordinated efforts of genomic sequencing and experimental work to further characterize their metallophore diversity.
Finally, we used our dataset of detected BGCs and paired taxonomic data from GTDB to investigate the complex evolutionary history of chelating group biosynthesis by reconstructing the most likely origin and major HGT events for each pathway with eMPRess (Fig. 5).47 Catechols, hydroxamates, and salicylates were among the most widespread and ancient chelators with evidence of HGT between phyla. This widespread distribution suggests significant ecological relevance for these chelators in diverse bacterial lineages. Intriguingly, our timeline estimates place the origin of 2,3-DHB and Dmaq prior to the Great Oxygenation Event (∼2.4–2.1 Ga), during an era of abundant, soluble ferrous iron. This result lends credence to the hypothesis that chelating molecules first evolved as metal detoxification mechanisms and were repurposed when oxidized iron became scarce.3 Tracing ancient evolutionary events, particularly those involving multiple gene gains and losses, remains challenging due to the exponential increase in complexity as the number of possible events grows. More detailed examinations dedicated to each individual chelating group may yield deeper insights into the complex evolutionary history of these pathways. For example, the origin of hydroxamates must consider the homologous enzymes in NRPS-independent siderophore pathways, and we cannot yet state if metallophore-specific β-OHAsp biosynthesis is polyphyletic due to repeated incorporation into metallophores or a single incorporation followed by repeated transfer into non-chelating roles. Nevertheless, this study represents, to our knowledge, the first attempt at a large-scale phylogenetic analysis into the origin of chelating groups in bacteria.
By integrating chelator detection into antiSMASH, we have taken a major step towards accurate, automated NRP metallophore BGC detection. The new strategy affords a clear practical improvement over manual curation and has already allowed for the high-throughput identification of thousands of likely NRP metallophore BGC regions, both in this study and in several other recently published analyses that have been enabled by early availability of our methodology.51,52 A future antiSMASH module might predict metallophore activity more accurately with a machine learning algorithm that considers multiple forms of genomic evidence, including the presence of transporter genes, NRPS domain architecture and sequence, metal-responsive regulatory elements, and other markers of metallophore biosynthesis that are still limited to manual inspection.9 In particular, regulatory elements will likely be required to accurately distinguish siderophores, zincophores, and other classes of metallophores. Implementation of NRP metallophore BGC detection into antiSMASH will enable scientists with diverse expertise to identify and quantify NRP metallophore biosynthetic pathways in their bacterial genomes of interest and promote large-scale investigations into the chemistry and biology of metallophores.
Methods
For all software, default parameters were used unless otherwise specified. All Python, R, and Bash scripts used in this paper, as well as the underlying data, are available in the Supplemental Dataset, published to Zenodo: 10.5281/zenodo.16581519.
Profile hidden Markov model construction
Profile hidden Markov models (pHMMs) were built from biosynthetic genes in known metallophore pathways, supplemented with putative BGC genes where required (Figure S1 and Supplemental Dataset, 1_development/). Amino acid sequences were aligned with MUSCLE (v3.8)53 and pHMMs were constructed using hmmbuild (HMMER3).54 pHMMs were tested against the MIBiG database (v2.0)55 and an additional 37 NRP siderophore BGCs from literature (Supplemental Table 1) using hmmsearch (HMMER3). Rough bitscore significance cutoffs were determined for each pHMM. More precise cutoffs were assigned by testing against 28,688 NRPS BGC regions from the antiSMASH database (v3).56 BGC regions containing genes near the rough cutoff were manually inspected to determine if these were likely metallophore BGCs. If no clear bitscore cutoff could be discerned, representative low-scoring putative true hits were added to the pHMM seed alignment. This process was repeated until a precise bitscore cutoff could be determined.
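The iterative cutoff refinement described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study, where cutoffs were assigned by manual inspection. The function name `find_bitscore_cutoff`, the input format, and the choice of the midpoint between the lowest-scoring true hit and the highest-scoring false hit are all assumptions made for illustration. A return value of `None` corresponds to the case where no clear cutoff can be discerned and low-scoring true hits would be added to the seed alignment.

```python
from typing import List, Optional, Tuple


def find_bitscore_cutoff(hits: List[Tuple[float, bool]]) -> Optional[float]:
    """Return a bitscore cutoff that separates true from false hits, if one exists.

    hits: (bitscore, is_true_hit) pairs, where is_true_hit is the outcome of
    manual inspection of the BGC region containing the hit (hypothetical input).
    """
    true_scores = [score for score, is_true in hits if is_true]
    false_scores = [score for score, is_true in hits if not is_true]

    if not true_scores:
        return None  # nothing to calibrate against
    if not false_scores:
        return min(true_scores)  # every hit is a true hit

    lowest_true = min(true_scores)
    highest_false = max(false_scores)
    if lowest_true <= highest_false:
        # Score distributions overlap: no clean cutoff, so the seed
        # alignment would need to be extended and the pHMM rebuilt.
        return None

    # Assumption: place the cutoff midway through the gap.
    return (lowest_true + highest_false) / 2


example_hits = [(450.0, True), (380.0, True), (120.0, False), (90.0, False)]
cutoff = find_bitscore_cutoff(example_hits)
```

In practice, the gap between true and false hits would be judged per pHMM, and any automated choice of cutoff would still need to be checked against the inspected BGC regions.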
Phylogenetic analysis of Asp and His β-hydroxylases
Adequately performing pHMMs for Asp and His β-hydroxylase subtypes could not be constructed using the above method. Siderophore β-hydroxylase functional subtypes were previously shown to form distinct phylogenetic clades.22 An expanded phylogenetic analysis was performed to serve as a guide for pHMM construction (Supplemental Dataset, 1_development/hydroxylase_tree/). NRPS BGC regions from the antiSMASH database (v3) were scanned for matches to previously reported β-hydroxylase pHMMs22 and Pfam pHMMs for siderophore-related transporters (PF00593, PF01032, and PF01497)33,57 using a modified version of antiSMASH v6.0. β-Hydroxylase genes meeting a relaxed bitscore cutoff of 300 (1070 total) were dereplicated with the CD-HIT web server58 and a sequence identity cutoff of 70%, giving 425 representative amino acid sequences. A multiple sequence alignment was created using hmmalign (HMMER3) and the TauD Pfam (PF02668),57 and a maximum-likelihood phylogenetic tree was reconstructed with IQ-TREE (multicore v2.2.0-beta)59 using the WAG+F+I+G4 evolutionary model. The presence of nearby transporters was mapped onto the phylogenetic tree to identify clades or paraphyletic groups putatively involved in siderophore biosynthesis. Sequences in groups corresponding to previously reported TBH_Asp, IBH_Asp, and IBH_His subtypes and the novel putative CyanoBH_Asp1 and CyanoBH_Asp2 subtypes were extracted, and pHMMs were constructed and tested as described above.
Incorporation into antiSMASH
The pHMMs and cutoffs were added to antiSMASH as a single detection rule called “NRP-metallophore” with the following logic:
VibH_like or Cy_tandem or
(cds(Condensation and AMP-binding) and (
(IBH_Asp and not SBH_Asp) or IBH_His or TBH_Asp or
CyanoBH_Asp1 or CyanoBH_Asp2 or
IPL or SalSyn or (EntA and EntC) or
(GrbD and GrbE) or (FbnL and FbnM) or PvdO or PvdP or
(Orn_monoox and not (KtzT or MetRS-like)) or
Lys_monoox or VbsL))
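To make the boolean structure of the rule explicit, the following Python sketch evaluates it for a single BGC region. It is an illustration only and is not the antiSMASH implementation: the function name `is_nrp_metallophore` and the input format (one set of profile names per coding sequence) are hypothetical, bitscore cutoffs and neighborhood distances are ignored, and every clause in the inner group is assumed to be joined by `or`.

```python
from typing import Iterable, Set


def is_nrp_metallophore(cds_hits: Iterable[Set[str]]) -> bool:
    """Evaluate the NRP-metallophore rule logic for one region.

    cds_hits: one set of matched profile names per coding sequence
    (hypothetical input; hits are assumed to have passed their cutoffs).
    """
    cds_list = [set(hits) for hits in cds_hits]
    present = set().union(*cds_list)  # profiles found anywhere in the region

    def has(name: str) -> bool:
        return name in present

    # Standalone markers that do not require an NRPS gene.
    if has("VibH_like") or has("Cy_tandem"):
        return True

    # cds(Condensation and AMP-binding): both domains within the same CDS.
    if not any({"Condensation", "AMP-binding"} <= hits for hits in cds_list):
        return False

    # At least one chelator biosynthesis marker in the region.
    return (
        (has("IBH_Asp") and not has("SBH_Asp")) or has("IBH_His") or has("TBH_Asp")
        or has("CyanoBH_Asp1") or has("CyanoBH_Asp2")
        or has("IPL") or has("SalSyn") or (has("EntA") and has("EntC"))
        or (has("GrbD") and has("GrbE")) or (has("FbnL") and has("FbnM"))
        or has("PvdO") or has("PvdP")
        or (has("Orn_monoox") and not (has("KtzT") or has("MetRS-like")))
        or has("Lys_monoox") or has("VbsL")
    )


# Toy region: an NRPS gene plus EntA and EntC on separate genes.
example = is_nrp_metallophore([{"Condensation", "AMP-binding"}, {"EntA"}, {"EntC"}])
```

The negated terms (SBH_Asp, KtzT, MetRS-like) act as exclusions, so a region containing Orn_monoox together with KtzT is not called on that clause alone but can still be called through any other clause.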
Manual validation
RefSeq representative bacterial genomes were dereplicated at the genus level using R, randomly selecting one genome for each of the 330 genera determined by GTDB (Supplemental Dataset, 3_manual_testing/). All NRPS BGC regions in the genomes were annotated with antiSMASH v6.1, yielding 758 BGC regions in the final testing dataset (Supplemental Table 2). The antiSMASH output for each BGC was manually inspected for evidence of NRP metallophore production, including genes encoding transporters, iron reductases, chelator biosynthesis, and known metallophore NRPS domain motifs. The same 758 BGC regions were classified as NRP metallophores using the chelator-based strategy described above, as well as the transporter-based strategy of Crits-Christoph et al.33 Genes matching Pfam pHMMs for siderophore-related transporters (PF00593, PF01032, and PF01497)33,57 were identified using a modified version of antiSMASH v6.1, and BGC regions were classified as metallophores if two of the three transporter families were present.33 Each putative false positive was re-investigated before performance statistics were calculated, resulting in the reannotation of four BGCs.
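The transporter-based classification and the performance statistics can be sketched in Python as follows. This is a hedged illustration, not the scripts from the Supplemental Dataset: the function names and input formats are hypothetical, "two of the three transporter families" is interpreted as at least two, and the either/or ensemble simply combines the two calls with a logical OR. The toy counts in the example are invented and do not reproduce the reported results.

```python
from typing import Iterable, Set, Tuple

# Pfam families for siderophore-related transporters named in the text.
TRANSPORTER_PFAMS = {"PF00593", "PF01032", "PF01497"}


def transporter_call(pfam_hits: Set[str], min_families: int = 2) -> bool:
    """Call a region a metallophore if enough transporter families are present."""
    return len(TRANSPORTER_PFAMS & pfam_hits) >= min_families


def ensemble_call(chelator_positive: bool, pfam_hits: Set[str]) -> bool:
    """Either/or ensemble: positive if either strategy is positive."""
    return chelator_positive or transporter_call(pfam_hits)


def precision_recall_f1(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from (predicted, manual_label) pairs."""
    tp = fp = fn = 0
    for predicted, truth in pairs:
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Invented toy data: 3 true positives, 1 false positive, 1 false negative, 5 true negatives.
toy_pairs = [(True, True)] * 3 + [(True, False)] + [(False, True)] + [(False, False)] * 5
precision, recall, f1 = precision_recall_f1(toy_pairs)
```

Because F1 is the harmonic mean of precision and recall, an ensemble can raise F1 when the gain in recall outweighs the added false positives.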
BiG-SCAPE clustering
NRP metallophore BGC regions from RefSeq representative genomes (Supplemental Dataset, 2_refseq_reps_results/metallophores_Jun25.tar.gz) were filtered to remove clusters on contig edges. The resulting 2,523 BGC regions, as well as 78 previously reported BGCs (Supplemental Table 2), were clustered using BiG-SCAPE v1.1.2 with the following settings: “--no_classify --mix --cutoffs 0.3 0.4 0.5 --clans-off”. The network (Supplemental Dataset, 6_bigscape/mix_c0.50.network) was imported to Cytoscape for figure preparation.
Phylogenetic mapping
Genome mining was performed on 62,291 GTDB representative genomes (59,851 after filtering; version r207)44 using antiSMASH v7.0beta,13,44 with the inclusion of the NRP metallophore detection module. The outputs were analysed to identify predicted NRP-metallophore producers and categorized into distinct chelator groups based on predefined detection criteria. A total of 5,366 NRP-metallophores were identified, representing approximately 14% of all detected NRPS regions. To map the distribution patterns of these producers, the results were integrated with the GTDB tree. Due to the size of the tree, visualization tools such as iTOL49 were impractical, prompting dereplication to a higher taxonomic rank. The GTDB tree was collapsed to the REDgroup level (a phylogenetically defined rank analogous to genera), allowing normalization to reflect the average number of NRP-metallophore biosynthetic gene clusters (BGCs) per genome within each REDgroup.46
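The per-REDgroup normalization amounts to averaging BGC counts over all genomes assigned to the same group. A minimal Python sketch is given below; the function name, the dictionary-based inputs, and the genome identifiers are hypothetical, and genomes without any detected NRP-metallophore BGC are assumed to count as zero.

```python
from collections import defaultdict
from typing import Dict


def mean_bgcs_per_redgroup(
    genome_to_redgroup: Dict[str, str],
    genome_to_bgc_count: Dict[str, int],
) -> Dict[str, float]:
    """Average number of NRP-metallophore BGCs per genome within each REDgroup."""
    totals: Dict[str, int] = defaultdict(int)
    n_genomes: Dict[str, int] = defaultdict(int)
    for genome, group in genome_to_redgroup.items():
        # Genomes absent from the count table contribute zero BGCs.
        totals[group] += genome_to_bgc_count.get(genome, 0)
        n_genomes[group] += 1
    return {group: totals[group] / n_genomes[group] for group in n_genomes}


# Invented toy data: two genomes in group "A", one in group "B".
averages = mean_bgcs_per_redgroup(
    {"genome1": "A", "genome2": "A", "genome3": "B"},
    {"genome1": 3, "genome3": 1},
)
```

Including the zero-count genomes in the denominator keeps densely sequenced groups from appearing enriched simply because more genomes were mined.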
To uncover the evolutionary history of siderophore biosynthesis, phylogenetic analyses and reconciliation were performed. Gene sequences for each chelator group were extracted from 4,060 complete BGCs, filtered to exclude clusters located on contig edges, and clustered into Gene Cluster Families (GCFs) using BiG-SCAPE60 with a 0.5 cutoff. From each GCF, one representative BGC was selected, resulting in a dataset of 1,108 clusters. Multiple sequence alignments (MSAs) were generated using MAFFT v7,61 and phylogenetic trees were constructed using FastTree 2 with the WAG model.62 Evolutionary events, including gene duplication, loss, and horizontal gene transfer, were identified using phylogenetic reconciliation in eMPRess47 by comparing gene trees to species trees. Reconciliation results were annotated using iTOL v649 for visualization, manually mapping key evolutionary events onto the GTDB tree. Individual gene tree reconciliations are available in the Supplemental Dataset.
Data availability statement
All Python, R, and Bash scripts used in this paper, as well as the underlying data, are available in the Supplemental Dataset, published to Zenodo: 10.5281/zenodo.16581519. The enterobactin, marinobactin, and ornicorrugatin BGCs have been submitted to the MIBiG repository with accession numbers BGC0003172, BGC0003173, and BGC0003174, respectively.
Acknowledgements
This project has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Starting Grant 948770-DECIPHER; ZR and MM), as well as from the US National Science Foundation (CHE-2108596; AB). BT and NZ were supported by H2020-FNR-11-2020: SECRETED (Grant agreement: 101000794). NZ was supported by the German Center for Infection Research TTU09.717. This work was supported by the Office of Naval Research Award Number N00014-23-2197. We thank Allegra Aron for providing useful feedback on the manuscript.
Additional information
Funding
European Research Council
https://doi.org/10.3030/948770
National Science Foundation (CHE-2108596)
European Research Council (H2020-FNR-11-2020)
German Center for Infection Research (TTU09.717)
Office of Naval Research (N00014-23-2197)
References
- 1. Chemistry and biology of siderophores. Nat. Prod. Rep 27:637–657
- 2. Metallophores and Trace Metal Biogeochemistry. Aquat Geochem 21:159–195
- 3. Bacterial siderophores in community and host interactions. Nat. Rev. Microbiol 18:152–163
- 4. Perspective on the biotechnological production of bacterial siderophores and their use. Appl Microbiol Biotechnol 106:3985–4004
- 5. Competition for iron drives phytopathogen control by natural rhizosphere microbiomes. Nat Microbiol 5:1002–1010
- 6. Siderophore-Mediated Interactions Determine the Disease Suppressiveness of Microbial Consortia. mSystems 5:e00811–19
- 7. Siderophore-mediated zinc acquisition enhances enterobacterial colonization of the inflamed gut. Nat. Commun 12:7016
- 8. Kupyaphores are zinc homeostatic metallophores required for colonization of Mycobacterium tuberculosis. Proc. Natl. Acad. Sci. U. S. A 119
- 9. Genome mining strategies for metallophore discovery. Curr. Opin. Biotechnol 77
- 10. Chemistry and biology of pyoverdines, Pseudomonas primary siderophores. Curr. Med. Chem 22:165–186
- 11. Nonribosomal Peptide Synthesis-Principles and Prospects. Angew Chem Int Ed Engl 56:3770–3821
- 12. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res 49:W29–W35
- 13. antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation. Nucleic Acids Res 51:W46–W50
- 14. Enterobactin: an archetype for microbial iron transport. Proc. Natl. Acad. Sci. U. S. A 100:3584–3588
- 15. Structural genes for salicylate biosynthesis from chorismate in Pseudomonas aeruginosa. Mol. Gen. Genet 249:217–228
- 16. Irp9, encoded by the high-pathogenicity island of Yersinia enterocolitica, is able to convert chorismate into salicylate, the precursor of the siderophore yersiniabactin. J. Bacteriol 185:5648–5653
- 17. The structure of VibH represents nonribosomal peptide synthetase condensation, cyclization and epimerization domains. Nat. Struct. Biol 9:522–526
- 18. Precursor-directed biosynthesis of catechol compounds in Acinetobacter bouvetii DSM 14964. Chem. Commun 56:12222–12225
- 19. Structural and mutational analysis of the nonribosomal peptide synthetase heterocyclization domain provides insight into catalysis. Proceedings of the National Academy of Sciences 114:95–100
- 20. Mechanistic and structural studies of the N-hydroxylating flavoprotein monooxygenases. Bioorg Chem 39:171–177
- 21. Enzymatic Tailoring of Ornithine in the Biosynthesis of the Rhizobium Cyclic Trihydroxamate Siderophore Vicibactin. J. Am. Chem. Soc 131:15317–15329
- 22. Genomic analysis of siderophore β-hydroxylases reveals divergent stereocontrol and expands the condensation domain family. Proc. Natl. Acad. Sci. U. S. A 116:19805–19814
- 23. Cyanochelins, an Overlooked Class of Widely Distributed Cyanobacterial Siderophores, Discovered by Silent Gene Cluster Awakening. Appl Environ Microbiol 87:e0312820
- 24. Genomics-Driven Discovery of NO-Donating Diazeniumdiolate Siderophores in Diverse Plant-Associated Bacteria. Angew Chem Int Ed Engl 58:13024–13029
- 25. C-Diazeniumdiolate Graminine in the Siderophore Gramibactin Is Photoreactive and Originates from Arginine. ACS Chem. Biol 17:3140–3147
- 26. Structural and Biosynthetic Analysis of the Fabrubactins, Unusual Siderophores from Agrobacterium fabrum Strain C58. ACS Chem. Biol 16:125–135
- 27. PvdP is a tyrosinase that drives maturation of the pyoverdine chromophore in Pseudomonas aeruginosa. J. Bacteriol 196:2681–2690
- 28. PvdO is required for the oxidation of dihydropyoverdine as the last step of fluorophore formation in Pseudomonas fluorescens. J. Biol. Chem 293:2330–2341
- 29. An iterative type I polyketide synthase initiates the biosynthesis of the antimycoplasma agent micacocidin. Chem. Biol 20:764–771
- 30. Genome Mining and Metabolomics Unveil Pseudonochelin: A Siderophore Containing 5-Aminosalicylate from a Marine-Derived Pseudonocardia sp. Bacterium. Org Lett 24:3998–4002
- 31. Metabolomics and Genomics Enable the Discovery of a New Class of Nonribosomal Peptidic Metallophores from a Marine Micromonospora. J. Am. Chem. Soc 145:58–69
- 32. A novel streptococcal integrative conjugative element involved in iron acquisition. Mol. Microbiol 70:1274–1292
- 33. Transporter genes in biosynthetic gene clusters predict metabolite characteristics and siderophore activity. Genome Res 31:239–250
- 34. Type II non-ribosomal peptide synthetase proteins: structure, mechanism, and protein-protein interactions. Nat. Prod. Rep 37:355–379
- 35. Identification and transcriptional organization of a gene cluster involved in biosynthesis and transport of acinetobactin, a siderophore produced by Acinetobacter baumannii ATCC 19606T. Microbiology 150:2587–2597
- 36. Brucella abortus strain 2308 produces brucebactin, a highly efficient catecholic siderophore. Microbiology 148:353–360
- 37. The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters. Nucleic Acids Res 45:D555–D559
- 38. Genomics of secondary metabolite production by Pseudomonas spp. Nat. Prod. Rep 26:1408–1446
- 39. Ornicorrugatin, a New Siderophore from Pseudomonas fluorescens AF76. Zeitschrift für Naturforschung C 63
- 40. Bacterial constituents CXIII structure revision of several pyoverdins produced by plant-growth promoting and plant-deleterious Pseudomonas species. Monatsh Chem 134:1421–1431
- 41. Pyoverdine and histicorrugatin-mediated iron acquisition in Pseudomonas thivervalensis. Biometals https://doi.org/10.1007/s10534-016-9929-1
- 42. Marine amphiphilic siderophores: marinobactin structure, uptake, and microbial partitioning. J. Inorg. Biochem 101:1692–1698
- 43. HPLC separation of enterobactin and linear 2,3-dihydroxybenzoylserine derivatives: a study on mutants of Escherichia coli defective in regulation (fur), esterase (fes) and transport (fepA). Biometals 7:149–154
- 44. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res 50:D785–D794
- 45. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol 36:996–1004
- 46. Compendium of specialized metabolite biosynthetic diversity encoded in bacterial genomes. Nat. Microbiol 7:726–735
- 47. eMPRess: a systematic cophylogeny reconciliation tool. Bioinformatics 37:2481–2482
- 48. The timetree of prokaryotes: New insights into their evolution and speciation. Mol. Biol. Evol 34:437–446
- 49. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res 52:W78–W82
- 50. Karamomycins A-C: 2-Naphthalen-2-yl-thiazoles from Nonomuraea endophytica. J. Nat. Prod 82:870–877
- 51. Pangenome mining of the Streptomyces genus redefines their biosynthetic potential. bioRxiv https://doi.org/10.1101/2024.02.20.581055
- 52. A treasure trove of 1034 actinomycete genomes. Nucleic Acids Res 52:7487–7503
- 53. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797
- 54. Accelerated Profile HMM Searches. PLoS Comput. Biol 7:e1002195
- 55. MIBiG 2.0: a repository for biosynthetic gene clusters of known function. Nucleic Acids Res 48:D454–D458
- 56. The antiSMASH database version 3: increased taxonomic coverage and new query features for modular enzymes. Nucleic Acids Res 49:D639–D643
- 57. The Pfam protein families database in 2019. Nucleic Acids Res 47:D427–D432
- 58. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26:680–682
- 59. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol 32:268–274
- 60. A computational framework to explore large-scale biosynthetic diversity. Nat. Chem. Biol 16:60–68
- 61. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol 30:772–780
- 62. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One 5:e9490
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.109154. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, Reitz et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.