Automated genome mining predicts structural diversity and taxonomic distribution of peptide metallophores across bacteria
Figures
Chelating substructures found in bacterial NRP metallophores and their biosynthetic pathways.
(A) Representative NRP metallophore structures. Nearly all known NRP metallophores contain one or more of the eight labeled chelating groups. Most chelating groups provide bidentate metal chelation, as shown for ferric pyoverdine L48. (B) Chelator biosynthesis pathways that form the basis for the new antiSMASH detection algorithm, as described in the text. The same chelator colors are used in each figure.
Workflow for developing an NRP-metallophore-specific profile hidden Markov model (pHMM) and significance score cutoff for an enzyme (sub-)family.
(1) Examples are collected from the literature and the amino acid sequences are aligned with MUSCLE, and (2) a pHMM is constructed with HMMER3. (3) BGCs of known function from MIBiG are scanned for matches to the pHMM to generate a preliminary bitscore cutoff. (4) NRPS BGC regions from the antiSMASH database are scanned for matches to the pHMM and sorted by bitscore. (5) Starting at the bitscore cutoff, BGCs are manually annotated to predict whether they encode the biosynthesis of NRP metallophores, using features such as genes encoding membrane transport, metal acquisition (e.g., ferric reductases), and the biosynthesis of multiple chelating groups. In a properly functioning system, the bitscore cutoff separates putative true positives from false positives and so accurately detects the NRP-metallophore-related enzymes (bottom right). However, a bitscore cutoff adjustment may be required (bottom middle), or low-scoring true positives may need to be added to the pHMM seed alignment in an iterative process (bottom left).
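The cutoff adjustment in step (5) can be illustrated with a minimal sketch. It assumes each pHMM hit has already been manually labeled as metallophore-related or not, and it simply picks the bitscore cutoff that misclassifies the fewest hits. The function name, the tie-breaking rule, and the example scores are hypothetical; the workflow in the caption relies on manual inspection rather than this exact criterion.

```python
from typing import List, Tuple


def pick_bitscore_cutoff(hits: List[Tuple[float, bool]]) -> Tuple[float, int]:
    """Return (cutoff, n_errors) for the cutoff with the fewest misclassified hits.

    hits: (bitscore, is_metallophore) pairs from manual annotation.
    A hit is called positive when bitscore >= cutoff.
    """
    if not hits:
        raise ValueError("at least one annotated hit is required")
    candidates = sorted({score for score, _ in hits})
    best_cutoff, best_errors = candidates[0], len(hits) + 1
    for cutoff in candidates:
        false_pos = sum(1 for s, label in hits if s >= cutoff and not label)
        false_neg = sum(1 for s, label in hits if s < cutoff and label)
        errors = false_pos + false_neg
        # On ties, keep the higher (more conservative) cutoff.
        if errors <= best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff, best_errors


# Hypothetical annotated hits: (bitscore, manually judged metallophore-related?)
example_hits = [
    (412.0, True), (350.5, True), (298.1, True), (255.3, True),
    (240.7, False), (180.2, False), (95.4, False),
]
cutoff, errors = pick_bitscore_cutoff(example_hits)
print(f"cutoff={cutoff}, misclassified={errors}")
```

If no cutoff separates the two classes cleanly, the remaining errors point to the other two outcomes in the caption: a shifted cutoff, or low-scoring true positives that should be added to the seed alignment.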
A maximum-likelihood phylogeny of putative β-hydroxylases found in NRPS BGC regions.
β-Hydroxylase subtypes found in characterized siderophore BGCs are highlighted and labeled; the red SBH_Asp clade consists of non-metallophore phytotoxins (Hider and Kong, 2010). Gray clades indicate possible metallophore β-hydroxylases that currently have no experimentally characterized representative. Amino acid sequences similar to known siderophore β-hydroxylase subtypes were extracted from the antiSMASH database (v3) and dereplicated prior to tree reconstruction. The right-hand bar gives the number of unique siderophore-related transporter families (Kraemer et al., 2015) present in the BGC region containing the β-hydroxylase.
Analysis of a novel putative NRP metallophore BGC from Sporomusa termitida DSM 4440.
(A) Clinker comparison of the S. termitida BGC with homologous loci. The Sporomusa sp. KB1 BGC contains a salicylate synthase gene detectable by the new antiSMASH rules, while S. termitida DSM 4440 instead contains several genes homologous to the menaquinone locus of Desulfitobacterium spp. (Kramer et al., 2020). The cluster comparison was generated with clinker v0.0.26 (Soares, 2022). (B) A proposed metallophore biosynthesis pathway encoded by the S. termitida DSM 4440 BGC. 1,4-Dihydroxy-2-naphthoic acid is synthesized from chorismic acid by homologs of MenFDHBE, encoded by SPTER_RS05985-06005, and MenC, encoded elsewhere in the genome (SPTER_RS21050). The C4 phenol is likely methylated by O-methyltransferase SPTER_RS06015. The naphthoic acid moiety of karamomycin C (inset box), characterized from an unsequenced strain of Nonomuraea endophytica (Gu et al., 2020a), is predicted to be synthesized by a similar pathway. Five NRPS genes in S. termitida DSM 4440 encode the biosynthesis of the core structure by the condensation and cyclization of four Cys residues. The C-methyltransferase of SPTER_RS06025 is predicted to be inactive, as observed in the homologous ulbactin pathway (Gu et al., 2020b). The completed scaffold is released by thioesterase SPTER_RS05970, possibly producing the same tricyclic substructure observed in karamomycin C and ulbactin F (inset box). The final predicted structure accounts for the actions of two thiazoline reductases (SPTER_RS05950 and SPTER_RS05980) and a methyltransferase (SPTER_RS05955), although the regiochemistry and timing of these transformations are unclear.
An upset plot of chelator frequency among 2,489 complete NRP metallophore BGC regions from RefSeq representative genomes.
An additional 38 BGC regions were detected by metallophore-specific NRPS domains (VibH-like or tandem Cy) rather than chelator biosynthesis, and may produce catechol and/or salicylate metallophores using biosyntheses encoded elsewhere in the genome.
BiG-SCAPE similarity network of complete NRP metallophore BGC regions from RefSeq representative genomes.
Numbered square nodes indicate published BGCs, as given in Supplementary file 1a. Select hybrid metallophore BGC nodes are highlighted yellow, and their corresponding structures are shown. Nodes are colored by the type(s) of chelator biosynthesis detected therein. BGC regions colored light gray contain only metallophore-specific NRPS domains (VibH-like or tandem Cy) and may produce catechol and/or salicylate metallophores using biosyntheses encoded elsewhere in the genome. The network was constructed in BiG-SCAPE v1.1.2 using 2596 BGC regions as input, including 78 reference BGCs, and a distance cutoff of 0.5.
Similarity network of complete NRP metallophore BGC regions from RefSeq representative genomes.
Nodes are colored blue if they belong to a GCF with a reference BGC, and orange if they are dissimilar from any reference BGC. The BiG-SCAPE network is identical to that in Figure 3 and reference BGC numbering corresponds to Supplementary file 1a.
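The coloring rule can be sketched as follows, assuming pairwise distances between BGC regions are already available. This simplified version treats each connected component of the thresholded network as one family, which is an assumption made for illustration: BiG-SCAPE's own family assignment is more involved. All node names and distances below are hypothetical, and keeping only edges with distance strictly below the cutoff is likewise an assumption.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def color_nodes(
    nodes: Iterable[str],
    distances: Dict[Tuple[str, str], float],
    reference: Set[str],
    cutoff: float = 0.5,
) -> Dict[str, str]:
    """Color a node blue if its component contains a reference BGC, else orange."""
    # Keep only edges whose distance falls below the cutoff.
    adjacency = defaultdict(set)
    for (a, b), dist in distances.items():
        if dist < cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)

    colors: Dict[str, str] = {}
    seen: Set[str] = set()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        color = "blue" if component & reference else "orange"
        for node in component:
            colors[node] = color
    return colors


# Hypothetical toy network with one reference BGC.
toy_nodes = ["ref_A", "bgc_1", "bgc_2", "bgc_3"]
toy_distances = {
    ("ref_A", "bgc_1"): 0.20,
    ("bgc_1", "bgc_2"): 0.45,
    ("bgc_2", "bgc_3"): 0.80,
    ("ref_A", "bgc_3"): 0.90,
}
toy_colors = color_nodes(toy_nodes, toy_distances, reference={"ref_A"})
print(toy_colors)
```

In this toy case, bgc_2 is colored blue even though it is only indirectly linked to the reference through bgc_1, while bgc_3 has no edge below the cutoff and is colored orange.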
Identification of siderophores predicted from genome mining.
(A) Chemical structures of marinobactins A–E (Winkelmann et al., 1994), produced by Terasakiispira papahanaumokuakeensis DSM 29361; enterobactin (Martinez and Butler, 2007), produced by Buttiauxella brennerae DSM 9396; and pyoverdine A214 (Uría Fernández et al., 2003) and ornicorrugatin (Matthijs et al., 2008), both produced by Pseudomonas brassicacearum DSM 13227. The position and orientation of the fatty acid desaturation in marinobactins B and D were not determined in this work. (B–D) High pressure liquid chromatography/high-resolution mass spectrometry (HPLC-HRMS) total ion chromatograms of culture supernatant extracts, overlaid with extracted ion chromatograms for siderophore features.
Semi-preparative RP HPLC chromatogram of B. brennerae DSM 9396 crude extract.
The four peaks indicated by the arrows displayed catechol absorbance and were isolated.
UPLC-ESI-MS TIC of the purified catechol compounds from B. brennerae DSM 9396.
From top to bottom, the molecular ions of the compounds were 670.1469 m/z, 688.1614 m/z, 465.1157 m/z, and 242.0647 m/z, which are consistent with enterobactin, the linear 2,3-DHB–Ser trimer, the 2,3-DHB–Ser dimer, and 2,3-DHB–Ser, respectively.
Mass spectra of the purified catechol compounds from B. brennerae DSM 9396.
From top to bottom, the molecular ions of the compounds were 242.0647 m/z, 465.1157 m/z, 688.1614 m/z, and 670.1469 m/z, which are consistent with 2,3-DHB–Ser, the 2,3-DHB–Ser dimer, the linear 2,3-DHB–Ser trimer, and enterobactin, respectively.
UPLC-ESI-MS of T. papahanaumokuakeensis DSM 29361.
Top: Total ion chromatogram (TIC) of the crude supernatant extract. Peaks with molecular ions consistent with marinobactins A–E are labeled. Bottom: Mass spectra of the labeled peaks in the TIC. The major peaks are consistent with the m/z values of marinobactins A–E: 932.4986 m/z, 958.5095 m/z, 960.5315 m/z, 986.5422 m/z, and 988.5549 m/z, respectively.
ESI-MS/MS fragmentation of marinobactin A [M+H]+, 932 m/z, from T. papahanaumokuakeensis DSM 29361.
The b fragments 655.3563 m/z and 742.3990 m/z support the assignment of marinobactin A.
ESI-MS/MS fragmentation of marinobactin B [M+H]+, 958 m/z, from T. papahanaumokuakeensis DSM 29361.
The b fragments 681.3822 m/z and 768.4156 m/z support the assignment of marinobactin B.
ESI-MS/MS fragmentation of marinobactin C [M+H]+, 960 m/z, from T. papahanaumokuakeensis DSM 29361.
The b fragments 681.3985 m/z and 768.4298 m/z support the assignment of marinobactin C.
ESI-MS/MS fragmentation of marinobactin D [M+H]+, 986 m/z, from T. papahanaumokuakeensis DSM 29361.
The b fragments 709.4139 m/z and 796.4473 m/z support the assignment of marinobactin D.
UPLC-MS spectrum of ornicorrugatin siderophore from Pseudomonas brassicacearum DSM 13227.
The singly charged ion at m/z 1012.4 [M+H]+ and the doubly charged ion at m/z 506.7 [M+2H]2+ are consistent with the [M+H]+ ion of ornicorrugatin, m/z 1012.4699.
LC-MS tandem MS spectrum of ornicorrugatin siderophore from Pseudomonas brassicacearum DSM 13227.
Average collision energies of (top) 60 eV and (bottom) 35 eV.
UPLC-MS spectrum of pyoverdine siderophore from Pseudomonas brassicacearum DSM 13227.
The singly charged ion at m/z 1134.4 [M+H]+ and the doubly charged ion at m/z 567.7 [M+2H]2+ are consistent with the [M+H]+ ion of the pyoverdine, m/z 1134.4339.
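The relationship between the singly and doubly charged values in the ornicorrugatin and pyoverdine spectra can be checked with a short worked calculation. The sketch below assumes positive-mode protonated ions and uses the standard proton mass; the function name is illustrative.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def doubly_charged_mz(singly_charged_mz: float) -> float:
    """Convert the m/z of an [M+H]+ ion into the expected m/z of [M+2H]2+."""
    neutral_mass = singly_charged_mz - PROTON_MASS
    return (neutral_mass + 2 * PROTON_MASS) / 2


# [M+H]+ values as given in the captions.
for name, mh in [("ornicorrugatin", 1012.4699), ("pyoverdine", 1134.4339)]:
    print(f"{name}: [M+2H]2+ = {doubly_charged_mz(mh):.1f}")
```

To one decimal place this gives 506.7 and 567.7, matching the doubly charged values reported for the two siderophores.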
LC-MS tandem MS spectrum of pyoverdine siderophore, pyoverdine A214, from Pseudomonas brassicacearum DSM 13227.
NRP metallophore biosynthesis across the bacterial kingdom.
Center: The Genome Taxonomy Database (GTDB) phylogenetic tree (version r207), with strains collapsed to the REDgroup level (Gavriilidou et al., 2022). Numbered circles indicate the most parsimonious origins of chelator pathways, as determined by reconciliation with eMPRess (Santichaivekin et al., 2021). The bottom-right legend lists the specific profile hidden Markov models (pHMMs) associated with each estimated origin. Arrows indicate ancient horizontal gene transfers predicted by eMPRess. Ring A: Phylum-level divisions. Phyla with detected chelating groups are labeled using nomenclature from GTDB r207. Ring B: Chelator biosynthetic pathways detected in at least one member of each REDgroup. Ring C: Average number of detected NRP metallophore BGC regions per genome for each REDgroup. Annotations were mapped to the phylogenetic tree using iTOL v6 (Letunic and Bork, 2024).
Tables
Summary of NRP metallophore BGC detection, comparing the chelator-based rule newly implemented in antiSMASH, the transporter-based method of Crits-Christoph et al. (Matthijs et al., 2016), and a combined either/or ensemble.
| Detection method | Precision* | Recall* | F1*‡ | NRP metallophore BGC regions detected†: complete NRPS regions (n=11,704) | NRP metallophore BGC regions detected†: partial NRPS regions (n=8,403) | NRP metallophore BGC regions detected†: total NRPS regions (n=20,107) |
|---|---|---|---|---|---|---|
| AntiSMASH rule | 0.97 | 0.78 | 0.86 | 2485 (21%) § | 725 (8.6%) | 3210 (16%) |
| Transporter genes | 0.93 | 0.56 | 0.69 | 1723 (15%) | 376 (4.5%) | 2099 (10%) |
| Either/or ensemble | 0.92 | 0.88 | 0.90 | 2948 (25%) | 855 (10%) | 3803 (19%) |
* Detection methods were each tested on a set of 758 manually annotated NRPS BGC regions (180 true positives). Full results are given in Supplementary file 1b.
† Detection methods were applied to 15,562 NCBI RefSeq representative bacterial genomes. The full results are given in Supplementary file 1c. A region is ‘complete’ if it is not on a contig edge, as determined by antiSMASH. An additional 54 BGC regions were detected as NRP metallophores without meeting the requirements for the antiSMASH NRPS rule.
‡ F1 score is equal to 2 × (Precision × Recall) / (Precision + Recall).
§ Percentages indicate the fraction of NRPS regions that were predicted to encode NRP metallophores.
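As a worked check of the F1 definition, the short sketch below recomputes F1 from the rounded precision and recall values in the table. Because the inputs are rounded to two decimals, a recomputed value can differ from the tabulated one in the last digit; the published scores were presumably derived from the unrounded confusion matrices in Supplementary file 1b, which is an assumption here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


# Rounded precision and recall as listed in the table.
methods = {
    "AntiSMASH rule": (0.97, 0.78),
    "Transporter genes": (0.93, 0.56),
    "Either/or ensemble": (0.92, 0.88),
}
for name, (precision, recall) in methods.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.3f}")
```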
Taxonomic distribution of 4953 NRP-metallophore BGC regions detected in 59,851 GTDB representative bacterial genomes.
Phylum nomenclature is preserved from GTDB r207. An additional 413 BGC regions with ‘unknown’ taxonomy are not included here. Phyla not listed had zero detected regions.
| Phylum | Number of detected NRP metallophore BGC regions | Percentage of total detected NRP-met regions (%) | Proportion of genomes with ≥1 NRP-met region |
|---|---|---|---|
| Proteobacteria | 2439 | 49 | 2042/16536 (12%) |
| Actinomycetota | 1986 | 40 | 1561/6931 (23%) |
| Cyanobacteria | 200 | 4.0 | 176/1318 (13%) |
| Firmicutes_I | 192 | 3.9 | 191/4013 (4.8%) |
| Myxococcota | 55 | 1.1 | 52/418 (12%) |
| Firmicutes | 28 | 0.6 | 28/9026 (0.3%) |
| Chloroflexota | 18 | 0.4 | 14/1317 (1.1%) |
| Nitrospirota | 16 | 0.3 | 15/307 (4.9%) |
| Acidobacteriota | 9 | 0.2 | 9/836 (1.1%) |
| Desulfobacterota | 5 | 0.1 | 5/847 (0.6%) |
| Verrucomicrobiota | 2 | <0.1 | 2/1304 (0.2%) |
| Planctomycetota | 1 | <0.1 | 1/1034 (0.1%) |
| Bdellovibrionota | 1 | <0.1 | 1/248 (0.4%) |
| Gemmatimonadota | 1 | <0.1 | 1/345 (0.3%) |
Additional files
Supplementary file 1
Tabular data for reference BGCs, rule validation, RefSeq genome results, and ornicorrugatin MS/MS.
An Excel (xlsx) file containing four sheets: (a) Reference BGCs used in this work. Numbering matches Figure 3. BGCs marked with an asterisk are non-metallophore BGCs that emerged as false positives during rule development. (b) Data for the rule validation. The sheet contains BGC metadata, notes used for manual classification, and individual results from detection rule strategies, as well as summary confusion matrices and performance metrics. (c) Results for applying detection rules to BGCs from NCBI RefSeq representative bacterial genomes, including bitscore values for significant matches to profile HMMs. (d) Selected MS/MS fragment ions observed for ornicorrugatin produced by Pseudomonas brassicacearum DSM 13227. All given fragment ions agree with previously reported masses for ornicorrugatin (Matthijs et al., 2008).
- https://cdn.elifesciences.org/articles/109154/elife-109154-supp1-v1.xlsx
MDAR checklist
- https://cdn.elifesciences.org/articles/109154/elife-109154-mdarchecklist1-v1.pdf