Global biogeographic sampling of bacterial secondary metabolism
Figures

Global abundance and comparative distribution of AD/KS sequences.
The global abundance (A and C), sample-to-sample variation (B and D), and geographic distribution (E, F, G, and H) of adenylation domains (AD) and ketosynthase domains (KS) were assessed by pyrosequencing of amplicons generated using degenerate primers targeting AD and KS domains found in 185 soils/sediments from around the world. (A and C) Global AD (A) or KS (C) domain diversity estimates were obtained by rarefying the global OTU table (de novo clustering at 95%) for AD and KS sequences and calculating the average Chao1 diversity metric at each sampling depth. (B and D) The ecological distance (i.e., Jaccard dissimilarity) between AD (B) or KS (D) domain populations sequenced from each metagenome was determined as a function of the great circle distance (km) between sample collection sites. Insets show local relationships (<500 km) in more detail. (E and F) All sample collection sites are shown on each world map, and lines connect sample sites that share at least the indicated fraction (3%, 10%) of AD (E) or KS (F) OTUs. (G and H) Biome-specific relationships within domain OTU populations sequenced from geographically proximal samples, assessed by Jaccard similarity. Samples were collected from (G) Atlantic forest, saline, or cerrado environments or from (H) New Mexican desert topsoils or hot spring sediments.
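
The two quantities plotted against each other in panels B and D can be illustrated with a minimal Python sketch. It assumes each sample is represented as a set of OTU identifiers and that the great circle distance is computed with the haversine formula on a spherical Earth of radius 6371 km. The caption does not state which formula or software was used, so the function names, the radius, and the choice of haversine are illustrative assumptions rather than the authors' pipeline.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points given in degrees.

    Assumption: spherical Earth with mean radius 6371 km.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def jaccard_dissimilarity(otus_a, otus_b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of OTU identifiers."""
    a, b = set(otus_a), set(otus_b)
    union = a | b
    if not union:
        # Two empty samples: treated here as identical (an arbitrary convention).
        return 0.0
    return 1.0 - len(a & b) / len(union)


def share_at_least(otus_a, otus_b, fraction):
    """True if two samples share at least `fraction` of their combined OTUs.

    Hypothetical helper mirroring the line-drawing rule in panels E and F;
    the exact denominator used in the figure is an assumption.
    """
    return (1.0 - jaccard_dissimilarity(otus_a, otus_b)) >= fraction
```

Under these assumptions, one point in panel B or D corresponds to one sample pair, with `great_circle_km` on the x axis and `jaccard_dissimilarity` on the y axis.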

Biomedically relevant natural product hotspots and diversity.
Hotspot analysis of natural product biosynthetic diversity to identify samples with a high total proportion of reads corresponding to a natural product family of interest (A and D), the maximum number of unique OTUs corresponding to a natural product family of interest (B and D), or the estimated sample biodiversity (C and D). In A and B, samples are arranged by longitude and hemisphere as shown in the Sample Key. (A) For each sample, sequence reads assigned by eSNaPD are expressed as a percentage of total reads obtained for that sample. A sample is designated a hotspot if more than one percent (0.01; horizontal line) of its reads map to a specific gene cluster. Fractional observance data are shown for five representative gene clusters or gene cluster families (zorbamycin, oocydin, tiacumicin B, epoxomicin, glycopeptides) that show significant sample-dependent differences in read frequency. (B) Hotspots of elevated gene cluster family diversity can be identified by determining the number of unique OTUs occurring in each sample that, by eSNaPD, map to a natural product gene cluster of interest. Sample-specific OTU counts for nocardicin, rifamycin, bleomycin, and daptomycin clusters are shown. Samples containing greater than 50% of the maximum observed OTU value are colored and mapped in (C). OTU diversity measurements do not predict the abundance of a specific cluster in a metagenome [as predicted in (A)], but instead are used to identify locations where the largest number of congener-encoding clusters may be found. These sites are predicted to be most useful for increasing the structural diversity, and therefore the potential clinical utility, of these medically important families of natural products. (C) Estimated diversity of AD/KS reads by sample. AD and KS OTU tables were combined and, for each sample, the Chao1 diversity metric was calculated at 5000 reads, providing a baseline metric for comparing sample biosynthetic diversity. The average number of unique OTUs observed over 10 rarefaction analyses is shown (also see Supplementary file 7). (D) Hotspot map of samples identified in A, B and C. (E) Representative structures of target molecule families highlighted in A and B.
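
The thresholds and the diversity estimate described in this caption can be sketched in Python as follows. The sketch assumes per-sample read counts are held in plain dictionaries, uses the bias-corrected form of the Chao1 estimator, and rarefies by drawing reads without replacement. The caption does not specify the estimator variant or the software used, so all function names, data layouts, and the random seed are hypothetical.

```python
import random
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    counts = [c for c in counts if c > 0]
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def rarefied_chao1(otu_counts, depth=5000, n_iter=10, seed=0):
    """Mean Chao1 over `n_iter` random subsamples of `depth` reads.

    `otu_counts` maps OTU id -> read count for one sample. Returns None
    when the sample has fewer reads than `depth`.
    """
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if len(reads) < depth:
        return None
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_iter):
        subsample = Counter(rng.sample(reads, depth))  # without replacement
        estimates.append(chao1(subsample.values()))
    return sum(estimates) / n_iter


def read_fraction_hotspots(hits, total_reads, threshold=0.01):
    """Samples where more than `threshold` of reads map to a gene cluster (panel A)."""
    return sorted(s for s, n in hits.items() if n / total_reads[s] > threshold)


def otu_count_hotspots(otus_per_sample, fraction=0.5):
    """Samples with more than `fraction` of the maximum observed OTU count (panel B)."""
    cutoff = fraction * max(otus_per_sample.values())
    return sorted(s for s, n in otus_per_sample.items() if n > cutoff)
```

For example, with hypothetical counts `read_fraction_hotspots({"s1": 20, "s2": 5}, {"s1": 1000, "s2": 1000})` flags only `s1`, because 2% of its reads map to the cluster while `s2` sits at 0.5%.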
Additional files
- Supplementary file 1: Sample Location Data. https://doi.org/10.7554/eLife.05048.005
- Supplementary file 2: Sample Read and 95% OTU Count. https://doi.org/10.7554/eLife.05048.006
- Supplementary file 3: Adenylation Domain Rarefaction Data (Figure 1A). https://doi.org/10.7554/eLife.05048.007
- Supplementary file 4: Ketosynthase Domain Rarefaction Data (Figure 1C). https://doi.org/10.7554/eLife.05048.008
- Supplementary file 5: Pairwise Sample Distances. Great Circle Distance and Jaccard Distance for AD and KS Amplicons. https://doi.org/10.7554/eLife.05048.009
- Supplementary file 6: eSNaPD Hits Broken Down by Sample and Molecule. https://doi.org/10.7554/eLife.05048.010
- Supplementary file 7: Per Sample Chao1 Biodiversity Estimates at a Rarefaction Depth of 5000 Reads. https://doi.org/10.7554/eLife.05048.011