Recent bacterial (meta)genome sequencing efforts suggest the existence of an enormous untapped reservoir of natural-product-encoding biosynthetic gene clusters in the environment. Here we use the pyro-sequencing of PCR amplicons derived from both nonribosomal peptide adenylation domains and polyketide ketosynthase domains to compare biosynthetic diversity in soil microbiomes from around the globe. We see large differences in domain populations from all except the most proximal and biome-similar samples, suggesting that most microbiomes will encode largely distinct collections of bacterial secondary metabolites. Our data indicate a correlation between two factors, geographic distance and biome-type, and the biosynthetic diversity found in soil environments. By assigning reads to known gene clusters we identify hotspots of biomedically relevant biosynthetic diversity. These observations not only provide new insights into the natural world, they also provide a road map for guiding future natural products discovery efforts.https://doi.org/10.7554/eLife.05048.001
Many of the most useful medicinal drugs—including antibiotics and cancer drugs—are derived from bacteria living in the soil that produce these chemicals as part of their natural life cycle. Many of these chemicals have been found by culturing bacteria in the laboratory, but this approach is limited because it only provides access to the chemicals produced by the small fraction of bacteria species that we can culture in this way. Also, many bacteria do not produce as many different chemicals when they are grown under these artificial conditions, instead of their natural environment. This suggests that bacteria living in the environment are likely to provide an additional source of new chemicals that could have medicinal benefits.
Here, Charlop-Powers et al. tackle this issue by employing a high-throughput genetic method for assessing the potential of soil-dwelling bacteria to make compounds with biological activity. They extracted DNA directly from soil samples collected from five continents, in part through the efforts of a citizen-science project called ‘Drugs from Dirt’ (drugsfromdirt.org). These samples came from many different environments, including rainforests, deserts, and coastal sediments.
After extracting the DNA from the soil samples, Charlop-Powers et al. focused on sequencing the genes that encode enzymes called NRPS and PKS. These enzymes are involved in the production of a range of diverse compounds, including many clinically useful antibiotics. By comparing the sequences of the genes found in the different soils, it was possible to estimate how common the genes were in each sample, and also to compare the collections of genes found in different soil types. This comparison revealed that the DNA sequences of the genes encoding NRPS and PKS vary widely among the soil samples, except for samples that came from similar environments in close proximity to each other.
These findings show that populations of soil-dwelling bacteria living in different locations are likely to produce related, but different and largely unexplored, natural compounds that could have the potential to be used in drug therapies or in other industries.https://doi.org/10.7554/eLife.05048.002
Soil-dwelling bacteria produce many of the most important members of our pharmacy, including the majority of our antibiotics as well as many of the cytotoxic compounds used in the treatment of cancers (Cragg and Newman, 2013). The traditional approach for characterizing the biosynthetic potential of environmental bacteria has been to examine metabolites produced by bacteria grown in monoculture in the lab. However, it is now clear that this simple approach has provided access to only a small fraction of the global microbiome's biosynthetic potential (Rappe and Giovannoni, 2003; Gilbert and Dupont, 2011; Rajendhran and Gunasekaran, 2011). In most environments uncultured bacteria outnumber their cultured counterparts by more than two orders of magnitude, and among the small fraction of bacteria that has been cultured (Torsvik et al., 1990, 1998), only a small subset of gene clusters found in these organisms is generally expressed in common fermentation broths (Bentley et al., 2002; Ikeda et al., 2003). The direct extraction and subsequent sequencing of DNA from environmental samples using metagenomic methods provides a means of seeing this ‘biosynthetic dark matter’ for the first time. Unfortunately, the genomic complexity of most metagenomes limits the use of the shotgun-sequencing and assembly approaches (Iverson et al., 2012; Howe et al., 2014) that are now routinely used to study individual microbial genomes (Donadio et al., 2007; Cimermancic et al., 2014). Although bacterial natural products represent an amazing diversity of chemical structures, the majority of bacterial secondary metabolites, including most clinically useful microbial metabolites, arise from a very small number of common biosynthetic themes (e.g., polyketides, ribosomal peptides, non-ribosomal peptides, terpenes, etc) (Dewick, 2002). Because of the functional conservation of enzymes used by these common systems, degenerate primers targeting the most common biosynthetic domains provide a means to broadly study gene cluster diversity in the uncultured majority in a way similar to what is now regularly done for bacterial species diversity using 16S rRNA gene sequences. Here we use this approach to conduct the first global examination of non-ribosomal peptide synthetase (NRPS) adenylation domain (AD) and polyketide synthase (PKS) ketosynthase (KS) domain biosynthetic diversity in soil environments. We chose to explore NRPS and PKS biosynthesis because the highly modular nature of these biosynthetic systems has provided a template for the production of a wide variety of gene clusters that give rise to a correspondingly diverse chemical repertoire, including many of the most clinically useful microbial metabolites (Cragg and Newman, 2013).
With the help of a citizen science effort (www.drugsfromdirt.org), soil samples were collected from five continents (North America, South America, Africa, Asia, Australia) and several oceanic islands (Hawaii, Dominican Republic), covering biomes that include multiple rainforests, temperate forests, deserts and coastal sediments. DNA was extracted directly from these soils as previously described (Brady, 2007) and 96 samples were chosen for analysis of NRPS/PKS diversity using 454 pyro-sequencing of AD and KS domain PCR amplicons. Samples were chosen on the basis of DNA quality and biome diversity; raw sequence reads from these samples were combined with existing amplicon datasets derived from other biomes using the same DNA isolation, PCR and sequencing protocols (Charlop-Powers et al., 2014). The entire dataset representing 185 biomes was clustered into operational taxonomic units (OTUs) at a sequence distance of 5%. Despite millions of unique sequencing reads yielding a predicted Chao1 OTU estimate of greater than 350,000 for each domain, rarefaction analysis suggests that we have not yet saturated the sequence space of either domain (Figure 1A,C).
The first question we sought to address with this data was how biosynthetic sequence composition varies by geographic distance. To do this we calculated the pairwise Jaccard distances between AD/KS sequence sets derived from each sampling site and used these metrics to compare samples. The Jaccard distance, a widely used metric for comparing the fraction of shared OTUs between samples, was chosen over alternative metrics due to its simplicity and to the lack of a comprehensive reference phylogenetic tree for AD and KS domains as exists for 16S analyses. Most Jaccard distances were found to be quite small (<3%), indicating large differences in secondary metabolite gene sequence composition between almost all sample collection sites (Figure 1B,D). Although the OTU overlap between our individual experimental samples is generally small, these relationships allow us to begin to develop a picture of how biosynthetic diversity varies globally. On a global level, the strongest biosynthetic sequence composition relationships are seen between samples collected in close physical proximity to one another (Figure 1B,D,E,F) as opposed to between samples from similar biomes in different geographic locations. For example, at a cutoff of even as low as 3% shared KS or AD OTUs, essentially all inter-sample relationships are observed between immediate geographic neighbors and not similar biomes in different global locations (Figure 1E,F). This likely explains the limited inter-sample relationships we observe between samples from the Eastern hemisphere as most samples from this part of the world were collected from sites at a significant geographic distance from one another. The only exception is the set of soil samples from South Africa, of which a number were collected in relatively close geographic proximity. These samples exhibit similar pairwise Jaccard metrics to those observed between geographically proximal samples collected in the Western hemisphere (Figure 1E,F).
Although differences in biosynthetic composition of microbiomes appear to depend at least in part on the geographic distance between samples, our data suggests that change in the biome type is an important additional factor for the differentiation of biosynthetic diversity on a more local level (Figure 1G,H). For example, at a cutoff of 3% shared OTUs, essentially all inter-sample relationships are observed between immediate geographic neighbors; when this is raised to 10% shared OTUs (Figure 1E,F), relationships are only seen between nearby samples belonging to the same biome. This phenomenon is highlighted by the two examples shown in Figure 1G,H. In the first example, Brazilian soils were collected from Atlantic rainforest, saline or cerrado (savanna-like) sites located only a few miles from one another. Our AD and KS data show these sample are (i) distinct from other globally distributed samples, (ii) most strongly related to the samples from the same Brazilian biome and (iii) only distantly related to the samples from other Brazilian biomes. In the second example, a sample collected from a New Mexican hot spring where the soil is heated continuously by subterranean water is compared with samples derived from the dry soils of the surrounding environment. Once again our amplicon data show that these samples are (i) distinct from other globally distributed samples, (ii) most strongly related to other samples from the same biome and (iii) only distantly related to samples from other nearby biomes. Although it is possible that at a much greater sampling depth all AD and KS domains will be found at all sites as predicted by Baas-Becking's ‘everything is everywhere but the environment selects’ hypothesis of global microbial distribution (O'Malley, 2007; de Wit and Bouvier, 2006), our PCR-based data suggest that both geography and ecology play a role in determining the major biosynthetic components of a microbiome.
The vast majority of AD and KS domain sequences coming from environmental DNA (eDNA) are only distantly related to functionally characterized NRP/PK gene clusters, precluding precise predictions about the specific natural products encoded by the gene clusters from which most amplicons arise. However, in cases where eDNA sequence tags show high sequence similarity to domains found in functionally characterized gene clusters, this information can be used to predict the presence of specific gene cluster families within a specific microbiome. This type of phylogenetic analysis is the basis of the recently developed eSNaPD program, a BLAST-based algorithm for classifying the gene cluster families that are associated with eDNA-derived sequence tags (Owen et al., 2013; Reddy et al., 2014). When an eDNA sequence tag clades with, but is not identical to, a reference sequence in an eSNaPD-type analysis, it is considered to be indicative of the presence of a gene cluster that encodes a congener (i.e., a derivative) of the metabolite encoded by the reference cluster.
Interestingly, eSNaPD analysis of the data from all sites reveals two distinct types of biomedically relevant natural product gene cluster ‘hot spots’ within our data (Figure 2A,B,D). These include ‘specific gene cluster hotspots’ and ‘gene cluster family hotspots’. Metagenomes from ‘specific gene cluster hotspots’ are predicted to be enriched for a gene cluster that encodes a congener of the target natural product, while metagenomes from ‘gene cluster family hotspots’ are predicted to encode multiple congeners related to the target natural product. Figure 2A shows several of the strongest examples of ‘specific gene cluster hotspots’ where reads falling into an OTU related to a specific biomedically relevant gene cluster or gene cluster family are disproportionately represented in the sequence data from individual microbiomes. These examples highlight the different enrichment patterns that we observe in the environment—hotspots are either local in nature, consisting of only one or two samples containing sequence reads mapping to the target (epoxomycin, oocydin); regional (tiacumicinB); or global with punctuated increases in diversity (glycopeptides). We would predict ‘specific gene cluster hotspots’ (Figure 2D) are naturally enriched for bacteria that encode congeners of the biomedically relevant target metabolites, thereby potentially simplifying the discovery of new congeners. Figure 2B shows examples of ‘gene cluster family hotspots’, where metagenomes having a disproportionately high number of OTUs mapping to a specific biomedically relevant target molecule family (e.g., nocardicin, rifamycin, bleomycin, and daptomycin families are shown) are highlighted. This analysis identifies specific sample sites, from among those surveyed, that are predicted to contain the most diverse collection of gene clusters associated with a target molecule of interest (Figure 2B). Both types of hotspots should represent productive starting points for future natural product discovery efforts aimed at expanding the structural diversity and potential utility of specific biomedically relevant natural product families.
Biosynthetic domain sequence tag data are not only useful for pinpointing environments that are rich in specific biosynthetic targets of interest but also as a metric for natural product biosynthetic diversity in general. As only a small fraction (5–10%) of total AD and KS sequences can be confidently assigned by the eSNaPD algorithm, samples showing the largest collection of unique OTUs (at a common sequencing depth) might be expected to contain the most diverse collection of novel biosynthetic gene clusters (Figure 2C) and therefore be the most productive sites to target for future novel molecule discovery efforts. Once normalized for sequencing depth, the number of unique KS and AD sequence tags observed per collection site differs by almost an order of magnitude between environments (Figure 2C), with the most diverse samples mapping to Atlantic forest and Desert environments (Figure 2C,D teal spots, Supplementary file 7).
The development of cost effective high-throughput DNA sequencing methodologies and powerful biosynthesis focused bioinformatics algorithms allow for the direct interrogation and systematic mapping of global microbial biosynthetic diversity. Our analyses of 100s of distinct soil microbiomes suggests that geographic distance and local environment play important roles in the sample-to-sample differences we detected in biosynthetic gene populations. As variations in biosynthetic gene content are expected to correlate with variations in the small-molecule producing capabilities of a microbiome, the broader implication of these observations from a drug discovery perspective is that the dominant biosynthetic systems of geographically distinct soil microbiomes are expected to encode orthogonal, largely unexplored collections of natural products. Taken together, our biosynthetic domain hotspot and OTU diversity analyses represent a starting point in the creation of a global natural products atlas that will use sequence data to guide natural product discovery in the future. Based on the historical success of natural products as therapeutics, microbial ‘biosynthetic dark matter’ is likely to hold enormous biomedical potential. The key will be learning how to harvest molecules encoded by the biosynthetic diversity we are now able to find through sequencing.
Soil from the top 6 inches of earth was collected at unique locations in the continental United States, China, Brazil, Alaska, Hawaii, Costa Rica (Brady and Clardy, 2004), Ecuador, the Dominican Republic, Australia, Tanzania and South Africa. The full sample table is available in Supplementary file1.
To reduce the potential for cross contamination, DNA was extracted from soil using a simplified version of our previously published DNA isolation protocol (Brady, 2007; Reddy et al., 2012). The modified protocol was as follows: 250 grams of each soil sample was incubated at 70°C in 150 ml of lysis buffer (2% sodium dodecyl sulfate [wt/vol], 100 mM Tris–HCl, 100 mM EDTA, 1.5 M NaCl, 1% cetyl trimethyl-ammonium bromide [wt/vol]) for 2 hr. Large particulates were then removed by centrifugation (4000×g, 30 min), and crude eDNA was precipitated from the resulting supernatant with the addition of 0.6 vol of isopropyl alcohol. Precipitated DNA was collected by centrifugation (4000×g, 30 min), washed with 70% ethanol and resuspended in a minimum volume of TE (10 mM Tris, 1 mM EDTA [pH 8]). Crude environmental DNA was passed through two rounds of column purification using the PowerClean system (MO BIO, Carlsbad, California). Purified environmental DNA was then diluted to 30 ng/µl and archived for use in PCR reactions.
Degenerate primers targeting conserved regions of AD [A3F (5′-GCSTACSYSATSTACACSTCSGG) and A7R (5′-SASGTCVCCSGTSCGGTA) (Ayuso-Sacido and Genilloud, 2005)] and KS [degKS2F.i (5′-GCIATGGAYCCICARCARMGIVT) and degKS2R.i (5′-GTICCIGTICCRTGISCYTCIAC) (Schirmer et al., 2005)] domains were used to amplify gene fragments from crude eDNA. Forward primers were designed to contain a 454 sequencing primer (CGTATCGCCTCCCTCGCGCCATCAG) followed by a unique 8 bp barcode that allowed simultaneous sequencing of up to 96 different AD- or KS- samples in a single GS-FLX Titanium region. PCR reaction consisted of 25 µl of FailSafe PCR Buffer G (Epicentre, Madison, Wisconsin), 1 µl recombinant Taq Polymerase (Bulldog Bio, Portsmouth, New Hampshire), 1.25 µl of each primer (100 mM), 14.5 µl of water and 6.5 µl of purified eDNA. PCR conditions for AD domain primers were as follows: 95°C for 4 min followed by 40 cycles of 94°C for 0.5 min, 67.5°C for 0.5 min, 72°C for 1 min and finally 72°C for 5 min. PCR conditions for KS domain primers were as follows: 95°C for 4 min followed by 40 cycles of 54°C for 40 s, 56.3°C for 40 s, 72°C for 75 s and finally 72°C for 5 min. PCR reactions were examined by 2% agarose gel electrophoresis to determine the concentration and purity of each amplicon. Amplicons were pooled in equal molar ratios, gel purified using the Invitrogen eGel system and DNA of the appropriate size was recovered using Agencourt Ampure XP beads (Beckman Coulter, Brea, California). Amplicons were sequenced using the 454 GS-FLX Titanium platform. Raw flowgram files from 454's shotgun processing routine were used for downstream analysis.
Raw reads were assigned to samples using the unique primer barcodes and filtered by quality (50 bp rolling window PHRED cutoff of 20) using Qiime (version 1.6) (Caporaso et al., 2010). USEARCH (version 7), which implements the improved UPARSE clustering algorithm (Edgar, 2013), was used to remove Chimeric sequences with the default 1.9 value of the de novo chimera detection tool. UPARSE clustering requires all sequences to be of the same length. In an effort to balance read quality and abundance with the ability to phylogenetically discriminate gene clusters we used 419 bp as our read length cutoff. The trimmed fasta file was then clustered to 5% to compensate for sequencing error and natural polymorphism that is often observed in gene clusters found in natural bacterial populations. Clustering proceeded as per the USEARCH manual by clustering at a distance of 3% and using representative sequences from each cluster to cluster again at 5%. The resulting ‘5%’ AD and KS OTU tables were used for all subsequent rarefaction and diversity analyses. Read and OTU counts available in Supplementary file 2.
To assess global AD and KS diversity in our sample set we sought to assess the global number of AD and KS domains we might expect to see if all of our data had been generated from a single sample. To do this, all reads assigned to an OTU were consolidated to generate a single-column OTU table where each row contains the sum of all sequences assigned to that OTU from any of the 185 samples. To assess the global diversity we subsampled this table at multiple depths using Qiime (Caporaso et al., 2010) and used the Chao1 formula to estimate the expected number of OTUs at this depth. This rarefaction analysis was performed ten times at each subsampling depth (Figure 1A,C; Supplementary files 3, 4) and the curves were fit to the data using the following equation: y = 1 + log(x) + log(x^2) + log(x^3) where x is the read value and y is the Chao1 diversity.
Ecological distances are calculated using the Jaccard [1 − (OTUA&B)/(OTUA + OTUB − OTUA&B)] or inverse Jaccard metric (Oksanen et al., 2013) and geographic distances were calculated using great circle (spherical) distance derived from the latitude/longitude values of each set of points (Bivand and Pebesma, 2005) (Supplementary file 5). Pairwise ecological and geographic distances were used to create Figure 1B,D. Network plots of subsamples (Figure 1G,H) were generated using Phyloseq (McMurdie and Holmes, 2013) to calculate the intersample Jaccard distance. As expected, the strongest relationships are observed between sample proximity controls where soils were collected approximately 10 meters from one another and processed independently, demonstrating that closely related samples do in fact group together in our analysis pipeline.
AD and KS amplicon reads were assigned to known biosynthetic gene clusters using the eSNaPD algorithm at an e-value cutoff of 10−45 (Reddy et al., 2014). At this threshold eSNaPD has been used to successfully assign-and-recover gene clusters that encode congeners of multiple natural product families using only the sequence from a single domain amplicon (Owen et al., 2013; Chang and Brady, 2014; Kang and Brady, 2014). NRPS/PKS clusters typically have multiple KS or AD domains. Hits to all domains in a cluster were aggregated in our analyses. Data for eSNaPD hits broken down by sample and molecule are included as Supplementary file 6.
AD and KS OTU tables were analyzed for the presence of eSNaPD hits. For each sample the abundance of each eSNaPD hit (i.e., a particular molecule) was calculated as either a percentage of total reads (Figure 2A,C) or as the total number of unique OTUs assigned to the molecule that were found in that sample (Figure 2B,C), or as the total number of OTUs mapped to a molecule in each sample. In the read-based hotspot analysis, the number of reads assigned by eSNaPD to a specific gene cluster is expressed as a fraction of total per sample reads: (reads-to-cluster-of-interest)/total sample reads). In the OTU-based hotspot analysis we calculated the number of unique eSNaPD assigned OTUs found in each sample that map to a specific gene cluster. The full eSNaPD dataset is available in Supplementary file 6. To compare global biosynthetic diversity of each sample, the AD and KS OTU tables were combined and for each sample they were subsampled ten times to a depth of 5000 reads. The Chao1 diversity metric was calculated for each sample and the average was used to compare the expected biodiversity in different samples at the same sampling depth (Figure 1C, Supplementary file 7).
Classes and methods for spatial data in R. R News9, 5.
Palmitoylputrescine, an antibiotic isolated from the heterologous expression of DNA extracted from bromeliad tank waterJournal of Natural Products 67:1283–1286.https://doi.org/10.1021/np0499766
Chemical-biogeographic survey of secondary metabolism in soilProceedings of the National Academy of Sciences of USA 111:3757–3762.https://doi.org/10.1073/pnas.1318021111
Natural products: a continuing source of novel drug leadsBiochimica et Biophysica Acta 1830:3670–3695.https://doi.org/10.1016/j.bbagen.2013.02.008
Medicinal natural products: a biosynthetic approachJohn Wiley & Sons.
Polyketide synthases and nonribosomal peptide synthetases: the emerging view from bacterial genomicsNatural Product Reports 24:1073–1109.https://doi.org/10.1039/b514050c
UPARSE: highly accurate OTU sequences from microbial amplicon readsNature Methods 10:996–998.https://doi.org/10.1038/nmeth.2604
Microbial metagenomics: beyond the genomeAnnual Review of Marine Science 3:347–371.https://doi.org/10.1146/annurev-marine-120709-142811
Tackling soil diversity with the assembly of large, complex metagenomesProceedings of the National Academy of Sciences of USA 111:4904–4909.https://doi.org/10.1073/pnas.1402564111
vegan: Community Ecology Package. http://CRAN.R-project.org/package=veganvegan: Community Ecology Package. http://CRAN.R-project.org/package=vegan.
The nineteenth century roots of 'everything is everywhere'Nature Reviews Microbiology 5:647–651.https://doi.org/10.1038/nrmicro1711
Mapping gene clusters within arrayed metagenomic libraries to expand the structural diversity of biomedically relevant natural productsProceedings of the National Academy of Sciences of USA 110:11797–11802.https://doi.org/10.1073/pnas.1222159110
Microbial phylogeny and diversity: small subunit ribosomal RNA sequence analysis and beyondMicrobiological Research 166:99–110.https://doi.org/10.1016/j.micres.2010.02.003
The uncultured microbial majorityAnnual Review of Microbiology 57:369–394.https://doi.org/10.1146/annurev.micro.57.030502.090759
Natural product biosynthetic gene diversity in geographically distinct soil microbiomesApplied and Environmental Microbiology 78:3744–3752.https://doi.org/10.1128/AEM.00102-12
Metagenomic analysis reveals diverse polyketide synthase gene clusters in microorganisms associated with the marine sponge Discodermia dissolutaApplied and Environmental Microbiology 71:4840–4849.https://doi.org/10.1128/AEM.71.8.4840-4849.2005
High diversity in DNA of soil bacteriaApplied and Environmental Microbiology 56:782–787.
Novel techniques for analysing microbial diversity in natural and perturbed environmentsJournal of Biotechnology 64:53–62.https://doi.org/10.1016/S0168-1656(98)00103-5
Jon ClardyReviewing Editor; Harvard Medical School, United States
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
Thank you for sending your work entitled “Global Biogeography of Bacterial Secondary Metabolism” for consideration at eLife. Your article has been favorably evaluated by Ian Baldwin (Senior editor) and 4 reviewers, one of whom, Jon Clardy, is a member of our Board of Reviewing Editors.
The Reviewing editor and the other reviewers discussed their comments before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission.
The reviewers felt that the paper raised a significant point regarding the widespread understanding that “everything is everywhere, the environment selects”. However, in its present form, the tone of the article overstated the extent to which the current study discredited this understanding and suggested a number of modifications that would tone down the conclusions. These began with the title, and ”Global Biogeographical Sampling of Bacterial Secondary Metabolism“ was suggested as more consistent with the actual research presented. Most of the comments focused on: the small sample size, and the degree to which it broadly surveyed potential ecological sites; the restricted focus on NRPS and KS pathways, which in turn select for a subset of bacteria (with an explicit reference to the recent Fischbach paper in Cell); and the ability of sequence tag data (as opposed to deeper sequencing) to serve as a metric for natural product biosynthetic diversity in general.
The reviewers agree that even with these limitations the data presented offer a significant challenge to the reigning “everything is everywhere” paradigm, but they don't completely overthrow it.
A concern about the degree with which the “crowd sourcing” approach aligned with the Nagoya Protocol on Biological Diversity was also raised.https://doi.org/10.7554/eLife.05048.012
1a) The reviewers felt that the paper raised a significant point regarding the widespread understanding that “everything is everywhere, the environment selects”. However, in its present form, the tone of the article overstated the extent to which the current study discredited this understanding and suggested a number of modifications that would tone down the conclusions.
We agree that our data are insufficient to overturn the “everything is everywhere but the environment selects” hypothesis, and it was not our intention to specifically address that topic in this manuscript. On one hand, we see that gene populations are more similar from similar environments suggesting a selective effect. On the other hand, geographically distant sites from similar environments have relatively few OTUs in common. While the “everything/everywhere” theory is of immense interest in understanding the ecology and natural history of microbial communities, we have tried to limit ourselves to empirical descriptions of the populations that suggest meaningful differences among sample locations irrespective of how those differences came to be. From the perspective of functional metagenomics, the origin of these differences is less important than being able to describe and utilize them.
To address the relationship between our data and the Baas-Becking hypothesis directly, in the manuscript we have added the following sentence to the third paragraph of the Results and discussion section:
“Although it is possible that at a much greater sampling depth all AD and KS domains will be found at all sites as predicted by Baas-Becking’s ‘everything is everywhere but the environment selects’ hypothesis of global microbial distribution, our data suggests a strong dependence on geography and ecology in determining the major biosynthetic components of a microbiome.”
We have also made changes to the wording throughout the manuscript, which are intended to tone down the conclusions of the manuscript and make it clear that our statements are a reflection of the samples collected and are not necessarily trends across the globe.
1b) These began with the title, and ”Global Biogeographical Sampling of Bacterial Secondary Metabolism“ was suggested as more consistent with the actual research presented.
The title has been changed to “Global Biogeographic Sampling of Bacterial Secondary Metabolism” as suggested.
1c) Most of the comments focused on: the small sample size, and the degree to which it broadly surveyed potential ecological sites.
We entirely agree that a strong statement on the global abundance and distribution will require sampling the Earth’s surface at a much greater density than we were able to achieve in this study. Nonetheless, this is the largest survey to date targeting NRPS/PKS biosynthetic genes and we believe that there are a number of trends that emerge at this degree of sampling, most notably the large differences between microbiomes around the world with the intriguing corollary that there are large numbers of gene clusters we may not have investigated for their biosynthetic potential. As outlined above we have now clarified in a number of places in the manuscript that all conclusions drawn reflect the dataset we analyzed and not the Globe as a whole.
1d) The restricted focus on NRPS and KS pathways, which in turn select for a subset of bacteria (with an explicit reference to the recent Fischbach paper in Cell).
The NRPS and PKS biosynthetic families are a natural starting point for metagenome-driven drug discovery for two reasons: first, they are overrepresented as pharmacologically active agents, and, second, their conserved biosynthetic domains allow a PCR-based approach to detect amplicons belonging to clusters that encode a diverse set of compounds. The comprehensive overview of sequenced genomes that was recently published in Cell, “Insights into Secondary Metabolism from a Global Analysis of Prokaryotic Biosynthetic Gene Clusters,” shows that there are a handful of biosynthetic systems that are repeatedly found in the genomes of known bacteria that the authors characterize as follows: Saccharide, Other, NRPS, PKS/Fatty Acid, Hybrid (NRPS/PKS), Ribosomal Peptides, and Terpenes. These cluster families are the usual suspects of secondary metabolism. Although the paper highlights the existence of bacterial gene cluster families whose natural products are not known, the dominant players seen in this data analysis remain these well characterized gene cluster families. These same trends are seen in an extensive gene cluster analysis recently published by Bill Metcalf’s group in Nature Chemical Biology entitled “A roadmap for natural product discovery based on large-scale genomics and metabolomics”. The large set of saccharide-encoding gene clusters reported in one survey paper likely significantly overestimates the role of sugar metabolism in natural product biosynthesis by including saccharides with functions with a structural rather than signaling/defensive role (cellulose, O-antigens, Capsular polysaccharides, etc.). Therefore, we believe all recently reported gene cluster surveys essentially confirm the principle upon which our study is based, namely, NP diversity is dominated by a small group of conserved families and that examination of these families is likely to be the most informative avenue for studying chemical diversity in the environment. We have chosen NRPS/PKS as they are largest of these dominant biosynthetic systems although it will be a natural extension of this work for our lab and others to extend it to other families as well.
1e) The ability of sequence tag data (as opposed to deeper sequencing) to serve as a metric for natural product biosynthetic diversity in general.
We agree with the reviewers that long contigs with greater contextual information about the bacterial gene clusters would allow for a better analytical power but disagree that deeper sequencing would provide that information as assembling metagenomes remains a significant technical bottleneck in metagenome research. Currently, there is no easy way to assemble a heterogeneous population of soil-derived genomes using shotgun methods despite the many groups spending a lot of time and money trying to develop algorithms or technical solutions to the problem. Contemporary state-of-the-art metagenome assemblies from complex microbiomes result in incomplete assembly and short contigs. The scale of sequencing required for generating even very low quality information from complex microbiomes makes it impractical to employ this technique for screening large numbers of unique microbiomes. Because of the shortcomings of shotgun-methods in soil metagenomes, we have championed the use of sequence tags to bypass these technical problems. Unlike whole genome sequencing methodologies, this approach does not rely on the analysis of complete biosynthetic clusters. Instead, it uses individual next-generation sequencing reads from PCR amplicons of conserved biosynthetic domain gene sequences (termed Natural Product Sequence Tags, NPSTs) to predict gene content and chemical output of NP gene clusters, in a fashion analogous to reconstructing the phylogeny of entire organisms using 16S rRNA sequences. Although shotgun-sequencing approaches have been useful for guiding the identification of clusters in individual genomes and small, endosymbiont metagenomes, their application to more complex metagenomes has been very limited. In fact, in a direct comparison of biosynthetic domain detection from metagenomic samples, this PCR-based method was shown to be 10 to 100 times more sensitive than shotgun sequencing in identifying unique sequences.
2) The reviewers agree that even with these limitations the data presented offer a significant challenge to the reigning ”everything is everywhere“ paradigm, but they don't completely overthrow it.
Please see comment 1a.
3) A concern about the degree with which the ”crowd sourcing“ approach aligned with the Nagoya Protocol on Biological Diversity was also raised.
Although crowd sourcing was used to obtain samples for this study, these samples were almost exclusively from within the United States. Of the foreign (non-US) samples collected for this study, the vast majority of these samples were collected and processed by scientists from the country of origin. Additional non-United States samples represent eDNA samples originally collected in earlier international collaborative studies. In addition to our collection methods being of a collaborative nature with international scientists, our sample handling methods are designed to ensure that neither live bacteria nor cloneable eDNA is ever collected or available at the end of the experiment. It is therefore not possible to recover commercializable material (e.g., bacterial gene clusters, live bacteria or small molecules) from any foreign sample processed in this study. As no archiveable material was collected for this work, the gene fragments sequenced in this study are only useful for conducting comparative ecological studies like those outlined here, and they do not permit the recovery of any functional genetic material that might be of commercial value.https://doi.org/10.7554/eLife.05048.013
- Zachary Charlop-Powers
- Denise O Guimarães
- Denise O Guimarães
- Sean F Brady
- Sean F Brady
- Denise O Guimarães
- Monica T Pupo
- Monica T Pupo
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
This work was supported by National Institutes of Health grant number GM077516 (SFB), and F32 AI110029 (ZCP). SFB is a Howard Hughes Medical Institute Early Career Scientist. Brazilian research was supported by São Paulo Research Foundation (FAPESP) grant #2011/50869-8. MTP is a research fellow of the Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq). DOG was supported by the Rio de Janeiro Research Foundation (FAPERJ) grant #E-26/110.281/2012 and CNPq grant #477509/2013-4. The authors would also like to acknowledge the following people and institutions for assistance in collecting samples: Rafael Bonante, Samyr Soares Viana, Vitor de Carli, Ronaldo de Carli, Erin Bishop, Vanessa Kowalski, The Instituto National de Bioversidad, Serengeti Genesis, and the University of Dar Es Salaam.
- Jon Clardy, Harvard Medical School, United States
© 2015, Charlop-Powers et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.