1. Computational and Systems Biology
Download icon

Metabolic network percolation quantifies biosynthetic capabilities across the human oral microbiome

  1. David B Bernstein
  2. Floyd E Dewhirst
  3. Daniel Segrè  Is a corresponding author
  1. Boston University, United States
  2. The Forsyth Institute, United States
  3. Harvard School of Dental Medicine, United States
Research Article
  • Cited 0
  • Views 1,416
  • Annotations
Cite this article as: eLife 2019;8:e39733 doi: 10.7554/eLife.39733

Abstract

The biosynthetic capabilities of microbes underlie their growth and interactions, playing a prominent role in microbial community structure. For large, diverse microbial communities, prediction of these capabilities is limited by uncertainty about metabolic functions and environmental conditions. To address this challenge, we propose a probabilistic method, inspired by percolation theory, to computationally quantify how robustly a genome-derived metabolic network produces a given set of metabolites under an ensemble of variable environments. We used this method to compile an atlas of predicted biosynthetic capabilities for 97 metabolites across 456 human oral microbes. This atlas captures taxonomically-related trends in biomass composition, and makes it possible to estimate inter-microbial metabolic distances that correlate with microbial co-occurrences. We also found a distinct cluster of fastidious/uncultivated taxa, including several Saccharibacteria (TM7) species, characterized by their abundant metabolic deficiencies. By embracing uncertainty, our approach can be broadly applied to understanding metabolic interactions in complex microbial ecosystems.

https://doi.org/10.7554/eLife.39733.001

Introduction

Metabolism, in addition to enabling growth and homeostasis for individual microbes, contributes to the organization of complex, dynamic microbial communities. Within these communities, different microbes have diverse metabolic capabilities that lead to interactions driving microbial community structure and dynamics at multiple spatial and temporal scales (Ponomarova and Patil, 2015; Phelan et al., 2012; Watrous et al., 2013; Harcombe et al., 2014; Embree et al., 2015). For example, through cross-feeding, a compound produced by one species might benefit another, leading to a network of metabolic interdependences (Embree et al., 2015; Goldford et al., 2017; Mee et al., 2014; Pande et al., 2015; D'Souza et al., 2018; Zengler and Zaramela, 2018; Pacheco et al., 2019; Mee and Wang, 2012). This type of interaction has been proposed as one of the main reasons for the prevalence, in natural microbial communities, of uncultivated (or fastidious) microbes (Stewart, 2012; Epstein, 2013; Pande and Kost, 2017; Staley and Konopka, 1985). These microbes do not grow in pure culture on standard laboratory conditions as they may depend on diffusible metabolites produced by neighboring microbes (Pande and Kost, 2017). The prominence of uncultivated/fastidious microbial organisms across the tree of life and their potential importance in microbial community structure is highlighted by the recent identification of the candidate phyla radiation – a large branch of the tree of life consisting mainly of uncultivated organisms with small genomes and unique metabolic properties (Kantor et al., 2013; Brown et al., 2015; Hug et al., 2016). Efforts towards understanding this important component of microbial communities require further knowledge of metabolic interdependencies driven by biosynthetic deficiencies.

Some of the most promising strides in understanding metabolic interdependences between microbes have been taken in the study of the human oral microbiome. The human oral microbiome serves as an excellent model system for microbial communities research, due to its importance for human health and ease of access for researchers (Dewhirst et al., 2010; Wade, 2013). For example, the order of colonization of species in dental plaque has been characterized physically (Kolenbrander et al., 2010) and metabolically (Mazumdar et al., 2013), and visualized microscopically (Mark Welch et al., 2016). The human oral microbiome consists of roughly 700 different microbial species, identified by 16S rRNA microbiome sequencing and cataloged in the human oral microbiome database (Dewhirst et al., 2010; Chen et al., 2010). Importantly, 63% of species in the human oral microbiome have been sequenced, including several uncultivated and recently-cultivated strains implicated in oral health and disease (Krishnan et al., 2017; Siqueira Jr and Rôças, 2013). Exciting recent work has led to successful laboratory co-cultivation of at least three previously uncultivated organisms, the Saccharibacteria (TM7) phylum taxa: Saccharibacteria bacterium HMT-952 strain TM7x (Bedree et al., 2018He et al., 2015; Bor et al., 2016; Bor et al., 2018), Saccharibacteria bacterium HMT-488 strain AC001 (Collins et al., 2019a), and Saccharibacteria bacterium HMT-955 strain PM004 (Collins et al., 2019b). Saccharibacteria are prominent in the oral cavity and relevant for periodontal disease (Brinig et al., 2003; Ouverney et al., 2003). Due to their importance, they were among the first uncultivated organisms from the oral microbiome to be fully sequenced via single-cell sequencing methods (Marcy et al., 2007), and represent the first co-cultivated members of the candidate phyla radiation (He et al., 2015). Thus, their metabolic and phenotypic properties are of great interest for oral health and microbiology in general.

In parallel to achieving laboratory growth of diverse and uncultivated bacteria, a major unresolved challenge is understanding the detailed metabolic mechanisms that may underlie their dependencies. Ideally, one would want to computationally predict, directly from the genome of an organism, its biosynthetic capabilities and deficiencies, so as to translate sequence information into mechanisms and community-level phenotypes (Widder et al., 2016). A number of approaches, based on computational analyses of metabolic networks, have contributed significant progress towards this goal (Schuster et al., 2000; Oberhardt et al., 2009; Lewis et al., 2012). At the heart of these methods are metabolic network reconstructions, formal encodings of the stoichiometry of all metabolic reactions in an organism, that are readily amenable to multiple types of in silico analyses and simulations (Feist et al., 2009). Recent exciting progress has led to the automated generation of ‘draft’ metabolic network reconstructions for any organism with a sequenced genome (Henry et al., 2010), opening the door for the quantitative study of large and diverse microbial communities. The most commonly used metabolic network analysis methods – flux balance analysis (FBA) (Orth et al., 2010a) and its dynamic version (dFBA) (Mahadevan et al., 2002) – have been extensively applied to study microbial communities (Harcombe et al., 2014; Embree et al., 2015; Pacheco et al., 2019; Magnúsdóttir et al., 2017; Magnúsdóttir and Thiele, 2018; Zarecki et al., 2014; Stolyar et al., 2007; Klitgord and Segrè, 2010; Freilich et al., 2011; Zelezniak et al., 2015; Cook and Nielsen, 2017; Biggs et al., 2015; Zomorrodi and Segrè, 2016). However, FBA and dFBA are not easily applicable to automatically-generated draft metabolic networks due to gaps (missing or incorrect reactions) in the metabolic network, and are thus difficult to scale to large and diverse microbial communities. Methods for ‘gap-filling’ draft reconstructions can address this problem, and ensemble methods potentially present a promising approach (Biggs and Papin, 2017; Machado et al., 2018). However, any gap-filling comes at the expense of an increased risk for false positive predictions. Additionally, gap-filling typically requires specific knowledge or assumptions on the growth media composition – which are often difficult to obtain for diverse environmental isolates and by definition unknown for uncultivated organisms. Alternatively, topology-based metabolic network analysis methods, such as network expansion (Ebenhöh et al., 2004) and NetSeed-based methods (Borenstein et al., 2008), are less dependent on gap-filling and have been applied to the analysis of draft metabolic reconstructions. These methods have provided valuable large-scale insight into metabolic processes in microbial communities, including the biosynthetic potentials of organisms and metabolites (Basler et al., 2008; Matthäus et al., 2008), the chance of cooperation or competition between species (Carr and Borenstein, 2012; Kreimer et al., 2012; Levy et al., 2015; Opatovsky et al., 2018), and the relationship between organisms and environment (Borenstein et al., 2008; Freilich et al., 2009; Handorf et al., 2008), for example in the human gut microbiome (Levy and Borenstein, 2013). While all of these approaches are promising, an additional issue that continues to limit the use of metabolic network analysis for prediction of biosynthetic capabilities is the difficulty of generating these predictions when the chemical environment of the microbes is unknown. In complex microbial communities, such as the human microbiome, the exact chemical composition of the environment is difficult to estimate, due both to the molecular complexity of the environment itself, and to the likely prevalence of secretions, lysing and cross-feeding within the community. Thus, the capacity to provide metabolic predictions based on unelaborated genome annotation, and on limited knowledge about an organism’s growth environment remains an important open challenge.

Here we introduce a new method, which begins to address the above limitations, and provides a novel prediction of an organism’s biosynthetic capabilities. Our method applies a probabilistic approach to define and compute a metric that estimates which metabolites, such as biomass components, are robustly synthesized by a given metabolic network and which would likely need to be supplied from the environment/community. Discrepancies in these calculated estimates between organisms can be used to generate hypotheses regarding microbial auxotrophy and metabolic exchange in microbial communities. Importantly, our metric has the capacity to estimate biosynthetic capabilities in spite of uncertainty about environmental conditions by randomly sampling many different possible nutrient combinations. In this study, we first demonstrated our method on E. coli to clarify its performance and interpretation. Next, we applied our method to a large number of organisms from the human oral microbiome, and predicted broad trends in biosynthetic capabilities associated with taxonomy and microbial co-occurrence. We further focused our analysis on uncultivated microorganisms, including three recently co-cultivated Saccharibacteria (TM7) strains. In addition to highlighting their biosynthetic deficiencies, we developed specific hypotheses for their metabolic exchange with growth-supporting partner microbes.

Analysis method

Our newly developed method quantifies the robustness with which a given metabolic network can produce a given metabolite from variable metabolic precursors. In essence, we quantify a metabolic network specific metric for metabolite producibility by probabilistically sampling sets of possible environments. While the probabilistic sampling can be adjusted to reflect a specific environment, its power lies largely in the capacity to explicitly incorporate in a statistical way the lack of knowledge about environmental composition.

The inspiration for this method comes from the statistical physics concept of percolation. Percolation theory has been applied in a wide range of fields, including the study of cascading metabolic failure upon gene deletions in metabolism (Smart et al., 2008; Barabási, 2015). In percolation theory the robustness of a network can be characterized by randomly adding or removing components (nodes or edges) of a network and assessing network connectivity (Barabási, 2015). The smaller the number of components that need to be randomly added to the network before it becomes connected, the more robust it is to perturbations. We utilized this concept to characterize the network robustness of a particular metabolic network towards producing a specified target metabolite by randomly adding input metabolites to the network and assessing the network’s ability to produce the target.

To implement our method, we first introduced a probabilistic framework for analyzing metabolic networks (Figure 1 and Figure 1—figure supplement 1). In this framework, every metabolite can be considered to be drawn from a Bernoulli distribution, i.e. present in the network with a given input probability (Pin). These probabilities could represent beliefs about the environment, chances of metabolites being available from a host organism, or any arbitrary prior assumption on metabolite inputs. Throughout the majority of our analyses we have assigned Pin to be an identical value for all input metabolites. However, as illustrated in an example in our results section (Metabolite producibility in a protein vs. carbohydrate-enriched environment) this probabilistic framework can utilize Pin values that vary across metabolites. Following the assignment of Pin, the network structure is used to calculate the output probability (Pout) of some specified target metabolite. In practice, random sampling of probabilistically drawn input metabolite sets is used to calculate the probability of producing the target metabolite. For each random sample, a modified version of FBA (Orth et al., 2010a) is used to assess the network's ability to produce the target metabolite (for a complete explanation of how FBA is implemented in this context, see methods section: Algorithm functions, feas).

Figure 1 with 2 supplements see all
A probabilistic framework for calculating the producibility metric (PM).

(A) Random samples of input metabolites are added to the metabolic network with probability Pin. Samples are shown here with gray or red circles. Sampled input metabolites are then used to calculate if a specified target output metabolite can be produced or not. Here the solid red circled sample leads to production of the target metabolite while the dotted gray circled samples do not. The probability of producing the target output metabolite (Pout) is calculated by taking many random samples at a specified Pin. (B) A producibility curve is calculated which represents Pout as function of Pin. Points along this curve are sampled by assigning the Pin value and estimating Pout. The Pin value at which Pout = 0.5 (Pin,0.5) is used to define the producibility metric (PM) as PM = 1-Pin,0.5.

https://doi.org/10.7554/eLife.39733.002

Using the above probabilistic framework, we defined a novel metric quantifying biosynthetic capabilities, the producibility metric (PM) (Figure 1B). The PM is calculated as follows: First, a producibility curve describing Pout as a function of Pin is generated for a given metabolic network and metabolite target. This curve can be estimated by sampling input metabolites for different values of Pin (between 0 and 1), and calculating Pout. Next, we calculated the Pin value along the producibility curve at which Pout is equal to 0.5 (Pin,0.5, analogous to the Km in the Michaelis-Menten curve). Finally, PM is defined as PM = 1-Pin,0.5, such that larger PM values correspond to increased robustness. Our method calculates PM efficiently by random sampling and a nonlinear fitting algorithm (for details, see methods section: Algorithm functions calc_PM_fit_nonlin). In addition to calculating PM computationally for arbitrary metabolic networks and metabolites, we also derived a way to calculate PM analytically using combinatorial equations. The combinatorial equations are built up from simple scenarios to the most general in Figure 1—figure supplement 2. This analytical result, verified in detail for one specific pathway (Figure 2—figure supplement 2) clarifies the connection between our metric and the concept of minimal precursor sets (Andrade et al., 2016). It describes mathematically how the PM captures the multiplicity of routes through which a given target metabolite can be produced, and could serve as the basis for further theoretical work on the fundamental properties of metabolic networks.

The algorithms used to implement our method are written in MATLAB and designed as a set of modular functions that interface with the COBRA toolbox – a popular metabolic modeling software compendium (Schellenberger et al., 2011; Heirendt et al., 2019). The methodology behind each function is further explained in the methods section. The code is freely available online at https://github.com/segrelab/biosynthetic_network_robustness (Bernstein, 2019; copy archived at https://github.com/elifesciences-publications/biosynthetic_network_robustness).

Results

Using the E. coli core metabolic network to demonstrate features of metabolite producibility

Before applying our approach to the systematic study of genome-scale metabolic networks from the human oral microbiome, we used the model organism E. coli to illustrate the performance and interpretation of our method. We started with the E. coli core metabolic network, a simplified network consisting of central carbon metabolism and lacking peripheral metabolic pathways, such as amino acid or cofactor biosynthesis (Orth et al., 2010b). We calculated the PM for all intracellular metabolites in this network using a uniform ensemble of environments (as described in the methods). The results are shown in Figure 2A, overlaid on the E. coli core metabolic network itself, with each node’s color indicating its PM value and node size indicating its degree of connectivity. Consistent with the high connectivity of the E. coli core metabolic network, most metabolites have high PM values (PM >0.950). For example, the metabolites H+ and pyruvate are both highly connected in the metabolic network and have high PM (PM = 0.968 and 0.952 respectively). However, the network also contains several metabolites that are well connected, but have lower PM values. These include, for example, the cofactors AMP/ADP/ATP and NAD+/NADH, which have PM values of ~0.7 and ~0.5 respectively, because they can be produced from each other, but not biosynthesized in this network. The network also includes several examples of metabolites that are poorly connected but have high PM values. One example is D-lactate, which is produced only via Lactate Dehydrogenase from the high PM metabolites Pyruvate and H+ (Figure 2B). This reaction also consumes NADH and produces NAD+ but because these cofactors can be easily recycled from each other by a large number of different reactions, their relatively low PM (as described above) has minimal influence on the PM value of D-lactate (Figure 2B). This example demonstrates the fact that our metric captures metabolites which are easily produced because their precursors are easily produced, and that the PM of recycled cofactors has minimal influence on the PM of a target metabolite. Overall, there is also no significant correlation between the PM values and the node degree of a metabolite in the network (Figure 2—figure supplement 1), indicating that our metric describes a more complex property of a metabolite in a network that is not captured simply by node degree.

Figure 2 with 2 supplements see all
E. coli core metabolic network metabolite producibility.

(A) The E. coli core metabolic network is represented as a bipartite graph with metabolites shown as circles and reactions shown as squares. Reactions shown with a black border are irreversible in the model, those with no border are reversible. All intracellular metabolites are colored based on their PM value (low – blue, high – red). Reactions and metabolite nodes are sized based on their total node degree. Several key metabolites of interest are highlighted with their corresponding PM values shown. Central metabolites such as H+ and Pyruvate have high degree and high PM. Cofactors such as AMP/ADP/ATP and NAD+/NADH have high degree but low PM, as they cannot be synthesized in this network. Oxygen is an example of a PM=0 metabolite that cannot be produced from any other metabolites in this network. D-lactate is an example of a metabolite with low degree and high PM that is it is easily produced but not well-connected. (B) The lactate dehydrogenase reaction producing D-Lactate is shown as an example to illustrate that poorly connected metabolites can display a high PM, and how recycled cofactors have minimal impact on PM values. Lactate dehydrogenase produces D-lactate and NAD+ from pyruvate, H+ and NADH. The metabolite D-lactate has high PM despite being produced only by this one reaction in the metabolic network because it can be produced from the high PM metabolites pyruvate and H+, which are themselves produced from a large number of possible precursors. Although NADH is also used to produce D-lactate, and has a relatively low PM in this core model, it has minimal impact on the PM of D-lactate as NADH can be recycled from NAD+ by a large number of reactions (represented by the arrows at the bottom of the figure) and thus production of NADH is not necessary for the production of D-lactate.

https://doi.org/10.7554/eLife.39733.005

Producibility of metabolites differs from pathway completeness and captures minimal precursor set structure

We next applied our method in detail to a specific biosynthetic pathway within a genome-scale model to demonstrate how our PM provides information that is richer than what can be learned from simply counting the percent of reactions present in a given biosynthetic pathway. Specifically, we analyzed the histidine biosynthetic pathway in the curated E. coli iJO1366 genome-scale metabolic network (Orth et al., 2011), and checked how the two methods differ in their capacity to capture the effect of reaction knock-outs along the pathway (Figure 2—figure supplement 2). The PM is more sensitive than pathway completeness, as it captures features beyond the percent of reactions in the biosynthetic pathway. For different knockouts in the histidine biosynthetic pathway (which counts nine distinct reactions) the PM is related to the distance of the removed reaction from the target metabolite (histidine), whereas the completeness score would be the same (8 out of 9) for each auxotroph (Figure 2—figure supplement 2B). This same capacity of PM to capture finer details of the effect of missing reactions in a pathway is also confirmed by a similar analysis of histidine biosynthesis across all oral microbiome draft metabolic networks (oral microbiome network reconstruction and analysis described later in the manuscript) (Figure 2—figure supplement 2C).

In general, in contrast with the percent completion of the biosynthetic pathway, the PM depends deeply on the pathway structure, ultimately capturing the number of different routes through which the target metabolite can be synthesized (also called the minimal precursor sets; Andrade et al., 2016). This property stems intuitively from the way the PM is defined, and is explained precisely by our combinatorial theory (Figure 1—figure supplement 2). While our method’s computational estimate of the PM is based on sampling the space of possible precursor sets, the combinatorial theory provides an exact value for the producibility of a molecule with a given minimal precursor set structure. The close match between the sample-based PM and the combinatorial theory for the histidine biosynthetic pathway (Figure 2—figure supplement 2B) suggests that the PM indeed captures the complex multiplicity of avenues for producing a given metabolite.

Producibility analysis shows improved tolerance to missing reactions compared to flux balance analysis

One of the challenges we wished to address with our method is the possibility of making robust inferences about the metabolic capabilities of different organisms in spite of missing reactions – a situation often encountered upon reconstructing metabolic networks from newly sequenced genomes. To assess the performance of our approach in this context, we compared it with flux balance analysis (FBA) calculations for a genome-scale metabolic networks with a given number of randomly removed reactions. In particular, we applied both FBA and our method to the E. coli iJO1366 genome-scale metabolic network, which we gradually perturbed by removing an increasing number of randomly chosen reactions. In this performance test, the unperturbed iJO1366 metabolic network was used as a ground truth against which the predictions of our method and FBA on perturbed metabolic networks were compared. Figure 3 shows the accuracy of both FBA and the PM as a function of the percentage of reactions removed from the iJO1366 metabolic network. While the output of our method (the PM for any metabolite) is different from that of FBA (the flux through all reactions), one can use the PM values observed across all biomass components as a proxy for the growth capacity of an organism, providing a metric that is comparable with the FBA-predicted biomass production flux. The specific metrics used to compare the PM and FBA predictions for biomass production are described further in the Figure 3 legend. One can see that both the FBA and the PM predictions become worse as the metabolic networks are further perturbed. However, the PM predictions are more tolerant to missing reactions than the FBA predictions. While the FBA production of biomass becomes infeasible for the majority of the perturbed metabolic networks after removing less than 1% of the reactions, the PM results remain fairly quantitatively accurate when removing up to 10% of the reactions. This analysis provides insight into the accuracy of our method for analyzing metabolic networks with gaps, such as draft (non-gap-filled) metabolic networks produced through automated reconstruction pipelines.

Figure 3 with 1 supplement see all
The accuracy of flux balance analysis and the producibility metric for different perturbed E. coli genome-scale metabolic networks.

Reactions were randomly removed from the E. coli iJO1366 metabolic network generating 50 different networks at five different levels of reaction removal. These networks were then analyzed with the producibility metric (PM) and flux balance analysis (FBA) in a minimal and complete medium. The accuracy of the PM and FBA results were assessed through two different measures and plotted as a function of the number of reactions removed on a semi-log plot. (A) Quantitative difference accuracy – The accuracy was measured quantitatively based on the L1 norm of the difference between the original network metric and the randomly perturbed network metric. For FBA the L1 norm was computed as the absolute value of the difference between the biomass flux of the original network and the perturbed network. For the PM the L1 norm was calculated as the sum of the absolute value of the difference between each PM value. The L1 norm for both metrics was then normalized and subtracted from one to give a measure of accuracy. The mean of 50 different randomly perturbed networks at five different reaction removal levels is shown with dots connected by solid lines (FBA on minimal medium: Blue, FBA on complete medium: Red, PM: Purple). The standard error of the metric is shown as a shaded region around the line. (B) Biomass production accuracy – The accuracy was measured by the fraction of randomly perturbed metabolic networks that were capable of producing biomass. For FBA this was calculated as the fraction of networks capable of producing biomass flux above 1% of the unperturbed biomass flux (FBA on minimal medium: Blue, FBA on complete medium: Red). For the PM, the biomass production accuracy was calculated as the fraction of networks capable of producing all biomass components above a specified PM threshold. The threshold was either PM >0.1 (solid purple) or PM >0.6 (dashed purple).

https://doi.org/10.7554/eLife.39733.008

Metabolite producibility points to putative metabolic mechanisms for E. coli auxotroph co-cultures

As a first test of our approach in its capacity to provide metabolic insight about experimental measurements of inter-microbial interactions, we used the PM to estimate the capacity of different E. coli auxotrophs to compensate for each other’s metabolic limitations. In particular, we compared experimental data from co-cultures of E. coli auxotrophs from Wintermute and Silver (2010) with corresponding PM calculations. After reconstructing in silico the specific auxotroph strains used in this work (based on the E. coli iJO1366 metabolic network), we calculated the PM for all essential biomass components in each auxotroph and compared the PM values to the experimentally measured synergistic growth of auxotroph pairs (Figure 3—figure supplement 1). Different auxotrophs, clustered by PM, were seen to group based on the pathway of the removed gene, and auxotrophs with knockouts in different locations of the same biosynthetic pathway showed a graded decrease in PM for the corresponding biomass component, similar to what was seen in our histidine biosynthetic pathway analysis in Figure 2—figure supplement 2. The overall distance between auxotroph PM values was positively correlated with synergistic growth, suggesting that auxotrophs with different biosynthetic capabilities could better support each other’s growth (Figure 3—figure supplement 1A). Several examples and counter-examples that further elaborate this trend are highlighted in Figure 3—figure supplement 1B and C. This analysis also gave us the opportunity to query in more depth the capacity of our approach to provide insight into whether auxotrophs with higher PM values may be more easily supplemented by partner auxotrophs. We did not detect a clear general signal on whether auxotrophs could rescue each other based on the average PM across all biomass components. However, for a specific instance, namely auxotrophs for tryptophan, we found a correlation between tryptophan PM and average synergistic growth with other auxotrophs (Figure 3—figure supplement 1D), possibly suggesting that the PM captures the ease with which auxotrophs in this pathway can be supplemented by other auxotrophs. Overall, our method enables the comparison of model-based producibility predictions with experimental data on auxotrophic interdependencies. These predictions helped identify metabolic complementarity patterns, but did not fully capture all of the complexity of interactions between E. coli auxotrophs.

Reconstruction of human oral microbiome metabolic networks

We next applied our method to the human oral microbiome, aiming at a mechanistic characterization of the biochemical capabilities of different microbes in this community based on metabolic networks reconstructed directly from their genomes. As a first step, we reconstructed metabolic networks for 456 different microbial strains representing a diverse set of human oral microbes whose annotated genomes were available from the Human Oral Microbiome Database. These organisms represent 371 different species, 124 genera, 64 families, 35 orders, 22 classes, and 12 phyla. Metadata related to the selected organisms can be found in Supplementary file 4. Notably, the database includes several sequenced yet uncultivated or recently co-cultivated organisms. In particular, the following sequenced yet uncultivated, or recently co-cultivated, strains were included in our analysis: Saccharibacteria (TM7) bacterium HMT-952 strain TM7x (He et al., 2015), Saccharibacteria (TM7) bacterium HMT-955 strain PM004 (Collins et al., 2019b), Saccharibacteria (TM7) bacterium HMT-488 strain AC001 (Collins et al., 2019a), Tannerella HMT-286 strain W11667 (Vartoukian et al., 2016a), Anaerolineae (Chloroflexi phylum) bacterium HMT-439 strain Chl2 (Vartoukian et al., 2016b), Absconditabacteria (SR1) bacterium HMT-874 strain MGEHA (Campbell et al., 2013a), and Desulfobulbus HMT-041 strains Dsb2 and Dsb3 (Campbell et al., 2013b). All of the selected genomes were used to reconstruct sequence-specific draft metabolic networks using the Department of Energy Systems Biology Knowledgebase (KBase) ‘build metabolic model’ app (Henry et al., 2010; Arkin et al., 2018; Overbeek et al., 2014). The networks were reconstructed without any gap-filling. A KBase narrative containing the genomes and draft metabolic network reconstructions can be found at: https://narrative.kbase.us/narrative/ws.27853.obj.935. The complete collection of all network models is also available for download in MATLAB (.mat) format at https://github.com/segrelab/biosynthetic_network_robustness (Bernstein, 2019).

Large-scale patterns in biosynthetic capabilities identified across the human oral microbiome

We analyzed the PM for 88 different biomass metabolites across the aforementioned 456 metabolic networks from the human oral microbiome. The 88 biomass metabolites included all biomass building blocks considered to be essential for either Gram-negative or Gram-positive biomass, as listed in the KBase build metabolic models app (Henry et al., 2010; Arkin et al., 2018; Overbeek et al., 2014) (listed in Supplementary file 5). Through this analysis we calculated 40,128 PM values which represent an atlas of predicted biosynthetic capabilities across these human oral microbiome organisms. The ensuing atlas is represented as a hierarchically clustered matrix of PM values for all 456 organisms and 88 metabolites in Figure 4. The same data are available in Figure 4—figure supplement 1 (clustered by taxonomy), and in Supplementary file 6.

Figure 4 with 7 supplements see all
Human oral microbiome organisms PM matrix.

The producibility metric (PM) was calculated for 456 different oral microbiome organisms (columns) and 88 different essential biomass metabolites (rows). The resulting matrix is hierarchically clustered based on average distances between organisms and metabolites PM values. Organism Gram-stain and phylum/class are indicated by several annotation columns at the top of the matrix. The biomass metabolites analyzed consisted of several different types of metabolites indicated with different colors. Several metabolites that showed interesting patterns across oral microbiome organisms are highlighted with roman numerals. The most distinct cluster of organisms, highlighted and annotated (top left), consisted of fastidious reduced-genome organisms (Mycoplasma, Treponema) and uncultivated or recently cultivated organisms (SR1, TM7, Desulfobulbus, Anaerolineae).

https://doi.org/10.7554/eLife.39733.010

The hierarchically clustered heat map (Figure 4) shows extensive variability in the PM values of different organisms and metabolites across the oral microbiome. There are three main large clusters of metabolites: one cluster with consistently high PM (top), one cluster with low PM (middle), and one cluster with variable PM (bottom). Different classes of metabolites cluster quite differently across this landscape. In addition to simple ubiquitous metabolites, such as H2O or glycine (Figure 4 I), all nucleotides have high PM across the oral microbiome organisms. Amino acids generally have high PM as well, with the notable exception of tryptophan (Figure 4 II). Interestingly, tryptophan is known to be a particularly difficult amino acid to synthesize (Akashi and Gojobori, 2002). Metal ions generally had PM value of 0 across all organisms, serving as an expected negative control. Some exceptions, such as Mg2+, Co2+, Cl-, Fe3+, and Fe2+, can be explained based on their presence in larger compounds, such as porphyrins. For example, Co2+ has increased PM values in a pattern that closely follows the PM values of the cobalt containing vitamin cobamamide (Figure 4 III).

Before analyzing in detail the patterns identifiable in the PM matrix of Figure 4, we showed that such patterns could not be simply attributed to the broad property of genome size – even if genome size is known to be an important predictor of the overall biosynthetic capabilities of an organism (Zarecki et al., 2014). Fastidious or parasitic organisms tend to have reduced genomes and consequently reduced metabolic capabilities. In our data, the overall average PM value for each organism can be partially predicted by genome size. A linear regression model and quadratic regression model which used the log of genome size to predict the average PM value across all metabolites for each organism had R-squared values of 0.498 and 0.551 respectively (Figure 4—figure supplement 2 A). The fit of this model was further improved by adding taxonomic information as additional parameters (see methods section for additional details on adding taxonomic information). We inferred this by using the Akaike information criterion (AIC) and Bayesian information criterion (BIC), two measures of model accuracy that include a penalty for added parameters to discourage over-fitting (Clarke et al., 2009). The BIC has a stronger penalty for additional parameters and improved up to the order level, while the AIC improved up to the genus level (Figure 4—figure supplement 2 B, C). These improvements in AIC and BIC indicate that our data contain additional structure that is described by taxonomy beyond simply genome size.

Taxonomic trends capture biosynthetic patterns across human oral microbiome organisms

Many of the patterns in our large-scale analysis of the human oral microbiome PM matrix indicated taxonomic trends in the PM of different metabolites across organisms. While the clustering of the PM matrix was not entirely driven by taxonomy (Figure 4), we did see significant taxonomic trends in our data beyond what was explained simply be genome size (Figure 4—figure supplement 2). We further investigated, quantitatively, which specific phyla and orders were associated with specific PM trends by calculating the log likelihood ratio between a quadratic regression model predicting the PM values for a particular metabolite-based solely on genome size against one that incorporates a specific taxonomic parameter of interest (Figure 4—figure supplement 3). This allowed us to highlight metabolites with highly significant increased or decreased PM values in certain taxonomic groups, and to confirm patterns that we observed by eye in Figure 4. Numerous patterns and details of the PM matrix could be relevant for addressing specific biological questions or model refinement challenges. Here we focus in detail on two specific classes of compounds: (i) cell-wall and membrane components, which tend to vary broadly across organisms, and are important for antimicrobial susceptibility and immune system recognition; and (ii) amino acids and essential factors (e.g. vitamins), which could be relevant for understanding metabolic exchange among bacteria and with the host.

A first striking pattern in the PM matrix is the complexity of cell-wall and membrane components of different taxa. Some aspects of this pattern are consistent with standard attribution of metabolites associated with the Gram staining categories (estimated using the KBase build metabolic model app [Henry et al., 2010; Arkin et al., 2018; Overbeek et al., 2014]). However, we also observed interesting deviations, which could be partially attributed to known finer resolution in the specific membrane components across taxa. Compared to other metabolites, cell-wall components generally tend to have variable or low PM values across the oral microbiome organisms. We analyzed in detail fifteen different teichoic acids, a class of metabolites expected to be found in the cell wall of Gram-positive organisms that play an important role in microbial physiology and interactions with the host (Weidenmaier and Peschel, 2008). Of these, nine were found to have higher PM values in Gram-positive organisms, as expected (Figure 4 IV). In particular, the D-alanine substituted lipoteichoic acids had high PM values in the phylum Firmicutes and specifically the class Bacilli. However, there was another set of 6 teichoic acids that had intermediate PM values across a large number of organisms and didn’t follow Gram-staining trends (Figure 4 V). These consisted of three N-acetyl-D-glucosamine linked and three unsubstituted teichoic acids. This mismatch in expected patterns suggests that the metabolic pathways involving these particular cell-wall components may merit closer inspection in the network reconstruction process.

We further observed clear trends associated with several lipids which are expected to be found in the cell membrane of both Gram-positive and Gram-negative organisms. In particular, we found a strong increase in the PM value for three phosphatidylethanolamine lipids in Gram-negative organisms (Figure 4 VI). Interestingly, these lipids have been previously observed to be more commonly produced in Gram-negative organisms, and have implications for antimicrobial susceptibility (Epand et al., 2007; Epand and Epand, 2009). We also identified trends associated with three cardiolipin and three phosphatidylglycerol lipids that display generally similar PM patterns across different species (Figure 4 VII). One class of organisms that stands out with respect to lipid biosynthesis are the Negativicutes. These organisms have relatively high PM values for phosphatidylethanolamine but PM values of 0 for cardiolipin and phosphatidylglycerol lipids. Consistent with this result, it has been previously observed that the Negativicutes organism Selenomonas ruminantium lacks cardiolipin and phosphatidylglycerol lipids in its inner and outer cell membranes, but does have phosphatidylethanolamine (Kamio and Takahashi, 1980). It has been hypothesized that the membrane stabilizing role of these two missing lipids could be partially fulfilled by peptidoglycan bound polyamines, including spermidine, in Selenomonadales organisms (Kamio and Takahashi, 1980; Hamana et al., 2012). Concordantly, we see an increased PM value for the polyamine spermidine across Negativicutes in our data (Figure 4 VIII). These patterns suggest that the PM could be used to obtain organism-specific estimates of biomass composition from genomes for metabolic network reconstruction, facilitating assignments beyond gram positive/negative compositions.

Aside from lipids and cell-wall components, there are a number of interesting trends related to several amino acids and other essential factors in our data. A number of metabolites had increased PM in the phylum Proteobacteria and decreased PM values in the phylum Bacteroidetes. A notable example is heme, which can be seen to follow this trend (Figure 4 IX). Heme plays an important role in microbe host interactions, as bacterial pathogens often acquire it from their human host (Choby and Skaar, 2016). In the context of the human oral microbiome, the oral pathogen Porphyromonas gingivalis (belonging to the phylum Bacteroidetes) is known to scavenge heme (Olczak et al., 2005), compatible with the above pattern. Other metabolites that displayed the same trend include: arginine, cysteine, methionine, tryptophan, and glutathione. Arginine can be catabolized via the arginine deiminase pathway to regenerate ATP and is thus an interesting exchange metabolite beyond its use as a protein building block (Plugge and Stams, 2001; Schink, 2006). Tryptophan is one of the highest cost amino acids to biosynthesize (Akashi and Gojobori, 2002), and thus is an intriguing exchange candidate. Methionine and Cysteine are the only two sulfur containing standard amino acids, and glutathione is synthesized from Cysteine. It is possible that the discrepancies between PM values observed here are indicative of broad amino acid and vitamin exchange between the phyla Proteobacteria and Bacteroidetes in the human oral microbiome.

Organic acid production predicted for human oral microbiome organisms

In addition to calculating the producibility of biomass components, we were interested in applying the PM to other metabolites that could be produced by microbes and impact microbial community structure or function in the human oral microbiome. We thus used our method to compute the PM of various organic acids across oral microbiome organisms. We analyzed nine different organic acids and observed a large amount of variability in PM (Figure 4—figure supplement 4). Acetate had the highest median PM while butyrate had variable PM, with most organisms having PM of 0 but some having relatively high PM. In particular, Fusobacterium genus organisms were found to have high PM for butyrate, reflecting observations obtained from transcriptomic data, with important implications for periodontal disease (Jorth et al., 2014). Additionally, increased butyrate PM was observed in some but not all Porphyromonas and Prevotella species, which have been further implicated in periodontal disease due to their potential production of inflammation inducing organic acids (Takahashi, 2015). For reference, the organic acids analyzed in this section were added to Supplementary file 5, and the calculated PMs were added to Supplementary file 6.

Metabolite producibility in a protein vs. carbohydrate-enriched environment

Although one of the most useful features of our method is the capacity to provide an environment-independent measure of metabolite producibility, it can also be tailored to ask environment-specific questions. To exemplify this capability, we applied our method to investigate the biosynthetic capabilities of a proteolytic organism (Porphyromonas gingivalis) and a saccharolytic organism (Streptococcus mutans) in a protein and a carbohydrate-enriched environment. The hypothesis was that the proteolytic organism would have a higher PM increase in the protein enriched environment as it is able to breakdown amino acids to synthesize other biomass components and likewise the saccharolytic organism would have a higher PM increase in the carbohydrate-enriched environment. We simulated a protein-enriched environment by fixing all amino acids to always be present (Pin = 1) when calculating the PM, and simulated a carbohydrate-enriched environment by fixing D-glucose to always be present (Pin = 1). Target metabolites were never fixed to be present; for example when calculating the PM of an amino acid in the protein-enriched environment we did not fix that amino acid to be present. We measured the increase in PM in the enriched environments relative to the originally calculated PM, for all 88 biomass metabolites and nine organic acids (Figure 4—figure supplement 5). Overall, we saw only small increases in PM in the enriched environments, with particularly small increases in the carbohydrate-enriched environment. The modest trends that we identified matched our expectation, with the proteolytic organism showing a larger increase in PM in the protein-rich environment and the saccharolytic organism showing a larger increase in PM in the carbohydrate-enriched environment. One possible reason for the small effects observed in this analysis is the fact that our baseline random environment is fairly rich. For example, fixing D-glucose to be available in the carbohydrate-enriched environment had minimal effect as D-glucose already had a high PM in the original random environment. However, this application does highlight the value of further exploring variants of our method that explicitly translate environmental information into non-uniform metabolite input probabilities.

Metabolic similarity correlates with microbial co-occurrence in the human oral microbiome

Our approach is a bottom-up approach, that starts from genomes and predicts metabolic capabilities that could underlie interactions. A key question in the field of metabolic modeling is whether these bottom-up metrics can be compared to and provide insight into top-down analyses of large datasets, such as the patterns of co-occurrence of microbial taxa from microbiome sequencing data. To address this question for our approach, we used the PM to calculate pairwise measures of metabolic difference or complementarity between any two organisms and assessed the correlation of these metrics with microbial co-occurrence. Through this analysis we sought to identify metabolic trends associated with co-occurring microbes. We simultaneously evaluated the correlation between microbial co-occurrence and other previously defined metrics (Carr and Borenstein, 2012; Kreimer et al., 2012; Levy et al., 2015), so that we could compare these to the performance of the PM. While there are a few additional methods that have utilized gap-filled metabolic models to provide insight into microbial co-occurrence data (Freilich et al., 2011; Zelezniak et al., 2015), in this study we focused our direct comparison on alternative methods that could be used to analyze draft (non-gap-filled) metabolic networks as these were closest in scope and applicability to our own method. Future analyses could broaden the scope of this comparison. All of the pairwise metabolic metrics we calculated are described further in the methods section. For co-occurrence data, we analyzed the supplementary data from Friedman and Alm (2012), which contains microbial co-occurrences identified from 16S rRNA sequencing data using their SparCC method for seven different oral microbiome sites. The correlations between all pairwise metabolic metrics and microbial co-occurrence in all seven oral microbiome sites are presented in Supplementary file 7.

Across the seven different oral microbiome sites, the pairwise metabolic metric ‘PM distance’ (see methods section for description of metrics) showed the most consistent significant correlation with co-occurrence of any pairwise metabolic metric. The PM distance was consistently negatively correlated with the co-occurrence, indicating that organisms that are more similar in PM tend to co-occur. Several other pairwise metabolic metrics were found to be correlated with co-occurrence, although in a less consistent manner than the PM distance (Supplementary file 7). Additionally, many of the pairwise metabolic metrics that we analyzed were highly correlated with each other as we show in Figure 4—figure supplement 6. To further disentangle correlations between pairwise metabolic metrics and co-occurrence data, we looked at the partial-correlation between a pairwise metabolic metric and co-occurrence when controlling for another pairwise metabolic metric. We found that the PM distance always had significant partial-correlation with co-occurrence when controlling for any of the other pairwise metabolic network metrics, a trend not observed for the other metrics. We further repeated this entire correlation analysis for co-occurrence measured by Pearson’s correlation (also from the supplementary data of Friedman and Alm, 2012.), and interestingly found that correlations between pairwise metabolic metrics and co-occurrence were weaker and less consistent when using Pearson’s correlation, in line with previously reported inconsistency in co-occurrence prediction by Pearson’s correlation (Friedman and Alm, 2012). Overall, our analysis corroborates and enhances previous analyses showing how co-occurrences in 16S rRNA sequencing data from the human microbiome project tend to reflect ‘habitat filtering’, where organisms with similar metabolic capabilities tend to co-occur (Freilich et al., 2011; Zelezniak et al., 2015; Levy and Borenstein, 2013).

We next examined more closely the correlation between the pairwise metabolic metrics PM complementarity and Seed complementarity (see Methods and Figure 4—figure supplement 7). These measures of complementarity summarize the potential for any one organism to provide metabolic products to another. While the two metrics are highly correlated with each other, the distribution of their values display some significant differences. In particular, the PM complementarity displays a clear bi-modal distribution, which is absent from the distribution of Seed complementarity values. The high-valued peak of the PM complementarity distribution captures most of the interactions between small-genome/fastidious microorganisms and their partners. This indicates that our method is good at resolving biosynthetic deficiencies in fastidious/uncultivated organisms, as further investigated next.

Biosynthetic properties predicted in a cluster of fastidious human oral microbiome organisms

In addition to dissecting the patterns associated with specific metabolites, one can analyze the PM landscape of Figure 4 from the perspective of the organisms and their agglomeration into clusters. Strikingly, in our large clustered PM matrix, the most distinct hierarchical cluster of organisms consisted of a number of fastidious organisms (Figure 4 top left). This cluster included all of the Mycoplasma genomes that we analyzed, and one Treponema genome. Mycoplasma and Treponema are genera that are known to be parasitic and have evolved to have reduced genomes and metabolic capabilities (Fraser et al., 1995; Fraser et al., 1998; Meseguer et al., 2003; Davis et al., 2013; Razin, 1978). The remaining members of this cluster included nearly all of the sequenced yet uncultivated, or recently co-cultivated, organisms in our study. The organisms included were from the phyla: Absconditabacteria (SR1), Saccharibacteria (TM7), Proteobacteria (genus Desulfobulbus), and Chloroflexi (class Anaerolineae). Many of these organisms are thought to have reduced genomes and limited metabolic capabilities underlying their fastidious nature, much like Mycoplasma. Only one of the previously uncultivated organism we analyzed was found outside of this fastidious cluster, namely Tannerella HMT-286. Interestingly, this bacterium is hypothesized to rely on externally supplied siderophores to support its growth (Vartoukian et al., 2016a). This type of dependency is not captured by our metabolic analysis and highlights the fact that, while uncultivability can be driven by many different mechanisms, our method captures the prominent effect of reduced biosynthetic capacity.

We sought to gain clearer insight into the metabolic properties of these co-clustered fastidious organisms by re-clustering their PM submatrix (Figure 5 A). By comparing the PM values in this fastidious cluster to those in the average oral microbiome organisms, it is clear that the fastidious organisms had reduced PM for a large number of metabolites including cell-wall components, lipids, amino acids, and other essential factors. When ranking metabolites by their difference in average PM between all oral microbiome organisms and the fastidious cluster a number of amino acids and vitamins stand out as being the most depleted in the fastidious cluster. The top metabolites where: pyridoxal phosphate, valine, putrescine, isoleucine, bactoprenyl diphosphate, thiamin diphosphate, 5-methyltetrahydrofolate, lysine, deoxyguanosine triphosphate, tryptophan, and guanosine-triphosphate. These metabolites may be particularly relevant with regards to exchange between fastidious organisms and their oral microbiome community partners. Amino acids, in particular, have been hypothesized to be involved in metabolic exchange between microbial organisms in communities (Ponomarova and Patil, 2015; Mee et al., 2014; Mee and Wang, 2012; Zelezniak et al., 2015). Amino acids with reduced PM in the fastidious cluster tend to have high biosynthetic cost (cost calculated in Akashi and Gojobori, 2002.), as indicated by Spearman correlation analysis (ρ = 0.4595, p-value=0.0415). An exception to this trend, potentially interesting for follow up studies, is the case of the branched chain amino acids valine, and isoleucine, which are the two amino acids with most reduced PM in fastidious organisms, but are not among the costliest. Notably, branched chain amino acid supplementation has been shown to alter the metabolic structure of the gut microbiome of mice (Yang et al., 2016).

Fastidious/uncultivated and TM7/host producibility sub-matrices.

Sub-matrices of the larger PM matrix were re-clustered to highlight variations within specific groups of fastidious and uncultivated organisms. (A) The fastidious/uncultivated organisms that were identified as the most unique cluster in the larger matrix from Figure 4 were re-clustered hierarchically. The average producibility metric (PM) value across all oral microbiome organisms analyzed in this study is shown in the far left column. Differences between the fastidious Mycoplasma genus organisms and the previously uncultivated TM7 organisms are highlighted with roman numerals. (B) The PM values for the previously uncultivated TM7 organisms and their growth-supporting hosts bacteria were extracted and re-clustered hierarchically. Differences between the TM7 and their bacterial hosts are highlighted with roman numerals.

https://doi.org/10.7554/eLife.39733.018

We next sought to gain more specific insight into a specific class of recently-cultivated fastidious organisms, Saccharibacteria (TM7). To gain specific insight into the biosynthetic capabilities of these TM7 relative to other fastidious microorganisms, we further focused our analysis on identifying discrepancies between Mycoplasma and TM7. Our analysis included eight Mycoplasma genomes and three TM7 genomes. Mycoplasma are a relatively well characterized genus of intracellular parasites with reduced metabolic capabilities, and TM7 are a recently co-cultivated phylum of the candidate phyla radiation that display reduced metabolic capabilities and a parasitic lifestyle. There are several cell-wall components for which TM7 has relatively high PM values and Mycoplasma has PM values of zero (Figure 5 I). These include nine different teichoic acids, bactoprenyl diphosphate, and peptidoglycan. This highlights extensive cell-wall/peptidoglycan metabolism in TM7 organisms and the known lack of a cell-wall in Mycoplasma (Razin, 1978). Furthermore, a set of three nucleotides: dGTP, GTP, and TTP, have high PM values for TM7 and PM values of zero for Mycoplasma organisms (Figure 5 II). This pattern of nucleotide biosynthesis deficiency in Mycoplasma is consistent with the observation that some strains have been shown to be dependent on supplementation of thymidine and guanosine but not adenine or cytosine nucleobases for growth (Mitchell and Finch, 1977). Finally, the cofactors acyl carrier protein (ACP) and flavin adenine dinucleotide (FAD) had high PM values in Mycoplasma and PM values of zero in TM7 organisms (Figure 5 III). The lack of these cofactors in TM7 seems surprising, but is indeed matched by a complete lack of any metabolic reactions annotated to utilize FAD and ACP as cofactors in the draft reconstruction of the TM7 metabolic networks.

In addition to investigating the metabolic deficiencies of fastidious organisms, the PM landscape gave us the opportunity to compare these gaps with possible complementary capabilities in organisms known to support their growth. The three TM7 strains that we analyzed were recently co-cultivated with host bacteria from the human oral microbiome. TM7x was shown to be a parasitic epibiont of Actinomyces odontolyticus XH001 (McLean et al., 2016). TM7 AC001 and PM004 were recently both co-cultivated successfully with either of the host strains Pseudopropionibacterium propionicum F0230a or F0700 (Collins et al., 2019c). We further investigated these newly discovered relationships to gain insight into possible metabolic exchange (Figure 5 B). Interestingly, TM7 organisms had higher PM values than their host strains for several cell-wall components: three glucose-substituted teichoic acids, and glucose-substituted and unsubstituted glycerol teichoic acid (Figure 5 IV), suggesting that TM7 is capable of producing several cell-wall components that its host cannot. Conversely, as expected, a large number of metabolites had increased PM values in the host strains compared to the TM7 strains. These metabolites are hypothesized to be easily synthesized by the host and not TM7 and are thus interesting candidates for growth supporting exchange. Fourteen different metabolites had average PM values in the hosts greater than 0.60 higher than in the TM7 organisms (Figure 5 V). The ranked list includes: isoleucine, valine, acyl carrier protein, 5-methyltetrahydrofolate, pyridoxal phosphate, flavin adenine dinucleotide, thiamin diphopsphate, putrescine, tryptophan, Fe2+, heme, Fe3+, lysine, and menaquinone-8. Interestingly, the branched chain amino acids isoleucine and valine are again at the top of the list. The correlation of amino acid biosynthesis cost (Akashi and Gojobori, 2002) with the difference in PM values between host and TM7 is even higher than what we observed across all fastidious organisms (Spearman correlation ρ = 0.6011, p-value=0.0051) indicating that PM values are further decreased in TM7 for costly amino acids.

Our results provide context and putative mechanistic details related to observed gene expression and metabolic changes in the co-cultivation of TM7x with the host Actinomyces odontolyticus XH001 (McLean et al., 2016). Transcriptomic data for TM7x and A. odontolyticus XH001 showed that a number of genes associated with N-acetyl-D-glucosamine were up regulated in A. odontolyticus in this interaction (He et al., 2015). Our results show that, although TM7 does have extensive cell wall metabolism, A. odontolyticus has higher PM for N-acetyl-D-glucosamine substituted components (Figure 5 VI). This suggests that the host is responsible for the biosynthesis of these cell-wall components, which may be overexpressed during co-cultivation. Metabolomics experiments from this co-cultivation have identified the cyclic peptide cyclo(L-Pro-L-Val) as a potential signaling molecule in this relationship. Our PM analysis suggests that this molecule would be synthesized by the host as it has increased PM values for both of the amino acids included (Figure 5 VII). In fact, valine has one of the highest discrepancies in PM for host and TM7. Finally, another potentially exchanged amino acid of interest is arginine. All three TM7 draft metabolic network reconstructions that we analyzed were annotated to possess either all or all but one of the reactions in the arginine deiminase pathway (TM7 PM004 is missing the arginine iminohydrolase reaction) (Figure 6—figure supplement 1 and Supplementary Cytoscape (Shannon et al., 2003) files 1–3). This catabolic pathway can be used to degrade arginine to regenerate ATP, and has been implicated in syntrophic microbial interactions (Plugge and Stams, 2001; Schink, 2006). In our PM analysis arginine had consistently higher PM in host than TM7 (Figure 5 VIII). Thus, arginine exchange and metabolism via the arginine deiminase pathway could contribute to the dependence of TM7 on its hosts (Figure 6).

Figure 6 with 1 supplement see all
Hypothesized metabolic exchange between TM7 and their bacterial hosts.

(A) We summarize here hypotheses generated for the exchange of metabolites between TM7 and their growth-supporting hosts based on differences in biomass PM values. We also highlight any insight that our PM was capable of providing into experimental transcriptomic and metabolomic data from the co-cultivation of TM7x and Actinomyces odontolyticus that was previously collected and analyzed in a separate study (He et al., 2015). (B) The cell-wall components containing glucose-substituted teichoic acids were among the only metabolites with PM higher in TM7 than in hosts. N-acetyl-D-glucosamine-substituted teichoic acids had increased PM in the host relative to TM7, and previous gene expression data from TM7x and A. odontolyticus shows several genes related to N-acetyl-D-glucosamine that are overexpressed in A. odontolyticus during co-cultivation (He et al., 2015). (C) Several vitamins/cofactors/other essential factors had decreased PM in TM7 compared to the hosts. The cofactors acyl carrier protein and flavin adenine dinucleotide had decreased PM in TM7, and were also not found to be utilized in the TM7 draft metabolic network reconstructions. (D) Several amino acids had decreased PM in TM7 compared to the hosts. Valine and proline were both decreased in TM7 relative to the host, and previous metabolomics data from TM7x and A. odontolyticus identified the cyclic dipeptide cyclo(L-Pro-L-Val) as a potential signaling molecule (He et al., 2015). Arginine had decreased PM in TM7 relative to the host and could potentially be exchanged and catabolized by TM7 via the arginine deiminase pathway.

https://doi.org/10.7554/eLife.39733.019

Discussion

Our method provides an estimate of the putative biosynthetic capabilities of a metabolic network from genomic information. We first implemented this method in E. coli, to demonstrate its application and capacity to address multiple questions, even in presence of uncertainty that would prevent the use of other stoichiometric methods. Next, we reconstructed metabolic networks for 456 different organisms from the human oral microbiome, and generated an atlas of predicted biosynthetic capabilities across these organisms. We highlighted trends in the biosynthetic capabilities of these microbes related to taxonomy, and showed that these predicted biosynthetic capabilities can partially explain co-occurrence data. We further focused on describing putative biosynthetic deficiencies of a cluster of fastidious/uncultivated organisms and predicted exchanged metabolites between three recently co-cultivated Saccharibacteria (TM7) strains and their growth supporting partner microbes. Overall, our method provides preliminary insight into the metabolic capabilities of a large number of human oral microbiome organisms and helps further the understanding of the structure of this complex microbial ecosystem.

Our method differs from other approaches in several key ways that we have demonstrated throughout our analysis. First, it is different from a pathway-based analysis where the percent completion of biosynthetic pathways is analyzed. Our method is based on the entire metabolic network and captures the multitude of different routes through which a metabolite could be synthesized. An additional advantage is that our method does not rely on previously defined biosynthetic pathway annotations and instead seeks to use the entire metabolic network structure to define biosynthetic capabilities. While prior knowledge of pathways can be quite useful in many contexts, specific biosynthetic routes can cross the boundaries of annotated pathways, making pathway completeness uninformative. A prior notable example of this effect is the discovery of an alternative pathway to bypass a TCA cycle gene impairment (Frezza et al., 2011). This pathway connects in an unexpected way two distinct pathways, generating a new crucially important and experimentally validated type of connectivity that would be missed from regular pathway-based analysis. Another such example, in our current data, is the case of the arginine deiminase pathway and the urea cycle, which contain several overlapping metabolites and reactions. In fact, we have noticed that KEGG mappings of TM7x metabolism often highlight the urea cycle as a result of TM7x containing the complete arginine deiminase pathway. Our method differs substantially also from standard flux balance analysis, even if it is based on stoichiometry and Linear Programming. Specifically, our method has improved tolerance for missing reactions compared to flux balance analysis (Figure 3), and thus does not rely on gap-filled metabolic networks. Therefore, it is capable of providing preliminary insight into ‘draft’ genome-derived metabolic networks that can be used to study diverse microbes and microbial communities, and could potentially help guide the gap-filling process and predict putative biomass components. Our method also differs from alternative topology-based methods (Borenstein et al., 2008; Carr and Borenstein, 2012; Kreimer et al., 2012; Levy et al., 2015) as it represents metabolism as a bipartite graph constrained by stoichiometry (enabling enforcement of mass balance constraints), rather than projecting the network onto an adjacency matrix between metabolites, which is not constrained by stoichiometry (i.e. two metabolites can be connected in an adjacency matrix despite a missing reactant or cofactor for the reaction that connects them).

It is important to highlight the limitations of our approach. In particular, many of the issues that limit the accuracy of metabolic network analyses in general affect our method as well. The primary limitation is enzyme annotation. Aside from missing or incorrect annotations, subtle processes such as enzyme promiscuity and spontaneous reactions may have unquantified effects on metabolic network function. Reaction direction/reversibility is also difficult to predict as it requires detailed knowledge of reaction thermodynamics and metabolite concentrations. In particular, inaccurate or missing information about reaction direction/reversibility could lead to uncertainty about whether a high PM from our method should be interpreted as reflecting biosynthetic or degradative capabilities (or both). Throughout our analysis we have utilized default reversibility constraints provided by the KBase build metabolic models app (Henry et al., 2010; Arkin et al., 2018; Overbeek et al., 2014), but more stringent constraints on directionality could possibly improve our results. Transport reactions are also notoriously difficult to annotate accurately, and the current implementation of our method addresses this limitation by naïvely adding intracellular metabolites as input metabolites. However, any future efforts to use extracellular metabolites as inputs would rely on accurate transport reaction annotations. In general, all metabolic network analysis methods face similar limitations. Even as newly developed experimental methods gradually improve metabolic reaction annotation (Price et al., 2018a; Vaccaro et al., 2016; Price et al., 2018b; Sévin et al., 2017), it is likely that we will have to continue dealing with incomplete knowledge. Thus, approaches like the one presented here are valuable for providing initial predictions of metabolic capabilities with minimal arbitrary assumptions, and for pinpointing specific areas of a metabolic model that are in need of refinement. One additional limitation of our method, in comparison to alternative methods, is that it requires a longer run time than alternative methods, such as FBA (Orth et al., 2010a) or NetSeed (Borenstein et al., 2008; Carr and Borenstein, 2012). Future efforts towards simplifying the calculations to improve the algorithm’s speed would be beneficial. For example, utilizing heuristics or belief propagation could possibly improve the efficiency and run time of our algorithm (Yedidia et al., 2003).

Despite these limitations, by translating genotype into phenotype with minimal assumptions, our approach has the potential to serve as a baseline estimate of metabolic mechanisms in different microbial communities. Moving forward, our method could be easily applied to other human-associated or environmentally relevant microbial communities, providing valuable putative insight into inter-microbial metabolic dependencies. For example, in this analysis we have analyzed three previously uncultivated Saccharibacteria (TM7) phylum organisms that were recently successfully co-cultivated with growth supporting bacterial host organisms. These TM7 species are the first successfully cultured organisms from the candidate phyla radiation, a large branch of the tree of life consisting mainly of uncultivated organisms (Kantor et al., 2013; Brown et al., 2015; Hug et al., 2016), and therefore are of general interest beyond their role in human oral health. Further analysis of the candidate phyla radiation through our method could provide preliminary phenotypic insight into this unusual, but large, group of bacteria. Additionally, our method could be applied to other classes of uncultivated bacteria, many of which will be gradually added to the collection of genomes reconstructed from metagenomic sequencing of communities.

Another promising application of our approach is evaluating draft models during the metabolic network reconstruction process. In particular, in building new draft stoichiometric models, the producibility metric, which displays nuanced variability across taxa, could be used as an initial estimate of the biomass composition, to be compared to the reference biomass compositions currently used in most reconstructions (Lakshmanan et al., 2019). More generally, our approach fits into an emerging class of metabolic reconstruction and analysis methods that address uncertainty by statistically sampling ensembles (of environments, as done here; fluxes, as studied extensively [Schellenberger and Palsson, 2009]; or network reconstructions, as recently implemented [Biggs and Papin, 2017; Machado et al., 2018]). We envisage that the metabolic insight gained from the application of these methods will continue to help bridge the gap between top down studies and a mechanistic understanding of microbial community metabolism and dynamics.

Materials and methods

Method implementation

Request a detailed protocol

The framework for implementing our method was developed as several different modular functions that interact in a nested manner to run our analysis. The functions are written in MATLAB and interface with the COBRA toolbox (Schellenberger et al., 2011; Heirendt et al., 2019). The code is built around the COBRA toolbox commands changeObjective and optimizeCbModel. Thus, running our code requires installation of the COBRA toolbox. Additionally, the nonlinear fitting function utilizes the MATLAB function lsqnonlin for nonlinear least squared fitting. Additional functions were developed to implement our probabilistic framework and run our analysis method. We describe here each modular function, providing details on the computations performed. The full code for implementing our method, with examples for running the code, is available online at https://github.com/segrelab/biosynthetic_network_robustness (Bernstein, 2019).

Algorithm functions

Request a detailed protocol

feas – This function determines if the production of a given target metabolite set is feasible given the metabolic network model with specified constraints. Flux balance analysis was used to determine the feasibility of production (Orth et al., 2010a). Flux balance analysis was chosen over the alternative network expansion algorithm due to its treatment of cofactor metabolites (Kruse and Ebenhöh, 2008). In network expansion, cofactors must be added to the network to ‘bootstrap’ metabolism, whereas in flux balance analysis any reaction utilizing a cofactor can proceed given that the cofactor can be recycled by a different reaction, which is a less restrictive constraint on the metabolic network flux. Furthermore, our implementation allows for inequality or equality mass balance constraints. Traditional flux balance imposes an equality mass balance which is often referred to as a steady state constraint. This constraint restricts the rate of change of all metabolite concentrations to be equal to 0. We provide the option of implementing inequality mass balance, which constrains the rate of change of metabolite concentrations to be greater than or equal to 0. In practice, inequality mass balance is implemented by adding unbounded exporting exchange reactions and calculating steady state solutions. We have implemented inequality mass balance for all of our calculations due to the fact that we are analyzing local properties of the metabolic network (the production of a single metabolite) and do not want the network to be constrained by the global requirement to achieve steady state. During the production of a particular metabolite, the metabolic network is thus free to produce byproducts that are used elsewhere or secreted. To determine production feasibility, the export of a particular target metabolite is set to the objective function and maximized. If the maximal flux is greater than a hard-coded threshold (>0.001), then the target metabolite is considered to be feasibly produced. This function uses the COBRA commands changeObjective and optimizeCbModel to set and maximize the appropriate objective function. Mathematically, flux balance analysis is implemented as a linear programming problem with the following definition:

maximize: CTv
subject to:Sv=0 (equality mass balance);or Sv 0 (inequality mass balance)
and subject to:lbvub

Where: CT is the transpose of a column vector indicating which reactions are to be maximized. In this case, this specifies the exporting exchange reactions corresponding to the target metabolites. v is a column vector of metabolic reaction fluxes. S is the stoichiometric matrix describing the reactions present in the metabolic network (a metabolites by reactions size matrix). Each element in the matrix is the stoichiometry of a particular metabolite associated with a particular reaction. Negative values indicate that a metabolite is a reactant of that reaction being consumed, while positive values indicate that a metabolite is a product of that reaction being produced. lb and ub are the lower and upper bounds of all reactions, which define reaction reversibility or are set to -1000 and 1000 respectively when unbounded. Additional information on flux balance analysis can be found in this publication describing its implementation in detail (Orth et al., 2010a).

rand_add – This function is designed to give a random sample of input metabolites to be added based on the Bernoulli parameter for each input metabolite. This function uses the MATLAB rand function to choose a random number between 0 and 1 for each input metabolite. If this number is less than the Bernoulli parameter for that input metabolite, then the metabolite is added.

prob – This function utilizes rand_add and feas to determine the probability of producing the target metabolite given the input metabolite Bernoulli parameters, the metabolic network structure, and the specified constraints. A chosen number of random samples of input metabolites are generated by repeatedly running the rand_add function. The probability of producing the target metabolite is determined as the number of feasible trials divided by the total number of samples. The default number of samples used for the bulk of the analysis in this work was 50.

calc_PM_fit_nonlin This function calculates the producibility metric (PM) for a specified metabolic network model and metabolite using an efficient nonlinear fitting technique. The nonlinear fitting algorithm estimates the PM by randomly sampling points on the producibility curve that fall near PM. The algorithm starts by sampling a point in the middle of the producibility curve (Pin = 0.5) and then using the MATLAB function lsqnonlin to fit a sigmoidal curve to the sampled points of the producibility curve. The fit sigmoidal curve is then used to estimate a value for the PM. Next, a new sample point is obtained which is offset from the estimated PM value with some noise introduced with the specified noise parameter. In this way the algorithm converges on the PM value and samples points around PM, thus increasing the accuracy of its estimate with each iteration. The estimate converges when a specified n estimates of the PM value are all within a specified threshold. The default parameters associated with this function, used for the bulk of our analysis, were: noise = 0.3, n = 7, thresh = 0.01. The parameters chosen were selected by hand to provide reasonable performance.

prep_mod This function is used to prepare the metabolic network model for analysis with our method. The input for this function is a COBRA model, which is saved as a MATLAB structure variable. This code has been developed and optimized to work with KBase generated metabolic networks and is not guaranteed to work with networks from other sources that have different naming conventions. The first modification to the networks is to find and turn off all exchange and maintenance reactions to standardize the network models. Second, the extracellular and intracellular metabolites are identified based on naming conventions and output from the function. Third, exchange reactions are added for each metabolite (producing 1 unit of that metabolite), and a vector indicating the mapping from metabolites to these exchange reactions is output from the function. This vector is used by our method to control the presence and absence of input metabolites in the network model as well as to adjust the inequality mass balance constraints. The final output is a new network model which has been standardized for our method and in which the presence and absence of metabolites can be easily manipulated.

find_PM_mods_mets This function is designed to facilitate the parallelization of the PM calculation. The function takes as inputs a directory of metabolic network models, a directory to store results, a list of target metabolite names, the index of the current network model and metabolite being analyzed and all of the specifications necessary for running calc_PM_fit_nonlin. The metabolite and model being analyzed can be changed dynamically to allow for parallelization. In addition to these inputs, this function has several inputs that allow for standard modifications to the PM calculation procedure. It allows for certain metabolites to be fixed on or off. It allows for several choices of metabolites to be added during the PM calculation process, including adding all intracellular or extracellular metabolites and including the target metabolite or not. It also allows for specification of the inequality mass balance constraint as either all metabolites set to inequality mass balance or all metabolites set to equality mass balance. Furthermore, it has a parameter for the number of runs to calculate the PM to obtain statistics regarding the variability of calc_PM_fit_nonlin. For the analysis done in this work: calculation of PM for single metabolites was done by adding all intracellular metabolites (excluding targets), the mass balance constraint was set to use inequality constraints for all metabolites, the number of runs was set to 10.

Parallelization

Request a detailed protocol

We used the Boston University shared computing cluster to run our analysis for a large number of metabolic networks and metabolites. The calculation of the PM for each individual network model and metabolite can be run in parallel, vastly increasing the number of possible computations. The average runtime for computing the PM for an individual network and metabolite for 10 repeated runs was ~9 min and the maximum run time was ~45 min, given the default parameters used in this study: a = 0, s = 1, samp = 50, noise = 0.3, n = 7, thresh = 0.01, runs = 10. We note that these parameters were chosen by hand to provide adequate performance for our algorithm, and future implementations could possibly alter these parameters to provide improved run-time and/or accuracy.

Using the E. coli core metabolic network to demonstrate features of metabolite producibility

Request a detailed protocol

Our analysis method was initially demonstrated on the E. coli core metabolic network. We used the network provided by the BiGG database (King et al., 2016). We calculated the PM value for each intracellular metabolite. The input metabolites for our PM calculations were assigned as all intracellular metabolites in the E. coli core metabolic network. This was the most naïve assumption we could use for assigning input metabolites, and was consistently used throughout the majority of our analyses. Additionally, using intracellular metabolites as input metabolites avoids errors that could arise from poorly annotated transporters in draft metabolic network reconstructions. Calculations were performed using the Boston University shared computing cluster to parallelize runs across networks and metabolites and improve computation time. The results of our simulation were visualized using the Cytoscape network visualization software (Shannon et al., 2003). The entire E. coli core metabolic network is shown, excluding the biomass reaction for clarity.

Producibility of metabolites differs from pathway completeness and captures minimal precursor set structure

Request a detailed protocol

We analyzed the PM for the histidine biosynthetic pathway with auxotrophic metabolic networks generated by knocking out different reactions along the pathway in the E. coli iJO1366 metabolic network. The PM was calculated for all essential biomass components using default parameters. The PM for all biomass components, excluding histidine, was unchanged and the PM for histidine was reported. The PM for histidine was seen to match the theoretical values based on our combinatorial theory. The theoretical values were calculated using the formula in Figure 1—figure supplement 2 C, where n corresponds to the number of main intermediate metabolites that remain connected to the end product (L-histidine). For our analysis of the histidine pathway across all 456 oral microbiome metabolic networks, the histidine pathway sum was the total number of reactions in the histidine pathway and the pathway length was the total number of reactions starting at histidine and counting until the first missing reaction. Both values were compared to the PM values of each organism for histidine by Spearman’s rank correlation.

Producibility analysis shows improved tolerance to missing reactions compared to flux balance analysis

Request a detailed protocol

We demonstrate the performance of our method on a network with missing reactions on perturbed E. coli iJO1366 metabolic network by randomly removing reactions and observing the results of flux balance analysis (FBA) and our producibility metric (PM). The iJO1366 metabolic network consists of 2583 reactions, and was perturbed by removing n random reactions (n = 4, 16, 64, 256, 1024). The biomass reactions (full and core) and the maintenance reaction were not candidates to be randomly removed. For each value of n, 50 different randomly perturbed metabolic networks were generated (a total of 250 metabolic networks). For each of these networks, the core biomass reaction flux was optimized using FBA in a complete medium and a minimal glucose medium and the biomass flux was recorded. Next, the PM for all core biomass metabolites, excluding those with consistent PM of 0 (ex. metal ions), were calculated using our method. The accuracy of each method was calculated using a quantitative difference measure and a biomass production measure. For FBA the quantitative difference measure was calculated as the absolute value of the difference between the original biomass flux and the perturbed biomass flux. For the PM the quantitative difference measure was calculated as the sum of the absolute value of the differences between the PM value of each metabolite in the original model and the perturbed models. Note that for both FBA and the PM this measure is equivalent to the L1 norm of the difference between the original network metric and the perturbed network metric. The accuracy was measured as one minus the normalized difference measure. Means and standard errors across the 50 different randomly perturbed metabolic networks were calculated and reported. The biomass production measure was calculated in FBA as the fraction of perturbed metabolic networks that could produce biomass flux greater than 1% of the original optimal biomass flux. The biomass production measure was calculated for the PM as the fraction of perturbed metabolic networks that had PM for all biomass components analyzed above a specified threshold. Thresholds of 0.1 and 0.6 were analyzed and reported.

Metabolite producibility points to putative metabolic mechanisms for E. coli auxotroph co-cultures

Request a detailed protocol

We analyzed experimental data from E. coli auxotrophs using our PM metric. Experimental data were taken from the supplementary growth data of Wintermute and Silver (2010). The growth data used were the mean of the day 4 replicates 1 and 2. E. coli auxotrophs were modeled using the iJO1366 metabolic network. All reactions related to a gene, including those involving isoenzymes, were knocked out from the model by setting the upper and lower bound of the reaction to zero. Isoenzyme related reactions were included based on prior evidence that this improves performance of metabolic modeling of auxotrophs (Jacobs et al., 2017). The PM was calculated for all biomass components for each auxotroph metabolic network using default parameters consistent with other PM calculations in this study. The PM distance between auxotrophs was calculated as the L1 norm of the difference between two auxotrophs PM vectors. Additional details on PM distance can be found in methods section Metabolic similarity correlates with microbial co-occurrence in the human oral microbiome. The correlation between PM distance and experimentally measured growth matrices was assessed using a Mantel permutation test with 10,000 permutations, and calculating the correlation of the upper triangle of the matrices. Additional details on the Mantel permutation test can be found in the methods section Metabolic similarity correlates with microbial co-occurrence in the human oral microbiome.

Reconstruction of human oral microbiome metabolic networks

Request a detailed protocol

A set of 456 draft metabolic networks were reconstructed for oral microbiome strains. Strains were chosen to match the sequences chosen for dynamic annotation on HOMD which cover at least one strain for each sequenced species and repeated strains for sequences of particular interest for the human oral microbiome. Several strains were additionally selected due to our interest in fastidious and uncultivated organisms. These included eight uncultivated or recently co-cultivated strains. When considering the taxa TM7 and Tannerella sp. oral taxon 286, we chose to include the most recent genome sequences from co-cultivation experiments, although there are several additional single-cell and metagenome assembled sequences available for Tannerella sp. oral taxon 286 and TM7 (Kantor et al., 2013; Marcy et al., 2007; Beall et al., 2014; Albertsen et al., 2013; Podar et al., 2007). The host strains Actinomyces odontolyticus XH001, Pseudopropionibacterium propionicum F0700, and Pseudopropionibacterium propionicum F0230a were included due to their support of TM7 organisms. All genomes were either found in the KBase central data repository or manually annotated with RAST and uploaded to KBase (Arkin et al., 2018; Overbeek et al., 2014; Aziz et al., 2008; Brettin et al., 2015). Strains that were dynamically annotated on HOMD but could not be found on KBase, were not of interest due to uncultivability, and already had a representative strain from their matching species were not included in our set of strains. Several naming discrepancies existed between KBase and HOMD, which are highlighted in the KBase download notes column of Supplementary file 4. All metabolic networks were reconstructed using a KBase narrative containing all of the genomes and metabolic networks from this work, which is available to be copied, viewed, edited, or shared at https://narrative.kbase.us/narrative/ws.27853.obj.935. Metabolic networks were reconstructed for each strain with automatic assignment of Gram-stain, and without gap-filling. Metabolic network reconstructions were then downloaded from KBase as SBML files and converted to COBRA. mat files using the COBRA command readCbModel. Metadata related to all organisms and metabolic networks are available in Supplementary file 4.

Large-scale patterns in biosynthetic capabilities identified across the human oral microbiome

Request a detailed protocol

We investigated the large-scale biosynthetic properties of the human oral microbiome by analyzing reconstructed metabolic networks for 456 different oral microbiome strains. For each metabolic network we calculated the PM value for 88 individual biomass components (40,128 total PM calculations). The biomass components were chosen to be the union of the set of default KBase Gram-positive and Gram-negative biomass compositions (see Supplementary file 5 for details). The metabolites sulfate and phosphate were not included, while the metabolite H2O was included as a positive control. The calculations were parallelized across metabolic networks and metabolites using the Boston University shared computing cluster to improve computation time. The PM values were stored as a matrix of organisms by metabolites PM values. This matrix was analyzed using hierarchical clustering based on average differences between groups. The matrix was clustered and visualized using the R package pheatmap.

For the comparison of average PM values and genome size, genome size was taken from KBase and added to Supplementary file 4. We used regression modeling to identify the broad relationship between genome size, taxonomy, and the average PM value. We fit PM values to linear and quadratic models of log genome size:

Linear:average(PM)=c1+c2log(genome size)
Quadratic:average(PM)=c1+c2log(genome size)+c3log(genome size)2

Nominal taxonomic parameters were added to these models to determine if they could improve the models prediction of PM values. Gram-stain, and taxonomic labels from phylum to genus were used as nominal taxonomic parameters. For each taxonomic level, each label was added as an additional nominal parameter, for example: adding the predictor of phylum meant adding 12 independent variables, one for each different phylum. Gram-stain was assigned based on KBase default assignments. Taxonomic labels were assigned based on human oral microbiome database taxonomy annotations. Regression models were developed using the MATLAB command fitlm. The AIC and BIC were calculated to assess model improvement upon subsequent addition of taxonomic parameters using the MATLAB command aicbic.

Taxonomic trends capture biosynthetic patterns across human oral microbiome organisms

Request a detailed protocol

We investigated specific trends in metabolite PM values related to taxonomy by analyzing the clustered matrix of PM values. Additionally, a regression model was used to provide quantitative insight. The base regression model was a quadratic model using the log of genome size as the predictor of the specific PM value for a certain metabolite across all organisms:

PM(metabolite)=c1+c2log(genome size)+c3log(genome size)2

Nominal taxonomic parameters were then added one at a time. Taxonomic parameters of Gram-stain (+ or -), phylum (belonging to 1 of 12 phyla or not) and class (belonging to 1 of 22 classes or not) were used. We calculated the log likelihood ratio by taking difference between the log likelihood of the base quadratic model of genome size and the model including a specific taxonomic parameter. We identified highly significant relationships using an alpha value of 10−6 and Bonferroni correction for multiple hypothesis testing.

Organic acid production predicted for human oral microbiome organisms

Request a detailed protocol

Organic acid production was assessed by calculating the PM for nine different organic acids for each human oral microbiome organism. The organic acids analyzed were: acetate, formate, L-lactate, succinate, propionate, D-lactate, butyrate, isobutyrate, and isovalerate. The organic acids were chosen by searching through the literature for those that were found to be relevant for oral health (Jorth et al., 2014; Takahashi, 2015). While this is certainly not an exhaustive list of organic acids, it demonstrates the applicability of our method to non-biomass metabolites. The PM was calculated using default parameters consistent with other calculations in this study. The organic acids chosen where included in Supplementary file 5, and the PM results were included in Supplementary file 6.

Metabolite producibility in a protein vs. carbohydrate-enriched environment

Request a detailed protocol

The production of metabolites in variable environments was implemented in our method by re-calculating the PM for all metabolites in a protein-enriched and carbohydrate-enriched environment for two organisms: a saccharolytic organism (Streptococcus mutans UA 159) and a proteolytic organism (Porphyromonas gingivalis W83). These organisms were chosen because they are known to be associated with oral diseases involving either saccharolytic or proteolytic activity, namely dental carries and periodontitis (Wade, 2013; Takahashi, 2015). The protein-rich environment was simulated by fixing all 20 amino acids to always be present (Pin = 1), by adding them to the fixon parameter, and then adding all other metabolites randomly. As in the other PM calculations, the target metabolite is never added or fixed to be on. The carbohydrate-enriched environment was simulated in a similar manner by fixing D-glucose to always be present.

Metabolic similarity correlates with microbial co-occurrence in the human oral microbiome

Request a detailed protocol

Co-occurrence data were collected from supplementary Dataset_S1 of Friedman and Alm (2012). Seven different oral sites were analyzed and co-occurrence calculated with SparCC and Pearson’s correlation were analyzed. Various pairwise metabolic metrics where compared to the patterns of microbial co-occurrence, and significant correlations were found using a Mantel permutation test with 10,000 permutations. These pairwise metabolic metrics were likewise compared to each other using a Mantel permutation test with 1000 permutations. Additional details on metrics used and Mantel test are described below. To compare the pairwise metabolic metrics with microbial co-occurrence, we collapsed all interaction metrics to the genus level by averaging scores across species in the same genus such that we could match predictions from our method with co-occurrence based on 16S rRNA sequencing, which was mapped at best to the genus level.

Calculation of various metrics for correlation analysis:

PM distance – Calculated as the L1 norm of the difference between the PM vectors of any two organisms. The L1 norm is calculated as the sum of the absolute values of the differences of all of the elements of the vector.

PM distanceA,B= normL1(PMAPMB)=i=1n|PMiAPMiB|

PM complementarity – Calculated as the complementarity between two organisms, quantifying the amount by which one organism can supplement the PM of another organism. For organism A supplementing B, the metric is calculated by summing over i = 1 to N metabolites.

PM complementarity AB=i=1N(max[PMiA,PMiB]PMiB)i=1N(PMiA)

Seed distance Calculated as the L1 norm of the difference between the seed vectors of any two organisms. The seed vectors were calculated using the NetSeed method (Borenstein et al., 2008; Carr and Borenstein, 2012). The code used to calculate the seeds was taken from http://elbo.gs.washington.edu/software_netcooperate.html. The minComponentSize parameter was set to 0 and the onlyGiant component parameter was set to false.

Seed competition Calculated following the formula used in the NetCmpt method (Kreimer et al., 2012; Levy and Borenstein, 2013). The competition from A to B is the fraction of seeds of organism A that are also seeds of organism B. The threshold for seed vs. non-seed was a seed score of greater than 0.

Seed complementarity – Calculated following the formula used in the NetCooperate method (Levy et al., 2015; Levy and Borenstein, 2013). The complementarity from A to B is the fraction of seeds of organism A that are in the metabolic network of organism B but not seeds of organism B. This metric was calculated using code from http://elbo.gs.washington.edu/software_netcooperate.html.

Reaction distance – Calculated as the L1 norm of the difference between the reaction vectors of any two organisms. Reaction vectors were vectors of 0’s and 1’s indicating which metabolic reactions where present in the draft metabolic network of each organism.

Reaction Jaccard – Calculated as the Jaccard distance between the reaction vectors of any two organisms. The Jaccard distance is calculated as one minus the intersect of the vectors divided by the union of the vectors. In other words, it is one minus the fraction of shared metabolic reactions.

Mantel test – A Mantel test was used to assess correlation between matrices as done in Levy and Borenstein (2013). The Spearman’s rank correlation was calculated for all elements of the two matrices (excluding the diagonal). Then the first matrix was permuted 10,000 times, and the number of times the correlation was stronger than the original correlation was recorded. The p-value was calculated using the formula below, where n is the number of times the permuted correlation was stronger (absolute value of the correlation coefficient ρ was larger) than the original and N is the number of permutations.

Mantel P value= n+1N+1

Partial Mantel tests were calculated in a similar manner, but using partial correlations between the first and second matrix while controlling for a third matrix. For the partial correlation permutations, only the first matrix is permuted and the partial correlation is recalculated.

Biosynthetic properties predicted in a cluster of fastidious human oral microbiome organisms

Request a detailed protocol

A subset of fastidious organisms identified from the larger clustered matrix of all oral microbiome organisms PM values were re-clustered based on average distances and analyzed further. Additionally, three previously uncultivated TM7 organisms (TM7x, AC001, and PM004) and several host strains for the uncultivated TM7 (Actinomyces odontolyticus XH001, Pseudopropionibacterium propionicum F0700, and Pseudopropionibacterium propionicum F0230a) were re-clustered and analyzed. Metabolites were ranked and analyzed based on the difference between the average PM value of separate groups. Three different rankings were used throughout this analysis: (1) average fastidious cluster organisms PM subtracted from average oral microbiome organisms PM, (2) average Mycoplasma PM subtracted from average TM7 PM (3), average TM7 host PM subtracted from TM7 PM. Correlations between amino acid biosynthetic cost and difference in PM were calculated using Spearman’s rank correlation between the amino acid cost (Akashi and Gojobori, 2002) and the difference in average PM between average and fastidious organisms, or between hosts and TM7, using the MATLAB command corr.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
    Network Science
    1. A-L Barabási
    (2015)
    Network Robustness, Network Science, Cambridge, University Press.
  7. 7
    Biosynthetic potentials from species-specific metabolic networks
    1. G Basler
    2. Z Nikoloski
    3. O Ebenhöh
    4. T Handorf
    (2008)
    Genome Informatics. International Conference on Genome Informatics 20:135–148.
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
    TM7 phylum sp. oral taxon 488 strain AC001 chromosome, complete genome
    1. AJ Collins
    2. P Murugkar
    3. T Chen
    4. FE Dewhirst
    (authors) (2019a)
    GenBank. ID CP040003.
  26. 26
    TM7 phylum sp. oral taxon 955 strain PM004 chromosome, complete genome
    1. AJ Collins
    2. P Murugkar
    3. T Chen
    4. FE Dewhirst
    (authors) (2019b)
    GenBank. ID CP040008.
  27. 27
    Pseudopropionibacterium propionicum strain F0700 chromosome, complete genome
    1. AJ Collins
    2. P Murugkar
    3. T Chen
    4. FE Dewhirst
    (authors) (2019c)
    GenBank. ID CP040007.
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
    Structural analysis of expanding metabolic networks
    1. O Ebenhöh
    2. T Handorf
    3. R Heinrich
    (2004)
    Genome informatics 15:35–45.
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
    Isolation and characterization of outer and inner membranes of selenomonas ruminantium: lipid compositions
    1. Y Kamio
    2. H Takahashi
    (1980)
    Journal of Bacteriology 141:888–898.
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
    Pathways of nucleotide biosynthesis in mycoplasma mycoides subsp. mycoides
    1. A Mitchell
    2. LR Finch
    (1977)
    Journal of Bacteriology 130:1047–1054.
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
    Reconstruction and use of microbial metabolic networks: the core Escherichia coli metabolic model as an educational guide
    1. JD Orth
    2. RM Fleming
    3. BØ Palsson
    (2010)
    EcoSal Plus, 4, 10.1128/ecosalplus.10.2.1, 26443778.
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
  89. 89
  90. 90
  91. 91
  92. 92
  93. 93
  94. 94
  95. 95
  96. 96
    The mycoplasmas
    1. S Razin
    (1978)
    Microbiological Reviews 42:414–470.
  97. 97
  98. 98
  99. 99
    Syntrophism among prokaryotes
    1. B Schink
    (2006)
    In: M Dworkin, S Falkow, E Rosenberg, K-H Schleifer, editors. The Prokaryotes, 2. New York: Springer. pp. 309–335.
    https://doi.org/10.1007/0-387-30742-7_11
  100. 100
  101. 101
  102. 102
  103. 103
  104. 104
  105. 105
  106. 106
  107. 107
  108. 108
  109. 109
  110. 110
  111. 111
  112. 112
  113. 113
  114. 114
  115. 115
  116. 116
  117. 117
  118. 118
    Understanding belief propagation and its generalizations
    1. JS Yedidia
    2. WT Freeman
    3. Y Weiss
    (2003)
    In: G Lakemeyer, B Nebel, editors. Exploring Artificial Intelligence in the New Millennium . San Francisco, CA, United States: Morgan Kaufmann Publishers Inc. pp. 239–269.
  119. 119
  120. 120
  121. 121
  122. 122

Decision letter

  1. Wenying Shou
    Reviewing Editor; Fred Hutchinson Cancer Research Center, United States
  2. Naama Barkai
    Senior Editor; Weizmann Institute of Science, Israel
  3. Wenying Shou
    Reviewer; Fred Hutchinson Cancer Research Center, United States
  4. Christopher Quince
    Reviewer; University of Warwick, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: the authors were asked to provide a plan for revisions before the editors issued a final decision. What follows is the editors’ letter requesting such plan.]

Thank you for sending your article entitled "Quantifying biosynthetic network robustness across the human oral microbiome" for peer review at eLife. Your article is being evaluated by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation is being overseen by a Reviewing Editor and Naama Barkai as the Senior Editor.

Given the list of essential revisions, including new experiments, the editors and reviewers invite you to respond with an action plan and timetable for the completion of the additional work. We plan to share your responses with the reviewers and then issue a binding recommendation.

The most important issue you will need to address is the biological relevance of the proposed method. You can either show that the method is in agreement with previously published data (where a lot of direct evidence exist) in a manner that is better/more insightful than previous methods; or provide new experimental data to substantiate your claims.

Reviewer #1:

"Quantifying biosynthetic network robustness across the human oral microbiome" from Segre lab was motivated by a desire to overcome the gap-filling limitation of standard flux balance analysis and the need to make assumptions on environments in metabolic network topology analysis. They quantified "biosynthetic network robustness", essentially how the probability of a product being made (Pout) scales with the probability of the presence of input metabolites (Pin). This robustness can be summarized as PM – the higher PM, the more likely a metabolite can be synthesized. If one is worried about parameter overfitting (and one should worry about such problems in genome scale models), this seems like a good thing to try. Applying the metric PM to E. coli central metabolism, they found that PM is not correlated with connectivity. Some metabolites have high connectivity but low PM, and some low connectivity and high PM. They then quantified PM of 88 biomass metabolites for 456 oral microbes, and found that fastidious (difficult to culture) microbes have low PM values for several metabolites, consistent with them being fastidious. Genome size and taxonomy both contained information about PM. They found that cell wall metabolites have variable PM. They also found that a number of amino acids have high PM in Proteobacteria and low PM in Bacteroidetes. They also analyzed metabolic complementarity of several TM7 strains (fastidious microbes) and their symbiotic bacteria strains. The paper is of bioinformatics nature, and is in general clearly written.

1) What is the difference between PM and% completion of biosynthetic pathway?

2) It's also not entirely clear to what extent their method improves upon the state of the art. In their Introduction, they say that two downsides that limit FBA (but not their method) are assumptions on environmental conditions and reconciling with stoichiometry-based constraints. I don't really understand their second limitation, but they sidestep the first limitation by assuming a constant Pin among all inputs. This seems similar to assuming constant concentration among all inputs for FBA, which is never shown.

Reviewer #2:

The manuscript by Bernstein et al. presents a computational method to estimate metabolites that are likely to be exchanged in a microbial community. The essence of the novelty of the method, especially in comparison with the previous methods, is that it provides a probabilistic view accounting for multiple possible variations in substrate uptake combinations (e.g. different combinations of C/N sources etc.). The method is illustrated with the analysis of oral microbial community and the results suggest potential metabolic dependencies in the community. I find the method to be elegant but yet only of theoretical interest.

1) The method assumes a uniform distribution of possible substrate uptake combinations – which is not necessarily biologically observed. There are hierarchies – e.g. preference of glucose over other C-sources in most organisms.

2) The finding that metabolic dependencies may exist in oral microbiome is not that surprising per se given that a large body of literature (including papers also from the Segre lab) already suggest that metabolic dependencies are likely widespread in microbial communities. Thus, it is not clear whether only the proposed method would be able to identify these.

3) Lack of experimental validation makes it difficult to assess whether the identified exchanges are happening in the system. At the least, a detailed analysis of systems with known metabolite exchanges (and how the proposed method there differs from the previous ones) should be provided.

In summary, I do find the proposed method elegant and appealing but miss substantiation of the biological relevance.

Reviewer #3:

This manuscript introduces a new robustness metric that attempts to quantify the probability that an organism can produce a given metabolite directly from the genome. This probability is calculated as an average over possible environments defined by the presence or absence of the other metabolites in the network with production determined through the application of an FBA model. I found the metric itself intuitively appealing and easy to understand. In addition, the manuscript was clear and well written. The application to the oral microbiome and the novel CPR organisms in particular was also well motivated and interesting.

Given the rate of generation of microbial genomes it would be very useful to have a better method to even approximately predict metabolic products of organisms. However, what the study lacks in my opinion, is quantitative evidence that the metric actually does translate into a prediction that the organism produces a given metabolite. They provide some anecdotal examples where it appears to work but without a more thorough analysis of what the metric means then it will only be useful in a qualitative sense i.e. a higher PM indicating that one organism is more likely to produce a metabolite than another. Metabolic modelling is admittedly not my field, but datasets capable of testing the metric, must exist. A possible example might be this collection of E. coli strains and phenotypes defined in terms of growth conditions (Galardini et al., 2017, eLife). If such datasets are not available then I feel the authors need to be more circumspect in their conclusions and concede that the metric may say something about the production of metabolites but without any certainty about how a given PM value translates into an actual probability of production.

Other than the above caveat, I thought this was a good study but it was sparse on the practical details of the methodology. The GitHub repository consists of a selection of Matlab scripts without a clear description of how to apply them. What would be really useful is a complete walkthrough of the methodology from genome to PM values including for example ggkbase commands used to build the models. This is necessary if this work is to be properly reproducible. One final comment was that some evaluation of the impact of sample number on the calculated PM values would be a worthwhile addition. The authors mention that 50 samples of input metabolite combinations were used but no analysis of the impact of this choice on the PM values was given.

[Editors’ note: formal revisions were requested, following approval of the authors’ plan of action.]

Thank you for submitting a revision plan for your article entitled "Quantifying biosynthetic network robustness across the human oral microbiome". The editors and reviewers have considered your plan and invite you to proceed with your revisions. Please also consider the following comments when preparing your revised manuscript.

The Reviewing Editor stated:

"I think that I know how authors can fix their problems from a theoretical aspect. They can start with a complete model as the "ground truth" model, but hide or mis-annotate various% links to mimic our incomplete knowledge of the system, and compare their method with existing methods. This way, they will always have a "ground truth" curve for any metric of interest to biologists. Of course, the task of assigning missing links will need to be repeated many times and in double-blind fashion to gain statistical power. I suspect that different methods might show tradeoff. For example, in their Figure 1, it is not clear whether the extra info provided by PM is useful to a biologist if all mutants are auxotrophic. Any tradeoff between methods is also valuable to know. Authors can also use this ground truth model to illustrate problems of gap filling etc."

Another reviewer said:

"I agree this is a very comprehensive plan and they do seem to have taken our comments to heart and developed a proposal to address them. I was pleased with their suggestions regarding an alternate dataset of E. coli phenotypes to test their metric and the improved GitHub repository. "

The last reviewer said:

"I agree that the proposal is comprehensive and I like the Reviewing Editor's simulation idea. This also reminds me that the authors' should comment on the ensembleFBA approach which does address the main criticisms of the gap-filling. Furthermore, I think that a clear focus on oral microbiome would benefit the study much more rather than claiming general validity that is very difficult to substantiate based on their proposed plan – after all, E. coli is one organism and by no means representative of natural diversity and community complexity. E. coli auxotroph complementation experiments are also unique in some sense as I haven't seen many other studies showing similar data in other organisms. So, overall, I think recommending to focus on oral microbiome and reducing the general claims would put the paper on more solid foundation I believe."

https://doi.org/10.7554/eLife.39733.030

Author response

[Editors’ note: what follows is the authors’ plan to address the revisions.]

Thank you very much for evaluating our manuscript, and for your valuable and constructive comments. In the summary letter, the reviewers and reviewing editor have asked us to further investigate and clarify the novelty and biological relevance of our method, especially in relation to previously developed approaches. In particular, they have asked us to “either show that the method is in agreement with previously published data (where a lot of direct evidence exist) in a manner that is better/more insightful than previous methods; or provide new experimental data to substantiate [our] claims”.

We agree that a better description of the biological relevance of our method and additional analyses directly addressing how our method can provide novel insight relative to existing approaches will greatly strengthen our results and clarify the importance of our work. We propose several new analyses that will address the reviewers’ concerns. Based on revisiting the literature, and implementing some preliminary analyses (shown below), we believe that sufficient published data is available to clarify the biological relevance and distinct capabilities of our method.

The current document outlines our plan for addressing the reviewers’ comments, and is structured as follows:

In Part 1, we present an overview of how we will address each of the major issues raised across all reviewers. In Part 2, we have specifically addressed all reviewer’s comments in more detail. In Part 3, we have included preliminary and prospective figures that will be added to our manuscript to address the reviewer’s comments.

We obviously cannot guarantee that all comparisons with available data will confirm the validity of our approach. However, some preliminary analyses we have performed so far, displayed below in the form of tentative figures, support and clarify the relevance of our method. Would any of the prospective analyses we have suggested yield uninformative or negative results, we would still be committed to including them in a revised manuscript as an opportunity to highlight the limitations of our approach, and possible directions for future improvement.

Part 1 – Summary of how we address major issues

A) It is not clear how our method is different from and better/more insightful than previous methods (reviewer #1 comment 1, reviewer #1 comment 2, reviewer #2 comment 2).

We appreciate this criticism and see it as an opportunity to clarify the novelty of our method and point out where it may be better/more insightful than other similar methods. We will address this criticism by describing how our method differs from several different methods:

How our method differs from pathway completeness: We will first analyze the relationship between pathway completeness (the fraction of genes or reactions from a given pathway found in a metabolic network) and our Producibility Metric (PM) in depth for a single pathway. Figure 1 displays a preliminary analysis, showing how our metric provides a much more informative outcome than pathway completeness for the analysis of mutations in the E. coli L-histidine biosynthesis pathway. In particular, the PM provides information on the structure of the pathway and on how the proximity of the knockout to the final product affects its producibility. Thus, a high PM value would indicate that the target metabolite can be produced from multiple distinct sets of precursors. This analysis will be presented as a supplemental analysis in the Materials and methods section of our manuscript.

To show the validity of this argument beyond a simple linear pathway and perform a more comprehensive analysis, we will also compare the relationship between pathway completeness and our PM for additional metabolites with known biosynthetic pathways and interesting distributions of PM across our set of oral microbiome organisms. This analysis will be presented in the results section of our manuscript. See also response to reviewer #1 comment 1 for additional details on these analyses.

How our method differs from flux balance analysis (FBA): We will clarify how our method differs from classical FBA, in three major ways. First, in our approach we assess the producibility of individual biomass components, rather than optimizing the overall biomass production flux. This allows us to analyze incomplete networks that would be simply infeasible with FBA optimization of the biomass production flux. Second, we define environmental availability of precursors in a probabilistic way, and thus compute metabolite producibility based on random sampling. Third, we provide a novel measure of robustness (the Producibility Metric) that cannot be computed with standard FBA, which describes how well an organism can produce a metabolite across a statistical ensemble of possible environments.

In the revised manuscript, in addition to describing in detail these differences and their implications, we will add a new figure supplement to our method section (see proposed Figure 2 below as Author response image 2) showing how indeed our metric provides robust predictions of biomass producibility for metabolic networks whose gaps would prevent obtaining any useful information from regular FBA (which would give simply “infeasible” as an outcome). Additionally, to clarify the connection between our probabilistic definition of environments and the constraint-based definition of FBA, we will further compare our analysis with FBA for E. coli grown on minimal glucose and rich media (see also point B below). In the discussion of a revised manuscript we would also better explain that having to perform gap-filling in order to use FBA for any microbiome study is becoming an increasingly important and debated issue. While previous work has shown the utility of gapfilled models for predicting inter-species interactions (Zelezniak et al.,2015; Freilich et al., 2011), more recent papers have raised concerns about existing gap-filling pipelines (Karp et al., 2018, BMC Systems Biology). In general, we feel that exploring alternative strategies that can estimate metabolic properties of microbes and communities without gap-filling is going to be a crucial component of future genome-scale microbial modeling efforts.

How our method differs from NetCooperate: We will compare our method in more detail to what we believe to be the most closely related method, in terms of scope and motivation, namely the NetCooperate reverse ecology method. Similar to our method, this approach can be used to analyze metabolic interactions in microbial communities using draft metabolic networks and to provide large scale insight into microbial communities (Levy and Borenstein, 2013; Levy etal., 2015). To perform a comparison with this method, we will use our method to compute a complementarity score (which quantifies the potential for metabolic exchange between two organisms) and compare this metric to an analogous metric derived from NetCooperate. We will compare the two approaches for all pairs of oral microbiome organisms in our collection. A tentative new figure (proposed Figure 3 below as Author response image 3) shows preliminary results from such an analysis. This figure will be highlighted in a supplementary results section of our revised manuscript. One can see that the two metrics are significantly correlated. Interestingly, however, our metric provides a better capacity to discriminate interactions that involve fastidious/uncultivated organisms. In finalizing this analysis, we will further check whether the capacity of our method to resolve uncultivated organisms is related to the more specific predictions of exchanged metabolites that our method provides relative to NetCooperate, likely due fact that we use stoichiometric constraints for our calculations, as opposed to network adjacency. To clarify why our analysis of the full stoichiometric network is expected to yield more specific prediction than those obtained by analyzing a metabolite adjacency network we propose to add a new portion of text, possibly aided by a figure supplement in the Materials and methods section. More details on this are provided in the response to reviewer #1 comment 2.

B) The assumption of a constant Pin throughout our analysis seems limiting (reviewer #1 comment 2, reviewer #2 comment 1).

We wish to start by acknowledging that, while in our manuscript we described in detail the mathematical definitions of Pin (the probability that an environmental metabolite is available) and of PM (the Producibility Metric), we could have done a better job at providing a biological intuition for our definitions. We will update our manuscript to include an improved description of PM, Pin, and their relationship to FBA, which we outline tentatively in what follows.

One way of briefly re-stating the interpretation of the Producibility Metric (PM) for a given metabolite is that it is an estimate of the fraction of distinct environmental compounds that one would have to add in the medium in order to expect, on average, a high chance (above 0.5) of being able to produce a given metabolite. We should clarify that the computation of PM is effectively the outcome of thousands of FBA calculations in which the production of a target metabolite is estimated under many sets of input metabolites randomly sampled as Pin slides between 0 and 1. In this sense, in our current calculations, Pin is uniform (i.e. assumed to be the same across all metabolites), but not constant (it changes between 0 and 1). Note also that while the PM of a given metabolite is the outcome of many FBA calculations, it is a number that could not easily be related to any individual FBA calculation. In summary, PM quantifies the overall propensity for an organism (with an incomplete metabolic network, and no knowledge about its typical environment) to be able to produce a given metabolite. Therefore, our method is particularly useful for assessing the biosynthetic capabilities of a large number of organisms without having to make any arbitrary assumption beyond the metabolic information present in the organisms’ genomes.

That being said, nothing in our algorithm prevents one from making alternative choices for the values of Pin, which can be different for different metabolites. In fact, our algorithm was designed in a modular fashion such that it could accommodate such alternative implementations. An extreme, but illustrative, scenario is the one of Pin = either 0 or 1 for all metabolites. We will show that this scenario would effectively be equivalent to estimating the producibility of biomass components with standard FBA, under a specific environmental condition, in which the available metabolites are the ones with Pin = 1. We will further show how these extreme choices of Pin differ from using intermediate Pin values and PM by showing how the results of metabolite producibility in a minimal and rich medium are related to the PM. This analysis is described in further detail in response to reviewer #1 comment 2.

In order to address the specific suggestion to use non-uniform Pin values to probe nutrient preferences across different organisms, we will test the ability of our approach to distinguish between saccharolytic vs. proteolytic organisms in the human oral microbiome. These two classes of organisms have very different metabolic strategies and important implications in oral health (Takahashi, 2015). For this analysis, we will use variable Pin values to define a sugar rich and an amino acid rich environment. Then we will calculate the producibility metric for different biomass components for a set of saccharolytic and proteolytic organisms from our collection of oral microbiome organisms and assess our ability to distinguish their environmental preferences. This analysis will be added to the results section of our manuscript in a new section addressing variable Pin values. For additional details see prospective Figure 4 legend below.

C) There is a lack of experimental validation about producible metabolites and exchange among microbial organisms (reviewer #2 comment 3, reviewer #3 comment 1).

In general, experiments that directly confirm the exchange of a specific metabolites between two microbes are quite challenging, let alone experiments that could uncover the entire possible network of metabolic exchanges in a microbial community. However, a few published examples contain enough information to enable a direct comparison between computational and empirical outcomes. We propose to analyze different sources of existing data in order to validate the predictions of our algorithm. Whenever possible, during these analyses, we will also include analysis based on the methods mentioned in part A such that we can compare the performance of our method relative to these alternative methods. The specific sets of case studies we plan to focus on are described below:

We will analyze organic acid exchange in the oral microbiome: We will test our method’s ability to predict the exchange of organic acids between oral microbes in two known examples. To do this we will extend our method to predict the producibility of metabolites beyond biomass components such as organic acids. The specific interactions we will focus on are the exchange of lactic acid, produced by Streptococcus and consumed by Veillonella species (Chalmers et al., 2008); and the exchange of succinate from T. denticola to P. gingivalis and isobutyric acid from P. gingivalis to T. denticola (Grenier, 1992, Infect. Immun.).Additionally, following the analysis of these specific experimentally validated systems, we will extend our analysis to all of our oral microbiome organisms to generate hypotheses regarding exchange of organic acids between previously undiscovered pairs of organisms. This analysis will be presented in the manuscript as a supplemental result that expands on our current matrix of biomass producibility. See the response to reviewer #2 comment 3 for further details.

We will analyze the exchange of amino acids in E. coli auxotrophs: As a slightly more complex example of a microbial community with metabolic exchange we propose to analyze a dataset on putative exchange among amino acid auxotrophs (Mee et al., 2014). We will calculate the producibility metric for all amino acids across pairs and triplets of E. coli auxotrophs that are known to support each other through syntrophic exchange of amino acids. We will determine if indeed the production of the necessary amino acids to support partner growth is predicted by our method. This analysis will be added to the manuscript method section as an additional validation. See prospective Figure 5 legend and the response to reviewer #2 comment 3 for further details.

While the above analyses are focused on specific interactions for which molecular mechanistic details can be addressed with our methods, it will be also interesting to assess the performance of our method on microbial co-occurrence data from the human oral microbiome. It is important to note that co-occurrences are not necessarily indicative of metabolic interactions, but they are often related to metabolic trends, as shown before (Levy and Borenstein, 2013). We will utilize two different sources of microbial cooccurrence data for this analysis. The first will be high-throughput sequencing data from the human microbiome project, which contains co-occurrence data for microbes associated with several different human oral microbiome sites (Huttenhower et al., 2012, Nature). The second source we will aim at utilizing is a set of CLASI-FISH experiments of dental plaque where physical interactions between different taxa have been imaged through fluorescence microscopy (Welch et al., 2016). This analysis will be added to the results section of our manuscript. See response to reviewer #2 comment 3 and prospective Figure 6 legend for further details.

An additional analysis that we will pursue, as inspired by reviewer #3 comment 1, is the assessment of our methods performance on predicting high-throughput phenotype data. As pointed out by the reviewer, analyzing high-throughput phenotype data could allow us to more quantitatively benchmark the ability of our method to predict specific metabolic auxotrophies (or inabilities to produce certain metabolites). We have identified a promising dataset that contains a large number of E. coli knockout mutants tested on different environments containing variable nutrients (Tohsato et al., 2010, J. Bioinform. Comput. Biol.). We have contacted the authors of this paper in the hope that they will share the raw data of their study with us, in which case we would pursue an analysis wherein we analyze this data to provide a more-high throughput and quantitative clarification of our PM calculations. This point is further elaborated in our specific response to reviewer #3 comment 1.

Part 2 – Point by point response to reviewers

Reviewer #1:

[…]

1) What is the difference between PM and% completion of biosynthetic pathway?

This is an important question that we propose to address through two different analyses. The first is a detailed analysis of a known (linear) biosynthetic pathway, biosynthesis of the amino acid L-histidine in E. coli. In a preliminary analysis (see proposed Figure 1 in Author response image 1 below) we analyzed this biosynthetic pathway by “knocking-out” different metabolic reactions and calculating our Producibility Metric (PM). One can see that the PM distinguishes between the effects of removing different reactions along the pathway based on their proximity to the target metabolite, L-histidine. Thus the PM metric is able to capture the impact of the specific location of the knockout and the topology of the metabolic pathway, thus carrying information about the chance that additionally available metabolites could recover the function of the broken pathway. In contrast, in several other possible approaches (including pathway completeness, as well as flux balance analysis, or network expansion) we expect the outcome of such a knock-out analysis to yield the same final output irrespective of the gene location, and thus provide a different and less informative prediction. For the full revision, we will show this analysis in more detail, and compare its outcome with the other methods mentioned above.

The second analysis that we propose goes beyond a simple linear pathway, and constitutes a more general assessment of the relationship between PM and biosynthetic pathway% completeness. For this analysis we will utilize our large set of oral microbiome organisms. For several metabolites with well described biosynthetic pathways, such as amino acids, we will determine the% completion across these strains and make a direct comparison with the PM value. We do expect to see some correlation between the% completion of a pathway and the PM value. However, as demonstrated with the above analysis, we do not expect them to be redundant.

We would also like to emphasize that we believe it is an advantage of our method that it does not rely on previously defined biosynthetic pathway annotations and instead seeks to use the entire metabolic network structure to define biosynthetic capabilities. While this prior knowledge can be quite useful in many contexts, specific biosynthetic routes can cross the boundaries of annotated pathways, making pathway completeness uninformative. A prior notable example of this effect is the discovery of an alternative pathway to bypass a TCA cycle gene impairment (Frezza et al.,2011). This pathway connects in an unexpected way two distinct pathways, generating a new crucially important and experimentally validated type of connectivity that would be missed from regular pathway-based analysis. Another such example, in our current data, is the case of the arginine deiminase pathway and the urea cycle, which contain several overlapping metabolites and reactions. In fact, we have noticed that KEGG mappings of TM7x metabolism often highlight the urea cycle as a result of TM7x containing the complete arginine deiminase pathway. In the revised manuscript we will describe this advantage of the Producibility Metric relative to pathway completeness, and point to the specific examples mentioned.

2) It's also not entirely clear to what extent their method improves upon the state of the art. In their Introduction, they say that two downsides that limit FBA (but not their method) are assumptions on environmental conditions and reconciling with stoichiometry-based constraints. I don't really understand their second limitation, but they sidestep the first limitation by assuming a constant Pin among all inputs. This seems similar to assuming constant concentration among all inputs for FBA, which is never shown.

We agree with the reviewer that the current description of how our method differs from FBA is not articulated clearly enough, and can be significantly improved. We have described in detail how we plan to address this issue in Part 1 Point B above, and will add further details to our manuscript to describe our method, and in particular the PM, Pin and how they are related to FBA.

The first of the two “downsides” mentioned in the reviewer’s comment is related to a specific choice on how our approach can be employed for obtaining biological insight about an organism in the absence of specific information about the molecular composition of its environment. In a given FBA calculation, one necessarily needs to decide on a specific set of initial conditions, in the form of upper bounds to the fluxes associated with import of extracellular metabolites. Our method is instead statistical in nature, and computes the producibility of different metabolites as a function of the probability of availability of any given environmental metabolite (Pin). Indeed, in the current version of the manuscript, we focused on a uniform Pin across all metabolites, essentially mimicking a uniformly distributed ensemble of environments. However, in general, each metabolite j can have a different value Pin(j). An extreme case is the one of all Pin = 1, which would be comparable to running FBA with all environmental metabolites available. In a revised version of the manuscript, we propose to show some specific examples to clarify how our method chooses statistical ensembles of input metabolites based on Pin and how this choice differs from, but is related to, setting constant inputs for FBA. We will calculate the set of producible metabolites in the E. coli core model by using FBA under two different conditions, one where we include all input metabolites, as in a rich medium, and one where we include only input metabolites expected to be found in a glucose minimal medium. We note that these two scenarios correspond respectively to: (a) Rich Medium – setting Pin values equal to 1 for all metabolites; (b) Minimal Glucose Medium – setting Pin values equal to 1 for all metabolites found in glucose minimal medium and 0 for all others. We will then compare the metabolites that are producible in those two scenarios to the PM values obtained from varying Pin between 0 and 1. We expect that metabolites that are producible in the minimal medium will have higher PM values than those that are producible only in the rich medium.

The second “downside” of other methods relative to our approach is related to the fact that our methods use of a full stoichiometric matrix to constrain results, as opposed to other approaches that are based on an adjacency matrix of metabolites. We realize that our explanation was not very clear, and we will make sure to clearly articulate this point in a revised version, which will include a figure supplement in which stoichiometry and adjacency are clearly compared to each other. In particular, in response to reviewer requests we will compare the predictions of our method with those of the NetCooperate method which utilizes an adjacency matrix (Levy et al.,2015; Levy and Borenstein, 2013). We will describe how our results differ from this method in terms of sensitivity and specificity. More specifically, we expect our method to be more specific in its predictions of exchanged metabolites and ability to discriminate different types of interactions than the NetCooperate method. Some results supporting this point are shown in preliminary Figure 3 (Author response image 3) below.

Reviewer #2:

[…]

1) The method assumes a uniform distribution of possible substrate uptake combinations – which is not necessarily biologically observed. There are hierarchies – e.g. preference of glucose over other C-sources in most organisms.

This is a very interesting point. We first would like to clarify that the assumption of uniform Pin for our analysis was chosen specifically in order to assess environmental robustness across a uniformly sampled ensemble (see also Part 1, Point B above). Our method, however, can easily be implemented using variable Pin values that represent any assumed prior distribution on input metabolites. We propose an analysis that will utilize variable Pin values in order to further demonstrate the utility of our method for a specific biological example where variable assumptions about the environments may be of interest. We will analyze saccharolytic vs. proteolytic organisms from our set of oral microbiome organisms, which have key differing metabolic capabilities and implications in oral health and disease (Takahashi, 2015). Using variable Pin values, we will define a sugar rich and an amino acid rich environment, and we will calculate the PM of all biomass components for both saccharolytic and proteolytic organisms in both. The sugar-rich environment will have sugars with fixed Pin = 1 (i.e. they are always included in the environment) and the amino acid rich environment will have Pin = 1 for amino acids. By fixing Pin for these metabolites the PM will then become a measure of how many metabolites, beyond those that are fixed on, need to be added to the environment to produce a given target metabolite. These results will allow us to determine whether our method can capture the environmental preference and metabolic processes of these two classes of organisms from the human oral microbiome. This analysis is further described in prospective Figure 4, which has yet to be generated.

2) The finding that metabolic dependencies may exist in oral microbiome is not that surprising per se given that a large body of literature (including papers also from the Segre lab) already suggest that metabolic dependencies are likely widespread in microbial communities. Thus, it is not clear whether only the proposed method would be able to identify these.

Indeed, as expressed by the reviewer, a lot of literature exists about computational and empirical reports of inter-species dependencies in microbial communities. However, we believe that the reviewer will agree that what has been done uncovers with limited fidelity a minuscule fraction of the interactions likely present in nature. The challenge that remains is in characterizing and describing the mechanisms of these dependencies/interactions. We will add a statement emphasizing this point. Beyond this clarification, we will be happy to address more specifically the reviewer’s concern, i.e. show that our method can provide insight that is substantially different from the type of insight provided by other methods. We will compare our method to results obtained from three different alternative methods: (1) Pathway completeness, (2) Flux balance analysis, and (3) NetCooperate. These comparisons are, in part, addressed in the responses to other reviewer comments, and are described in detail in Part 1 point A, which outlines this major issue. We also note that, to our knowledge, ours is the first large-scale metabolic analysis of this type in the human oral microbiome. In this sense it provides valuable new insight into this microbial community and can further serve as a blueprint for analyzing other complex microbial communities.

3) Lack of experimental validation makes it difficult to assess whether the identified exchanges are happening in the system. At the least, a detailed analysis of systems with known metabolite exchanges (and how the proposed method there differs from the previous ones) should be provided.

This is a good point and we will attempt to address it as best we can. Unfortunately, there are limited systems where microbial metabolic dependencies and interactions have been unraveled experimentally. One example where there is some experimental knowledge that we will analyze is the exchange of organic acids in the oral microbiome. One such example is the exchange of lactic acid produced by Streptococcus and consumed by Veillonella species (Chalmers et al., 2008, J. Bact.). Another example is the exchange of succinate from T. denticola to P. gingivalis and isobutyric acid from P. gingivalis to T. denticola (Grenier, 1992, Infect. immun.). To analyze these examples, we will extend our method to not only calculate PM values for essential biomass components but also to include possible secreted metabolites such as organic acids. We will then utilize our method to calculate the PM for these different organisms and identify metabolic complements. Through this analysis we will confirm whether or not our method is able to capture these well-known microbial interactions. Furthermore, we will extend this analysis to calculate the PM of a set of possibly exchanged organic acids for all oral microbiome organisms in our catalog in order to gain broad insight into organic acid producibility and potentially predict new metabolic exchanges.

Another analysis we propose to perform is the study of a slightly more complicated microbial ecosystem, consisting of E. coli amino acid auxotrophs (Mee et al., 2014). In this work all pairs of E. coli single amino acid auxotrophs and all triplets of double auxotrophs where grown in co-culture to identify pairs and triplets that enabled syntrophic growth. We will re-analyze these E. coli auxotrophs and calculate their PM for all amino acids to determine if our method can predict the increased ability of growth-supporting auxotrophs to produce the amino acids that their syntrophic partners are lacking. We anticipate that certain amino acid auxotrophs, on top of being unable to produce a single specific amino acid, will have decreased robustness in their production of other amino acids that have similar biosynthetic pathways. This will lead to decreased syntrophic growth between auxotrophs that cannot robustly synthesize each other’s essential amino acids, and can be predicted by our PM calculations. For example, upon first inspection of the data from Mee et al. in Figure 1C we can see this pattern with regards to the L-methionine and L-cysteine auxotrophs (L-methionine and L-cysteine are the two sulfur containing amino acids and have similar biosynthetic pathways). While the L-methionine auxotroph has improved syntrophic growth with a number of other auxotrophs it has notably little syntrophic growth with the L-cysteine auxotroph. A further analysis of this data and comparison to our PM calculations will be presented in prospective Figure 5 (see legend below), to be included in the methods section of our manuscript as an additional validation of our method.

We will also analyze the performance of our metric in describing microbial co-occurrence patterns across the human oral microbiome. Although co-occurrence does not necessarily imply interactions, and mechanisms of interaction are not known for these cases, previous methods have been shown to be capable of identifying metabolic trends associated with microbial co-occurrence (Levy and Borenstein, 2013). We will use our method to analyze two different type of co-occurrence data and attempt to similarly identify metabolic trends. The first set of co-occurrence data is from sequencing data from the human microbiome project (Huttenhower et al., 2012, Nature). The human microbiome project included 9 different oral sites from which we will collect microbial co-occurrence data that can be compared to the oral microbiome strains that we have analyzed. The second form of co-occurrence data is from physical associations which have been observed through fluorescence microscopy (Welch et al., 2016). In this work the authors identified structure in supragingival plaque microbial communities from the oral microbiome. These structured communities suggest interactions that can similarly be compared with our PM metric to identify metabolic trends associated with microbial co-occurrence. The results of this analysis will be presented in prospective Figure 6 and included in a supplementary results section of our revised manuscript.

Reviewer #3:

[…]

Given the rate of generation of microbial genomes it would be very useful to have a better method to even approximately predict metabolic products of organisms. However, what the study lacks in my opinion, is quantitative evidence that the metric actually does translate into a prediction that the organism produces a given metabolite. They provide some anecdotal examples where it appears to work but without a more thorough analysis of what the metric means then it will only be useful in a qualitative sense i.e. a higher PM indicating that one organism is more likely to produce a metabolite than another. Metabolic modelling is admittedly not my field, but datasets capable of testing the metric, must exist. A possible example might be this collection of E coli strains and phenotypes defined in terms of growth conditions (Galardini et al., 2017, eLife). If such datasets are not available, then I feel the authors need to be more circumspect in their conclusions and concede that the metric may say something about the production of metabolites but without any certainty about how a given PM value translates into an actual probability of production.

We appreciate the reviewer’s comment, which made us realize that a number of subtle but important aspects of our analysis were not clearly explained in the manuscript. First of all, we recognize that we can do a better job at providing an intuition for the meaning of the PM metric, including a clarification that it is not meant to quantitatively capture rates of reactions (fluxes) (as done by FBA) but rather the fraction of environmental metabolites that an organism is expected to need in order to produce with high probability a given metabolite. Additional details of the intuition behind our definition are provided in Part 1, point B. We will further emphasize, in our revised manuscript, the nature of PM as a measure of robustness rather than a probability of production.

The most important concern expressed by the reviewer, however, seems to be the insufficient comparison with experimental data, which can lead one to legitimately wonder how accurate and valuable our predictions are. Irrespective of the outcome of the additional analyses we propose to perform for validating our predictions (see Part 1, point C above, and more details below), we will certainly take to heart the reviewer’s suggestion to more explicitly describe the possible limitations of our inferences. At the same time, this will also give us an opportunity to emphasize that, relative to existing methods, our approach provides a very unbiased estimate of the metabolic capabilities of a given organisms given its genome, avoiding some of the issues of gap-filling algorithms used for flux balance models.

In considering the reviewer’s recommendation of additional comparisons with experimental data, we have identified a few datasets whose analysis would enable a more substantial validation of our approach. These include a number of specific interactions between oral microbes, mediated by the exchange of organic acids, and a fairly rich dataset on amino acid exchange between E. coli auxotrophs. Both datasets are described in some more detail in response to reviewer #2 comment 3.

We also appreciate the reviewer’s suggestion of a high-throughput dataset that could be used to additionally assess our method, and we thank the reviewer for suggesting the interesting paper included. Unfortunately, the majority of the conditions tested in this paper are undefined or non-metabolic (mainly consisting of antibiotics or other stressors), and it is therefore not the best dataset for evaluating our method. In the spirit of the reviewer’s comments, we would like to suggest a similar high-throughput analysis using data from a different E. coli mutant phenotyping study that we identified (Tohsato et al., 2010, J. Bioinform. Comput. Biol.). This dataset includes phenotype data from 300 E. coli K12 mutants from the Keio collection grown on 1920 different conditions using Biolog Phenotypic Microarrays. Many of these conditions are metabolic in nature consisting of minimal media supplemented with different carbon, nitrogen, phosphorus, or sulfur sources. We have contacted the authors of this paper in the hope that they will share the raw data of their study with us, in which case we would pursue an analysis wherein we analyze this data to provide a more-high throughput clarification of PM and how exactly it relates to environmental robustness. In particular, we anticipate that E. coli knockouts with high PM for all essential biomass components will survive on a large number of variable nutrient environments, whereas knockouts with low PM for all essential biomass components will survive on a very small number of variable nutrient environments. Furthermore, analysis of the specific knockout/environment pairs could give us insight into the ability of our method to predict specific auxotrophies and metabolic requirements.

Other than the above caveat, I thought this was a good study but it was sparse on the practical details of the methodology. The GitHub repository consists of a selection of Matlab scripts without a clear description of how to apply them. What would be really useful is a complete walkthrough of the methodology from genome to PM values including for example ggkbase commands used to build the models. This is necessary if this work is to be properly reproducible. One final comment was that some evaluation of the impact of sample number on the calculated PM values would be a worthwhile addition. The authors mention that 50 samples of input metabolite combinations were used but no analysis of the impact of this choice on the PM values was given.

This is a good point and we appreciate the reviewer’s effort to help us make our method more reproducible. We will add to the GitHub repository example scripts that walk the readers through how to implement our algorithm and obtain our results. In addition to these example scripts, we will add a supplementary methods section that contains a procedural walk through on how to go from a genome to metabolic predictions using KBase (http://kbase.us) to reconstruct a model and our method to analyze it. Regarding the parameter choice for the algorithm, we note that those numbers were chosen by hand to improve algorithm performance. We will further describe the process of choosing these parameters and the impact on PM calculations including means and standard deviations of results.

PART 3 – Preliminary and prospective figures

Author response image 1
Proposed Figure 1.

Analysis of our Producibility Metric (PM) for individual gene deletions in the L-histidine biosynthetic pathway in E. coli. Note that the PM varies for all gene deletions tested, while other metrics such as pathway completeness, FBA, or network expansion would yield the same results for all knockouts. For pathway completeness the results would be 8 of 9 reactions present, and for FBA/network expansion Lhistidine would not be producible from an externally supplied metabolite fed into the top of the pathway. (A) The linear pathway was analyzed by knocking out individual reactions (squares) at different distances from L-histidine (red circle). (B) The PM was calculated for L-histidine and all other biomass components in the E. coli metabolic network, and it can be seen that the distance of the knock-out from L-histidine significantly affects its PM other biomass components are not affected. (C) The resulting PM from different knockouts (blue circles) was compared to the theoretical value for the PM of a metabolite with 1 to 9 individual precursors using our combinatorial theory (x’s) and was found to match. This result indicates that the additional co-factors utilized in this pathway (gray circles) have minimal impact on the final PM of Lhistidine.

These results will be further elaborated and included in a new figure in a supplementary methods section of our revised manuscript.

Author response image 2
Proposed Figure 2.

Comparison of the performance of our method relative to FBA in predicting metabolic properties of an organism (E. coli) as an increasing number of random gaps in the metabolic network are introduced..

The ability of the E. coli iJO1366 metabolic network to produce biomass under different conditions and constraints was analyzed for networks impacted by an increasing number of random perturbations that remove reactions in the network. Reactions were removed sequentially from the metabolic network. The resulting biosynthetic capabilities (on a rich medium) are plotted for the average of 10 different random runs, with error bars indicating standard error across runs. Flux balance analysis (FBA) calculations of the biomass flux become infeasible fairly quickly, whereas our metabolite producibility metric extends far beyond the FBA infeasibility region

An updated version of this figure will be included in a supplementary methods section of our revised manuscript.

Author response image 3
Proposed Figure 3.

Comparison of metabolic complementarity computed as an Advantage Metric based on our producibility metric (PM) vs. the NetCooperate complementarity score.

Complementarity metrics were calculated to summarize the potential for any one organism to provide metabolic products to another. The “advantage metric” was calculated using our PM metric that describes the metabolic advantage one organism can provide to another. This metric was plotted against an analogous complementarity metric calculated using NetCooperate for all directed pairs of 456 oral microbiome organisms. Log-scale histograms display the distribution of complementarities for each method. The histograms show that the advantage metric has a bi-modality that distinguishes between interactions involving fastidious/uncultivated organisms from those that do not, while the NetCooperate complementarity does not display a similar distribution.

The formula for the advantage metric is described below for organism A to B. The metric is calculated by summing over i = 1 to N metabolites.

A→B=maxPMiA,PMiB-PMiB∑i=1NPMiA

A subsequent version of this figure will be added to a supplementary results section of our revised manuscript as Figure 4—figure supplement 7.

Proposed Figure 4. To be generated. Analysis of variable Pin values for saccharolytic and proteolytic oral microbiome organisms.

This figure will display the ability of our method to utilize variable Pin values to provide biological insight into a set of organisms that may prefer a specific environment. An analysis of the PM values calculated for a set of saccharolytic and another set of proteolytic oral microbiome organisms in either a sugar or amino acid rich environment that has been altered by changing Pin values will be presented.

This figure will be added to a supplementary results section of our revised manuscript.

Proposed Figure 5. To be generated. Analysis of E. coli amino acid auxotrophs.

This figure will display the ability of our method to predict metabolic interactions in a system with known metabolic dependencies. PM calculations for E. coli amino acid auxotrophs will be compared to data showing syntrophic growth among pairs and triplets of auxotrophs.

This figure will be added to a supplementary methods section of our revised manuscript as additional validation.

Proposed Figure 6. To be generated. Analysis of co-occurrence in the oral microbiome.

This figure will display the ability of our method to identify metabolic trends in microbial co-occurrence networks. Co-occurrence networks will be taken from two distinct sources. The first data source will be cooccurrence from sequencing data from the human microbiome project. The second will be physical associations from fluorescence microscopy data. We will further display how our metric, describing metabolic properties, can be used to describe metabolic trends in these co-occurrence networks.

This figure will be added to a supplementary results section of our revised manuscript.

[Editors' note: the authors’ plan for revisions was approved and the authors made a formal revised submission.]

Section 1 (Response to final recommendations)

The Reviewing Editor stated:

"I think that I know how authors can fix their problems from a theoretical aspect. They can start with a complete model as the "ground truth" model, but hide or mis-annotate various% links to mimic our incomplete knowledge of the system, and compare their method with existing methods. This way, they will always have a "ground truth" curve for any metric of interest to biologists. Of course, the task of assigning missing links will need to be repeated many times and in double-blind fashion to gain statistical power. I suspect that different methods might show tradeoff. For example, in their Figure 1, it is not clear whether the extra info provided by PM is useful to a biologist if all mutants are auxotrophic. Any tradeoff between methods is also valuable to know. Authors can also use this ground truth model to illustrate problems of gap filling etc."

We thank the Reviewing Editor for this suggestion, which we addressed by analyzing a well curated metabolic network (“ground truth”) that has been degraded to different degrees (“miss-annotated”). This new analysis is presented in the main text, in a new section of the Results titled “Producibility analysis shows improved tolerance to missing reactions compared to flux balance analysis”. This section is accompanied by a newly added main figure (Figure 3). For this analysis, we used the well-curated E. coli iJO1366 metabolic network as our ground truth model, which we perturbed to different degrees by randomly removing reactions. This was repeated 50 times for 5 different levels of random reaction removal generating 250 different degraded metabolic networks. We analyzed these networks with FBA and with our producibility metric. This allowed us to show that the producibility metric is more tolerant of removed reactions in its predictions than FBA. Thus, this analysis illustrates that the producibility metric provides useful information on growth capabilities for draft metabolic networks that would need to be gap-filled in order to be analyzed with FBA.

Another reviewer said:

"I agree this is a very comprehensive plan and they do seem to have taken our comments to heart and developed a proposal to address them. I was pleased with their suggestions regarding an alternate dataset of E. coli phenotypes to test their metric and the improved GitHub repository."

As described in Section 2, in response to reviewer #3’s comments 1 and 2, we have addressed in detail the comments made in the original review letter, expanding on our initial plan for a revision. This includes a number of new analyses including several that relate our method to experimental phenotypic data, both in E. coli (Results section “Metabolite producibility points to putative metabolic mechanisms for E. coli auxotroph co-cultures”) and in the human oral microbiome (Results section “Organic acid production predicted for human oral microbiome organisms”, and Results section “Metabolic similarity correlates with microbial co-occurrence in the human oral microbiome”). Additionally, we have improved the GitHub repository by adding a detailed example of how to run our code.

The last reviewer said:

"I agree that the proposal is comprehensive and I like the Reviewing Editor's simulation idea. This also reminds me that the authors' should comment on the ensembleFBA approach which does address the main criticisms of the gap-filling. Furthermore, I think that a clear focus on oral microbiome would benefit the study much more rather than claiming general validity that is very difficult to substantiate based on their proposed plan – after all, E. coli is one organism and by no means representative of natural diversity and community complexity. E. coli auxotroph complementation experiments are also unique in some sense as I haven't seen many other studies showing similar data in other organisms. So, overall, I think recommending to focus on oral microbiome and reducing the general claims would put the paper on more solid foundation I believe."

We appreciated this comment and recommendation from the reviewer. We gladly included reference to the ensembleFBA approach (Biggs et al., 2015), as well as to the newly published CarveMe method that is also capable of ensemble model reconstruction (Machado et al., 2018). We agree that these approaches constitute promising methods for addressing the gap-filling problem, and we are happy to mention them in the Introduction of the paper.

Regarding the more general comment on remaining focused on the oral microbiome and reducing our general claims, we greatly appreciate this advice. In balancing other reviewers’ comments with this suggestion, we have implemented some illustrative examples in E. coli, including an analysis of E. coli auxotroph co-cultures and the Reviewing Editor's simulation idea in E. coli; however, we have focused most other new analyses on oral microbiome organisms. New analyses focused on the oral microbiome include: (i) observing the relationship between the completeness of the histidine biosynthetic pathway and our metric (Results section “Producibility of metabolites differs from pathway completeness and captures minimal precursor set structure”), (ii) investigating organic acid production (Results section “Organic acid production predicted for human oral microbiome organisms”), (iii) implementing our method in an environment specific manner with the analysis of a saccharolytic and a proteolytic organism (Results section “Metabolite producibility in a protein vs. carbohydrate enriched environment”), and (iv) associating our metric with co-occurrence from 16S rRNA sequencing data from the human oral microbiome (Results section “Metabolic similarity correlates with microbial co-occurrence in the human oral microbiome”). In all of these analyses we have sought to relate our findings with phenotypes that are relevant for the human oral microbiome whenever possible, trying to candidly illustrate what works well and what doesn’t. Additionally, we feel that the revised manuscript provides a better explanation for why and how our approach is particularly useful for addressing microbial community questions in cases where the specific metabolic composition of the environment is highly uncertain, such as in the study of natural human-associated microbiomes. This clarifies our motivation for focusing on the human oral microbiome in this study and provides insight into which scenarios our method is best suited for in the future.

Section 2 (Response to initial reviewer comments)

Reviewer #1:

[…]

1) What is the difference between PM and% completion of biosynthetic pathway?

This is an important question, which we have addressed through new analyses added to the Results section “Producibility of metabolites differs from pathway completeness and captures minimal precursor set structure” and Figure 2—figure supplement 2. We have additionally introduced a new paragraph in the Discussion addressing the difference between our method and alternative methods including biosynthetic pathway-based analysis. The specific changes are described in detail below:

The first analysis we performed was focused on looking in detail at the biosynthetic pathway for the amino acid histidine in E. coli. We focused on this pathway because it is overall linear, and thus more easily interpretable. We used the well curated E. coli iJO1366 metabolic network and we examined this biosynthetic pathway by “knocking-out” different metabolic reactions along the pathway and calculating our Producibility Metric (PM). We observed that the PM was able to capture the impact of the specific location of the knockout and the topology of the metabolic pathway. In contrast, looking at the% completion of the biosynthetic pathway would not provide this information.

A second analysis extended this concept to all 456 oral microbiome metabolic networks. Specifically, we calculated both the% completion of the histidine biosynthetic pathway (total number of reactions present) and the length (distance from histidine before a reaction is missing). We found that both of these metrics are correlated with the PM for histidine. However, the length of the intact portion of the biosynthetic pathway is more strongly correlated with the PM than with the% completion, corroborating, across a large set of metabolically diverse organisms, the result obtained for E. coli.

A deeper, and theoretically motivated portion was additionally introduced in the end of this newly added Results section, to further clarify how the PM differs from% completion of a biosynthetic pathway. We found that the PM for histidine of different knock-outs matched closely with our theoretical predictions based on combinatorics. In this new section, we point out that the combinatorial equations establish an intimate connection between the PM and the minimal precursor set structure of a particular target metabolite, and thus captures a complex measure of the multiplicity of biosynthetic routes through which the target metabolite can be produced. This theoretical analysis points out a profound meaning of the PM, which could be further explored in future work.

Finally, we added a section to the Discussion, to emphasize that our method has another important advantage over% completion: it does not rely on previously defined biosynthetic pathway annotations. While this prior knowledge can be quite useful in many contexts, specific biosynthetic routes can cross the boundaries of annotated pathways, making pathway completeness uninformative. We have highlighted a few scenarios from the literature and in our own analysis where this is the case, and where less biased network based approaches have an advantage.

2) It's also not entirely clear to what extent their method improves upon the state of the art. In their Introduction, they say that two downsides that limit FBA (but not their method) are assumptions on environmental conditions and reconciling with stoichiometry-based constraints. I don't really understand their second limitation, but they sidestep the first limitation by assuming a constant Pin among all inputs. This seems similar to assuming constant concentration among all inputs for FBA, which is never shown.

We agree with the reviewer’s comment that our original version of the manuscript did not clearly articulate how our method improves upon the state of the art. In the revised manuscript we have made changes to the Introduction and Discussion that clarify the novelty of our method, and we have added a number of analyses to the Results section where our method is directly compared with alternative methods.

The main novelty of our method, i.e. the statistical nature of the percolation approach, is now reflected in a modified Title and Abstract, and better explained in a revised, more straightforward version of Figure 1 (the original Figure 1 is still available as Figure 1—figure supplement 1). Furthermore, we updated Figure 2B to more clearly demonstrate the features of the PM; specifically, that it is independent from node degree and that recycled co-factors have minimal impact on PM. To better explain in general terms how our method improves upon the state of the art, we significantly altered the text of our Introduction. In this revised Introduction, we revisit the first limitation mentioned by the reviewer, by clarifying the rationale of assumptions on environmental conditions, i.e. the ability of our method to provide predictions in scenarios where the chemical environment of the microbes is unknown. We also note that our choice of constant Pin was chosen as the most naïve choice by which we could statistically sample the environment. This approach is different than providing all nutrients with uniform uptake rate in regular FBA, because it samples an ensemble of environments, rather than assuming a single uniformly rich one. Additionally, for reviewer #2 comment 1, we have implemented an analysis where we use non-uniform Pin to demonstrate the capacity of our method to incorporate environmental assumptions (Results section “Metabolite producibility in a protein vs. carbohydrate enriched environment”). Regarding the second limitation mentioned by the reviewer, about stoichiometry-based constraints, we discuss this in more detail in a modified portion of the Discussion, where we highlight the difference between networks defined by reaction stoichiometry vs. connectivity.

The reviewer’s concern about improving upon the state of the art was further addressed in detail by adding to the Results section several new analyses in which we compare our method with alternative ones. In addition to the comparison of our method with percent pathway completion, discussed in detail in the previous comment, we implemented the following new comparative analyses:

i) As described also in Summary Comment 1, a comparison with FBA, including an analysis of how FBA and the PM cope with a gradually increasing number of perturbations of a genome-scale metabolic network (mimicking the situation of non-gap-filled draft reconstructions), was addressed in the new section “Producibility analysis shows improved tolerance to missing reactions compared to flux balance analysis” of the Results and Figure 3.

ii) We also compared our method to several different metabolic network analysis methods in the context of microbial co-occurrence data in the new Results section “Metabolic similarity correlates with microbial co-occurrence in the human oral microbiome” and the newly added Figure 4—figure supplement 6. We showed that an inter-organism distance based on our PM score explains co-occurrences better than scores based on alternative methods we tested (as shown in the newly added Supplementary file 4). We further show that our approach more precisely separates fastidious/uncultivated microbes than an alternative approach in Figure 4—figure supplement 7. Further comments on how our method differs from and improves upon other potential approaches, as well as an extended discussion of limitations, have been added to two paragraphs in the Discussion.

Reviewer #2:

[…]

1) The method assumes a uniform distribution of possible substrate uptake combinations – which is not necessarily biologically observed. There are hierarchies – e.g. preference of glucose over other C-sources in most organisms.

We thank the reviewer for raising this point, which gave us the opportunity to better clarify two important aspects of our analysis. First, the assumption of uniform Pin for our analysis was chosen specifically in order to assess biosynthetic capabilities across a uniformly sampled ensemble of environments, i.e. under the assumption of minimal knowledge about the chemical composition of the growth medium. We have emphasized this point in the revised Abstract, mentioned it in the Introduction, and clarified it in the revised “Analysis Method” section and the newly added Figure 1.

Second, while in the previous version of our manuscript we indeed only focused on the case of a uniform Pin, we were glad to clarify and exemplify in the revised manuscript the feasibility of arbitrarily chosen, metabolite specific set of Pin values, which encode assumptions about the environmental composition. In particular, to demonstrate this capability, we have recalculated the producibility metric (PM) for one saccharolytic and one proteolytic organism in a protein enriched environment and carbohydrate enriched environment. While we found only modest increases in the PM in both of the enriched environments, we did find that the producibility metric was further increased for the proteolytic organism in the protein enriched environment than it was for the saccharolytic organism. This analysis demonstrates how we can tailor our method to use variable Pin values to ask interesting questions about an organism’s environmental dependencies. The analysis is presented as newly added Results section “Metabolite producibility in a protein vs. carbohydrate enriched environment” and Figure 4—figure supplement 5.

2) The finding that metabolic dependencies may exist in oral microbiome is not that surprising per se given that a large body of literature (incl. papers also from the Segre lab) already suggest that metabolic dependencies are likely widespread in microbial communities. Thus, it is not clear whether only the proposed method would be able to identify these.

Indeed, as expressed by the reviewer, the novelty of our work is not so much in pointing out the general possibility of inter-species metabolic exchanges, but rather in (i) the capacity of our method to produce predictions in a way that is specific to the underlying genome and molecular products, but at the same time robust to uncertainty in environmental composition and to missing gene annotations; and (ii) the production of a detailed atlas of biosynthetic capabilities and complementarities for a specific ecosystem of high interest for multiple health-related studies, namely the human oral microbiome. To our knowledge, ours is the first large-scale metabolic analysis of this type in the human oral microbiome. In this sense, it provides valuable new insight into this microbial community and can further serve as a blueprint for analyzing other complex microbial communities.

In the revised manuscript, however, in line with the reviewer’s comment, we made sure to better present our results in the context of previous analyses. In particular, in addition to more clearly stating the objectives of our work in the Introduction, we reference and describe in more detail prior literature on computational and empirical analyses of inter-species dependencies in microbial communities. Beyond this clarification, we have also added a number of analyses that compare our method to different alternative methods including: pathway completeness (Results section “Producibility of metabolites differs from pathway completeness and captures minimal precursor set structure”, Figure 2—figure supplement 2), flux balance analysis (Results section “Producibility analysis shows improved tolerance to missing reactions compared to flux balance analysis”, Figure 3), and alternative metabolic network analysis methods, such as NetCooperate used in a similar analysis (Levy and Borenstein, 2013), through an analysis of microbial co-occurrence data from the human oral microbiome (Results section “Metabolic similarity correlates with microbial co-occurrence in the human oral microbiome”, Supplementary file 4, Figure 4—figure supplement 6, Figure 4—figure supplement 7).

3) Lack of experimental validation makes it difficult to assess whether the identified exchanges are happening in the system. At the least, a detailed analysis of systems with known metabolite exchanges (and how the proposed method there differs from the previous ones) should be provided.

This is an excellent point. We are grateful to the reviewer for the suggestion, which prompted us to perform a number of analyses to assess the capacity of our method to relate to experimental measurements, and to compare our approach with previously published ones. We believe that this process ended up strengthening our manuscript significantly.

First, we examined an experimental dataset on microbial co-culture growth. This new analysis is described in a new section of the Results titled “Metabolite producibility points to putative metabolic mechanisms for E. coli auxotroph co-cultures”, Figure 3—figure supplement 1. For this new analysis, we used E. coli auxotroph co-culture data from Wintermute and Silver (2010). This dataset has the advantage of using genetically well-defined E. coli strains with specific known auxotrophies, such that corresponding genome scale models can be appropriately constructed, and metabolic exchanges can be inferred with reasonable confidence. We analyzed this system in detail to provide insight into the predictive capacity of our method. The comparison of model predictions with experimental data led to a number of observations, reported in the manuscript, and summarized briefly as follows: (i) we found that for several auxotrophs in the tryptophan pathway, the PM value for tryptophan was correlated with the auxotrophs’ ability to be supplemented by other auxotrophs; (ii) we observed that metabolically similar organisms (as measured by their PM difference across all biomass components) tend not to supplement each other’s growth; (iii) in the new section we also point out some exceptions to the trends mentioned above, including increased synergistic growth for at least one set of auxotrophs with knockouts in the same biosynthetic pathway, which highlights the complexity of this experimental data and the limitations of our method.

In a second newly added analysis, we utilized our metric, as well as previously published metrics, to describe microbial co-occurrence patterns across the human oral microbiome. The results of these analyses are presented in a new Results section titled “Metabolic similarity correlates with microbial co-occurrence in the human oral microbiome”, accompanied by a newly added Supplementary file 4. Although co-occurrence does not necessarily imply interactions, and mechanisms of interaction are generally unknown, previous methods have been shown to be capable of identifying metabolic trends associated with microbial co-occurrence in the human microbiome (Levy and Borenstein, 2013) We used our method to analyze co-occurrence from metagenomics data from 7 different oral sites in the human microbiome project. Co-occurrence data calculated using both the SparCC method and Pearson’s correlation were taken from Friedman and Alm (2012). We compared our method with a number of different metabolic network analysis methods, including NetSeed, NetCooperate, and NetCmpt used in Levy and Borenstein, as well as simple metrics based on metabolic reaction distance and Jaccard’s distance. We found that our metabolic network metric, based on the difference in PM between two organisms for all biomass components, was the most consistently anticorrelated with co-occurrence data (irrespective of co-occurrence estimate method), showing that similar organisms tend to co-occur. This result corroborates the habitat filtering hypothesis previously proposed by Levy and Borenstein. We also found that our method, and other methods, correlated more consistently with co-occurrence inferred from the SparCC method than those inferred from Pearson’s correlation, which further corroborates the claims of Friedman and Alm that Pearson’s correlations can fail to capture true co-occurrence in compositional data. Additionally, we found that our PM based metric maintained correlation with co-occurrence when controlling for other metrics through partial correlations while the inverse was not always true. This analysis highlights our metric’s ability to capture metabolic similarities between organisms, that correlate with co-occurrence, beyond what is captured by existing metabolic network analysis methods.

Finally, we extended our calculation of PM to include organic acid production. This analysis is presented in a new Results section titled “Organic acid production predicted for human oral microbiome organisms”, Figure 4—figure supplement 4. Through this analysis, we used our approach to determine whether it could predict organic acid production in human oral microbiome organisms. In particular, we were able to identify trends in organic acid production by different genera of oral microbiome organisms. We found that the Fusobacteria genera had increased PM for butyrate, a result with important implications in oral health, and which is supported by independent transcriptomic data from the literature (Jorth et al., 2014). Additionally, we found that some species of the Porphyromonas and Prevotella genera, which have been implicated in periodontal disease due to production of organic acids (Takahashi, 2015), had increased PM for butyrate as well. These results, while not directly proving exchange of a given metabolite across a specific organism pair, provide valuable information about putative secretions that could mediate these interactions.

Reviewer #3:

[…]

Given the rate of generation of microbial genomes it would be very useful to have a better method to even approximately predict metabolic products of organisms. However, what the study lacks in my opinion, is quantitative evidence that the metric actually does translate into a prediction that the organism produces a given metabolite. They provide some anecdotal examples where it appears to work but without a more thorough analysis of what the metric means then it will only be useful in a qualitative sense i.e. a higher PM indicating that one organism is more likely to produce a metabolite than another. Metabolic modelling is admittedly not my field, but datasets capable of testing the metric, must exist. A possible example might be this collection of E. coli strains and phenotypes defined in terms of growth conditions (Galardini et al., 2017, eLife). If such datasets are not available then I feel the authors need to be more circumspect in their conclusions and concede that the metric may say something about the production of metabolites but without any certainty about how a given PM value translates into an actual probability of production.

We appreciate the reviewer’s comment, which raises a very good point. The most important concern expressed by the reviewer seems to be the insufficient comparison with experimental data, which can lead one to legitimately wonder how accurate and valuable our predictions are. Throughout our response to all reviewer comments, we have taken to heart this concern, by implementing a number of new analyses that compare our predictions with different kinds of experimental data, as well as with previously published approaches. In presenting these new analyses, we have shown the strengths of our approach, but we have also explicitly described the possible limitations of our inferences. We believe that these additional analyses will provide the readers with a clearer perspective of the type of microbial ecology questions for which our method can provide valuable insight.

The first analysis we performed to illustrate the value of our method was a comparison with the simpler approach of quantifying the percent completion of a pathway as a measure of biosynthetic capability. This analysis was also described in further detail for reviewer #1 comment 1, and is presented in the Results section “Producibility of metabolites differs from pathway completeness and captures minimal precursor set structure”, Figure 2—figure supplement 2. For this analysis, we analyzed the histidine biosynthetic pathway in E. coli in detail, calculating both the Producibility Metric (PM) and the% completion of the biosynthetic pathway for different knockouts along the linear pathway. Through this analysis we were able to illustrate the difference between the PM and a simpler pathway based analysis, giving additional insight into the meaning of the PM.

Next, we compared our method with flux balance analysis (FBA) for a genome-scale metabolic network with varying degrees of perturbations generated by randomly removing reactions from the network. This analysis is described in further detail in the Reviewing Editor’s summary comment and is presented in the Results section “Producibility analysis shows improved tolerance to missing reactions compared to flux balance analysis” and newly added main text Figure 3. Through this analysis we demonstrated that the PM is more robust to missing reactions than traditional FBA and can therefore provide additional insight into draft metabolic networks with gaps. Again, this analysis provided additional context to the meaning of the PM and its applicability for analyzing metabolic networks.

Next, we introduced several new analyses that compare our PM predictions to experimental data. These analyses provide insight into how the PM can be used to analyze and interpret biological data. The first of these was an analysis of E. coli auxotroph co-cultures presented in Results section “Metabolite producibility points to putative metabolic mechanisms for E. coli auxotroph co-cultures”, Figure 3—figure supplement 1. In this section, we use the PM to analyze co-culture synergistic growth data of different E. coli auxotrophs from Wintermute and Silver (2010). The comparison of PM predictions with experimental data led to a number of observations, reported in the manuscript, and summarized briefly as follows: (i) we found that for several auxotrophs in the tryptophan pathway, the PM value for tryptophan was correlated with the auxotrophs’ ability to be supplemented by other auxotrophs; (ii) we observed that metabolically similar organisms (as measured by their PM difference across all biomass components) tend not to supplement each other’s growth; (iii) in the new section we also point out some exceptions to the trends mentioned above, including increased synergistic growth for at least one set of auxotrophs with knockouts in the same biosynthetic pathway, which highlights the complexity of this experimental data and the limitations of our method.

We next extended our analysis of the human oral microbiome to predict organic acid production. This analysis is presented in Results section “Organic acid production predicted for human oral microbiome organisms”, Figure 4—figure supplement 4. We were able to identify trends in organic acid production by different genera of oral microbiome organisms. We found that the Fusobacteria genera had increased PM for butyrate, a result with important implications in oral health, and which is supported by independent transcriptomic data (Jorth et al., 2014). Additionally, we found that some species of the Porphyromonas and Prevotella genera, which have been implicated in periodontal disease due to production of organic acids (Takahashi, 2015), had increased PM for butyrate as well.

Finally, we added a large analysis of microbial co-occurrence data from metagenomic sequencing data. This analysis is presented in the new Results section “Metabolic similarity correlates with microbial co-occurrence in the human oral microbiome”, Supplementary file 4. In this analysis, we analyzed co-occurrence data from Friedman and Alm (2012) and calculated the correlation of co-occurrence with various inter-organism metabolic metrics calculated using our PM and alternative methods. We showed that an inter-organism distance based on our PM score explains co-occurrences better than scores based on any alternative method that we tested. Furthermore, there was a strong negative correlation between PM based inter-organism distance and co-occurrence indicating that organisms with similar biosynthetic capabilities tend to co-occur. Our finding corroborates and strengthens previous similar analyses of co-occurrence data from the human microbiome (Levy and Borenstein, 2013), and demonstrates the ability of our method to provide insight into large-scale top down data such as microbial co-occurrence from metagenomic sequencing.

We also appreciate the reviewer’s suggestion of a paper (and corresponding high-throughput dataset) that could be in principle used to additionally assess our method. Unfortunately, the majority of the conditions tested in this paper are undefined or non-metabolic (mainly consisting of antibiotics or other stressors), and it is therefore not the best dataset for evaluating our method. The alternative option we had considered, using a different dataset of similar type (from Tohsato et al., 2010, J. Bioinform. Comput. Biol.), ended up not being fruitful either, as we were unable to obtain this dataset. While this would have been a nice addition, we hope that the large number of new analyses we ended up performing to address this and other concerns will be deemed satisfactory.

Other than the above caveat, I thought this was a good study but it was sparse on the practical details of the methodology. The GitHub repository consists of a selection of Matlab scripts without a clear description of how to apply them. What would be really useful is a complete walkthrough of the methodology from genome to PM values including for example ggkbase commands used to build the models. This is necessary if this work is to be properly reproducible. One final comment was that some evaluation of the impact of sample number on the calculated PM values would be a worthwhile addition. The authors mention that 50 samples of input metabolite combinations were used but no analysis of the impact of this choice on the PM values was given.

This is a good point and we appreciate the reviewer’s effort to help us make our method more reproducible. We have added an example script to the GitHub page that can be run to calculate the producibility metric (PM) for a small subset of metabolic networks and metabolites. We also include code that could be used to calculate the PM for the 88 biomass metabolites and 456 oral microbiome metabolic networks analyzed throughout the paper. However, we note that this would take a long time and is best performed in parallel on a shared computing cluster, as described in the Materials and methods section.

Regarding the parameter choices, we note that these parameters were chosen by hand to provide adequate performance for our algorithm, which we have further emphasized in the Materials and methods section of our manuscript.

https://doi.org/10.7554/eLife.39733.031

Article and author information

Author details

  1. David B Bernstein

    1. Department of Biomedical Engineering, Boston University, Boston, United States
    2. Biological Design Center, Boston University, Boston, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6091-4021
  2. Floyd E Dewhirst

    1. The Forsyth Institute, Cambridge, United States
    2. Harvard School of Dental Medicine, Boston, United States
    Contribution
    Conceptualization, Supervision, Funding acquisition, Investigation, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Daniel Segrè

    1. Department of Biomedical Engineering, Boston University, Boston, United States
    2. Biological Design Center, Boston University, Boston, United States
    3. Bioinformatics Program, Boston University, Boston, United States
    4. Department of Biology, Boston University, Boston, United States
    5. Department of Physics, Boston University, Boston, United States
    Contribution
    Conceptualization, Supervision, Funding acquisition, Investigation, Writing—original draft, Project administration
    For correspondence
    dsegre@bu.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-4859-1914

Funding

National Institute of Dental and Craniofacial Research (R37DE016937)

  • Floyd E Dewhirst

National Institute of General Medical Sciences (R01GM121950)

  • Daniel Segrè

Defense Advanced Research Projects Agency (HR0011-15-C-0091)

  • Daniel Segrè

Biological and Environmental Research (DE-SC0012627)

  • Daniel Segrè

National Institute of Dental and Craniofacial Research (R01DE024468)

  • Floyd E Dewhirst
  • Daniel Segrè

National Institute of General Medical Sciences (T32GM008764)

  • David B Bernstein

National Science Foundation (1457695)

  • Daniel Segrè

National Science Foundation (NSFOCE-BSF 1635070)

  • Daniel Segrè

Human Frontier Science Program (RGP0020/2016)

  • Daniel Segrè

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We would like to acknowledge our collaborators at the Forsyth institute for providing valuable knowledge on the human oral microbiome and uncultivated microorganisms. In particular, we would like to acknowledge Andrew J Collins and Pallavi P Murugkar for providing genome information on their TM7 strains AC001 and PM004 in coculture with Pseudopropionibacterium propionicum ahead of their work being accepted for publication. We would also like to acknowledge all of the members of the Segrè lab and Daniel Sher of Haifa University for helpful discussions and comments on this work. Research reported in this publication was supported by The National Institute of Dental and Craniofacial Research of the National Institutes of Health under award numbers R37DE016937, R01DE024468, by National Institutes of Health grants R01GM121950 and Sub_P30DK036836_P and F, by the Defense Advanced Research Projects Agency (Purchase Request No. HR0011515303, Contract No. HR0011-15-C-0091), the US Department of Energy (DE-SC0012627), the Boston University Interdisciplinary Biomedical Research Office, and by the Boston University training program in quantitative biology and physiology under Ruth L Kirschstein National Research Service Award T32GM008764 from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the granting agencies.

Senior Editor

  1. Naama Barkai, Weizmann Institute of Science, Israel

Reviewing Editor

  1. Wenying Shou, Fred Hutchinson Cancer Research Center, United States

Reviewers

  1. Wenying Shou, Fred Hutchinson Cancer Research Center, United States
  2. Christopher Quince, University of Warwick, United Kingdom

Publication history

  1. Received: July 5, 2018
  2. Accepted: June 13, 2019
  3. Accepted Manuscript published: June 13, 2019 (version 1)
  4. Version of Record published: July 4, 2019 (version 2)

Copyright

© 2019, Bernstein et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,416
    Page views
  • 227
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Cancer Biology
    2. Computational and Systems Biology
    Adam C Palmer et al.
    Research Article
    1. Computational and Systems Biology
    2. Microbiology and Infectious Disease
    Guillaume Witz et al.
    Research Article