Research Article

Prediction of enzymatic pathways by integrative pathway mapping

University of California, San Francisco, United States
University of Illinois, United States
Sanford Burnham Prebys Medical Discovery Institute, United States
Russian Academy of Sciences, Russia
Albert Einstein College of Medicine, United States

Jan 29, 2018

Open access
Copyright information

Abstract
Introduction
Results
Discussion
Materials and methods
References
Article and author information
Metrics

Abstract

The functions of most proteins are yet to be determined. The function of an enzyme is often defined by its interacting partners, including its substrate and product, and its role in larger metabolic networks. Here, we describe a computational method that predicts the functions of orphan enzymes by organizing them into a linear metabolic pathway. Given candidate enzyme and metabolite pathway members, this aim is achieved by finding those pathways that satisfy structural and network restraints implied by varied input information, including that from virtual screening, chemoinformatics, genomic context analysis, and ligand -binding experiments. We demonstrate this integrative pathway mapping method by predicting the L-gulonate catabolic pathway in Haemophilus influenzae Rd KW20. The prediction was subsequently validated experimentally by enzymology, crystallography, and metabolomics. Integrative pathway mapping by satisfaction of structural and network restraints is extensible to molecular networks in general and thus formally bridges the gap between structural biology and systems biology.

https://doi.org/10.7554/eLife.31097.001

Introduction

Problem and approach

The functions of most sequenced proteins have not been determined by experiment (Gerlt et al., 2011; Jacobson et al., 2014; Schnoes et al., 2009). They are also difficult to predict for enzymes with less than 60% sequence identity to characterized enzymes (Radivojac et al., 2013). The problem is much greater when seeking to predict the functions of entire metabolic pathways. Here, we propose a computational approach that outputs an enzymatic pathway and corresponding ligands, given a set of potential enzymes and metabolites, using information derived by experiment and/or computational analyses (Figure 1). The approach benefits from two considerations. First, predicting an entire pathway may sometimes be easier than predicting individual enzymatic functions in isolation, because the product of one enzyme is the substrate for the next in the pathway. Thus, even when each enzyme’s ligand assignment is ambiguous, the ligand assignments consistent with both enzymes may be more precise and accurate. Second, while it may be impossible to identify a pathway using information from any single method, there may be sufficient information provided by several methods. Our approach was inspired by the previous work using metabolite docking to multiple enzymes and substrate-binding proteins hypothesized to participate in the pathway (Jacobson et al., 2014; Kalyanaraman and Jacobson, 2010; Macchiarulo et al., 2004; Zhao et al., 2013), integrative structure determination of large macromolecular assemblies (Alber et al., 2007; Russel et al., 2012), and comparative genomics approaches for metabolic reconstructions (Markowitz et al., 2010; Karp et al., 2016; Osterman and Overbeek, 2003; Overbeek et al., 2005; Yamanishi et al., 2007; Ye et al., 2005).

Figure 1 with 3 supplements see all

Download asset Open asset

Overview of integrative pathway mapping method.

The four stages of integrative modeling are: (1) Gathering information, (2) Designing model representation and evaluation, (3) Sampling good models, and (4) Analyzing models and information. (1) Here, the input information is gathered from seven different sources used to determine the candidate proteins, such as co-localization and conservation in the genome neighborhood, and the scoring restraints (docking scores from virtual screening, chemical transformations, ensemble similarity calculations of virtual screening hits from similarity ensemble approach, DSF screening hits, metabolic endpoints, and characterized chemical reactions). (2) A pathway model is represented as a graph composed of protein and ligand nodes. Proteins are depicted as diamonds and ligands are depicted as circles, with lines showing the node patterns evaluated by a given type of information. (3) The combinatorial optimization problem is solved by Monte Carlo simulated annealing sampling, consisting of randomly swapping nodes in and out of the pathway model and rearranging the edges between the nodes. (4) The final analysis stage involves assessing the sampling, precision, and accuracy of the models.

https://doi.org/10.7554/eLife.31097.002

A pathway model is represented as a graph in which the enzymes, substrates, and products are nodes and the enzymatic reactions are edges (Figure 1). Input information, such as scores from experimental ligand screening, molecular docking screens, and chemical similarity, is encoded as ‘network’ restraints on the identity of the nodes and edges in the pathway; these restraints are combined into a scoring function. An ensemble of pathways consistent with the input information is computed by a Monte Carlo algorithm that samples well-scoring pathways over possible enzymes and metabolites. The resulting ensemble of good-scoring pathway models is assessed by its precision, its satisfaction of the input restraints, and ideally experimental observations not used in its construction. In addition to gauging the accuracy and precision of the models and the observations, this analysis can identify the most informative future experiments. Because the approach ranks alternative pathways using all available information, it in principle produces maximally accurate, precise, and complete pathway models given that information. The process of data gathering and modeling can iterate until a satisfactory model is obtained. We suggest that the four stages of integrative pathway mapping by satisfaction of network restraints mimic how human experts often derive and test pathway models.

Results

The approach begins with a list of candidate proteins (here enzymes, binding proteins, and transporters) and a list of endogenous metabolites that are candidate substrates or products of these enzymes (Figure 1, Figure 1—figure supplement 1). The pathway members can be winnowed from the entire proteome by predicting functionally related proteins using information about the genome organization that is often available for bacterial pathways (Zhao et al., 2014). For example, for the gulonate pathway, we identified five metabolic enzymes that are conserved in the genome neighborhood of the TRAP transporter gene by constructing a genome neighborhood network (GNN) (Figure 1—figure supplement 2); the GNN approach has been demonstrated to accurately predict enzymes and transporters that function together in metabolic pathways based on conserved protein families in genome neighborhoods across different species (Zhao et al., 2014). The network restraints can then be inferred in multiple ways, exemplified by the following restraints in this study (Figure 2A). First, for each candidate, the libraries of endogenous metabolites are docked against either an experimentally determined structure if available or a comparative structure model (Mysinger and Shoichet, 2010). In the case of glycolysis, 2965 sugars in the KEGG database were screened against two crystal structures and eight comparative models for the 10 enzymes in this pathway (Kalyanaraman and Jacobson, 2010). Second, with the top 500 metabolites docked-and-ranked against each of the enzymes, the pathway enzymes may be linked by the similarity of their high-ranking docked ligands, here using the chemoinformatic Similarity Ensemble Approach (SEA) (Keiser et al., 2007; Lin et al., 2013); other related approaches can also be used (Besnard et al., 2012; Gregori-Puigjané and Mestres, 2006; Mestres et al., 2006; Nidhi et al., 2006; Paolini et al., 2006). The restraint can be informative because enzymes are often more likely to be pathway neighbors when their high-scoring docked ligands resemble each other. For instance, the top 500 metabolites of 3-phosphate dehydrogenase in the glycolysis pathway (as ranked by docking) are dominated by six chemotypes, while the phosphoglycerate kinase has three of these chemotypes overrepresented. This similarity is captured by the SEA E-value of 9.5 10⁻⁶³, suggesting that the observed level of similarity between the two predicted ligand lists is unlikely to have occurred by chance (Figure 2—figure supplement 1C). Thus, the two enzymes are linked by their related predicted metabolites. Third, consideration of the enzymatic reaction types assigned to the enzymes’ superfamilies restrains the reactions in the pathway models. We require that the predicted metabolites can actually be substrates or products of an enzyme, given its reaction profile extracted from its protein family annotation. As an example, the glyceraldehyde 3-phosphate dehydrogenase is assigned the reactions that can convert an aldehyde to a phosphate and vice versa (Supplementary file 6). Finally, all available experimental screening hits, substrate specificities from homology, constraints on the pathway endpoints, and other information can also be considered. Each of these considerations is added to a scoring function that ranks alternative pathway models by assessing their consistency with the available information (Figure 2B). Thus, pathway models are preferred when they contain (i) good-scoring metabolite-enzyme pairs, (ii) pairs of neighboring enzymes that share chemotypes, (iii) pairs of neighboring enzymes catalyzing chemical reactions that allow the product of an upstream enzyme to be a substrate of the downstream enzyme, etc. This integrative approach does not require that each type of restraint be available for each protein and metabolite, nor that each restraint is accurate or precise; it only requires that the scoring function consisting of all restraints is sufficiently accurate and precise.

Figure 2 with 1 supplement see all

Download asset Open asset

Representation of alternative models obtained based on consistency with input information provided for the glycolysis benchmark pathway.

(A) Example of three alternative models evaluated using different types of restraints based on modeling of the glycolysis pathway with a subset of pathways shown. The restraints on node patterns are shown using colored lines (blue – docking restraints, green – SEA restraints, purple – chemical transformation restraints, red – restraints with unfavorable scores). Metabolites are labeled by KEGG ID and enzymes are labeled by step in glycolysis pathway. On the left, alternate model one is consistent with docking scores, but not with all SEA scores and chemical transformations. In the middle, alternate model two is consistent with the docking scores and SEA scores, but not with chemical transformations. On the right, alternate model three is consistent with docking scores, SEA scores, and chemical transformations, thus increasing the rank of the correct enzyme-substrate pairings. (B) Alternative models shown with chemical structures. (C), Ranks of correct substrate for the corresponding enzyme at each step in the glycolysis benchmark case. 1 – glucokinase, 2 – phosphoglucose isomerase, 3 – phosphofructokinase, 4 – fructose bisphosphate aldolase, 5 – triosephosphate isomerase, 6 – glyceraldehyde 3-phosphate dehydrogenase, 7 – phosphoglycerate 8 – phosphoglycerate mutase 9 – enolase and 10 – pyruvate kinase.

https://doi.org/10.7554/eLife.31097.006

Benchmarking

The method was tested by retrospectively ordering enzymes and identifying their substrates in three well characterized pathways: glycolysis (10 enzymes) (Kalyanaraman and Jacobson, 2010), cytidine monophosphate 3-deoxy-D-manno-octulosonate 8-phosphate (CMP KDO-8P) biosynthesis (four enzymes), and serine biosynthesis (five enzymes) (Supplementary files 1 and 2). Docking screens of several thousand metabolites against comparative models of the enzymes, chemical transformation annotations of the enzymes, and the chemoinformatic SEA analysis were the input information for mapping each pathway. Because the functions of these enzymes have been characterized, homology-based annotations were not included as restraints for the purposes of the retrospective benchmark. The method successfully identified the substrates and products and correctly ordered all pathway components in the top-scoring models (Figure 2; Supplementary files 1 and 2). The method performed well even when the number and identity of pathway enzymes were unspecified (Figure 3AB) or when the candidate enzymes set was incomplete (Figure 2—figure supplement 1D).

Figure 3 with 2 supplements see all

Download asset Open asset

12 representative predictions of the L-gulonate TRAP-SBP catabolic pathway.

(A) 12 representative pathway models of TRAP SBP pathway predictions ordered by score, starting from the top with the best-scored prediction. The scores of the representative pathways are listed to the right of the corresponding pathway. Pathway enzymes are labeled by numbers as follows: 1 – HiGulD, 2 – HiUxuB, 3 – HiUxuA, 4 – HiKdgK, 5 – HiKdgA. (B) Graphical representation of an ensemble of representative pathway models. The predicted components in the ensemble of pathway models at each position are vertically aligned to the corresponding position in the gray pathway on the top. Ligand components are shown as circle nodes with the color corresponding to the ligand identity. Chemical structures are shown in Figure 3—figure supplement 2. Pathway enzymes are shown as diamond nodes with the same numbering as above. Edges are colored by individual pathway model prediction. The validated prediction is shown by black edges, enzyme nodes are colored black, and substrate/product nodes are outlined in black.

https://doi.org/10.7554/eLife.31097.008

The accuracy of the predicted benchmark pathways is not limited by the lack of sampling (Figure 3—figure supplement 1), but rather by the input information (Figure 2A). Thus, the integration of multiple types of information improves the accuracy and precision of pathway prediction (Figure 2BC). For example, the correct substrate of the dehydrogenase in the glycolysis pathway, glyceraldehyde 3-phosphate (G3P), is predicted to be most consistent with all the input information. Although by docking alone, the rank of G3P is only 117 out of the 2965 metabolites docked, the additional restraints from SEA and chemical transformations lead to the overall top ranking of G3P (Figure 2A).

Several caveats bear mentioning. Both thorough sampling and accurate scoring become more difficult when the number of possible pathways increases (which in turn arises from a large set of candidate enzymes and metabolites), when some enzymes or metabolites are not in the input set, when the pathway is long, or its length is unknown. Here, only linear pathways are sampled; thus, non-linear pathways, including cyclic pathways, are not modeled. The preparation of input information requires manual processing. Although docking, chemoinformatics, comparative modeling, chemical transformations, and differential scanning fluorimetry (DSF) screening information may be collected in an automated way, the quality of information often benefits from expert choices. For example, comparative model building can be especially time consuming when low sequence similarity structures are available for target building, and docking may require expert intervention when parameterization of cofactors is necessary for correctly defining the binding site. Nevertheless, we emphasize that once the input information is provided, its conversion into the predicted pathway is automated and does not require human intervention. Finally, docking against modeled structures will sometimes fail, even with the added advantages of insisting on consistency in docking hit lists. Some of these pitfalls can be detected through testing the thoroughness of sampling (Figure 3—figure supplement 1), statistical bootstrapping and jack-knifing tests (Efron, 1981), and by direct experimental testing of predictions (Figure 4). The method becomes more robust when the pathway start and end are defined. More generally, failures can also be reduced by introducing restraints or constraints that limit the size of the input enzyme and metabolite sets, by improving the accuracy of the scoring function, by limiting the sampling, or by further filtering the set of good-scoring solutions.

Figure 4 with 2 supplements see all

Download asset Open asset

Catabolic pathway of *H. influenzae* Rd KW20.

(A) The best-scoring pathway identified using the integrative mapping approach is annotated with experimental evidence: enzyme activity (blue), fitness growth determinants (red), transcript analyses on L-gulonate media (orange), atomic structure (green), and isotopic metabolic labeling (purple). The pathway demonstrates L-gulonate degradation into glyceraldehyde 3-phosphate and pyruvate. Bonds undergoing changes in the subsequent steps are colored in red. (B) Kinetics of pathway enzymes on predicted substrates. (C) Crystal structure of L-gulonate bound to SBP TRAP (PDB ID: 4PBQ). (D) Knockout growth assays of *H. influenzae* strains, ΔGulP (gulonate transporter periplasmic subunit) and ΔGulD (L-gulonate dehydrogenase), when grown on D-glucose vs. L-gulonate as a sole carbon source. (E) Fold change in expression for each gene when grown on the indicated carbon source, relative to growth on glucose. Error bars indicate one standard deviation for three biological replicates.

https://doi.org/10.7554/eLife.31097.011

Prospective prediction of the L-gulonate catabolic pathway in Haemophilus influenzae Rd KW20

L-gulonate and D-mannonate were identified as potential ligands of the TRAP solute binding protein (SBP) from H. influenzae (Supplementary file 3), using DSF screening of a library of 189 compounds (Vetting et al., 2015). The HiGulPQM TRAP transporter consists of three subunits, including the periplasmic SBP HiGulP, and two membrane components HiGulQ and HiGulM. SBPs recognize substrates to be imported into the cytosol by the transporter. Because these sugars are not involved in central carbon metabolism, the observation suggested an uncharacterized pathway that converts L-gulonate or D-mannonate into substrates for central carbon metabolism. While this pathway had been proposed based on the DSF screening hits and genome neighborhood analysis, we sought to predict it using integrative pathway mapping, based on the following information (Figure 1, Figure 1—figure supplement 1).

First, the position of TRAP SBP was fixed at the pathway start and its ligand was constrained to be L-gulonate or D-mannonate; this positioning is reasonable given the TRAP’s role as a transporter. Second, five possible pathway enzymes (a dehydratase, two dehydrogenases, a kinase, and an aldolase) were identified from the genome neighborhood around the TRAP-solute-binding protein (Uniprot ID P71336) (Figure 1—figure supplement 2A–C, Supplementary file 2). Third, 3650 out of the 14,212 metabolites in KEGG (Kanehisa et al., 2016) were identified as the smallest unique ligand set of substrates or products for these enzymes, based on the top scoring docking hits, which were optimized by chemical transformations and chemical similarity (Supplementary file 2). Fourth, the pathway was constrained to end in a metabolite of central metabolism (Supplementary file 4). Fifth, the dehydratase (Uniprot ID P44488) was hypothesized to be a D-mannonate dehydratase because of high sequence similarity to a characterized D-mannonate dehydratase, UxuA, in other organisms (73% sequence identity to UxuA in E. coli) (Dreyer, 1987). Finally, as in the benchmarking, the scoring function used docking, SEA E-values, and chemical transformations inferred from annotations in Pfam (Finn et al., 2016).

The sampling algorithm found 154 unique high-scoring pathways, which clustered into 12 groups (Figure 3AB). The best-scoring pathway, starting from L-gulonate as the ligand of the TRAP SBP, begins with its oxidation to D-fructuronate by the first dehydrogenase (HiGulD) as the first catalytic step (Figure 4A). Next, reduction of D-fructuronate by the second dehydrogenase (HiUxuB) produces D-mannonate; its dehydration by the dehydratase (HiUxuA) produces 2-keto-3-deoxy-D-gluconate. The last few steps in the pathway model are part of a conserved Entner-Dourdoroff pathway, ending with glyceraldehyde 3-phosphate and pyruvate, known members of central metabolism. Even when individual restraints are excluded from the input information, the best-scoring pathway falls within the top-scoring pathway models. For example, the same pathway has the highest score if either the starting or end point is not restrained. If information about both starting and end points is excluded, this pathway model drops to 15th in score.

Experimental testing of the L-gulonate catabolic pathway

The pathway model was tested experimentally in five independent ways, including by enzyme activity, X-ray crystallography, fitness growth assays of the deletion mutants, transcript analyses, and isotopic metabolic labeling (Figure 4A–E).

First, all enzymes had k_cat/K_M values larger than 10³ M⁻¹s⁻¹ for their predicted substrates (Figure 4B). Initially, HiGulD had negligible activity with its putative substrate L-gulonate. However, the pathway prediction was deemed to be of sufficient quality to encourage optimization of enzyme purification, ultimately producing an active enzyme with a k_cat/K_M value of 10⁴ M⁻¹s⁻¹ for L-gulonate. All enzymes exhibit micromolar K_M values, except for HiUxuA (potentially reflecting high D-mannonate concentrations in the cytosol). Second, the model is supported by the crystallographic structure of the complex between the TRAP SBP protein and L-gulonate (Vetting et al., 2015) (Figure 4C). Third, knockouts (KOs) of ∆HiGulP SBP and ∆HiGulD dehydrogenase were constructed. ∆HiGulP and ∆HiGulD KO strains retain the ability to grow on glucose (Figure 4D, left), while they do not grow on L-gulonate (Figure 4D, right). Fourth, all predicted pathway encoding genes, including HiGulPQM transporter, HiGulD, HiUxuA, HiUxuB, HiKdgK, and HiKdgA, are upregulated when H. influenzae is grown on L-gulonate or D-mannonate as the sole carbon source (Figure 4E). Fifth, when H. influenzae was incubated with U-¹³C-L-gulonate during the early exponential phase, even for one minute, substantial labeling of central carbon metabolites was observed, indicating rapid cellular uptake and metabolism of L-gulonate (Figure 4—figure supplement 1A). In addition, time-dependent labeling of D-fructuronate was observed (Figure 4—figure supplement 1B), further supporting the first predicted step in the L-gulonate catabolic pathway (Figure 4—figure supplement 1C). Finally, identification of this pathway in H. influenzae allowed us to reconstruct the L-gulonate and hexuronate pathways in related bacteria, mapping their conservation and variation to better understand the evolution and function of the pathway (Figure 4—figure supplement 2).

Discussion

A number of methods have been developed to predict metabolic enzymes and pathways (Jacobson et al., 2014; Planes and Beasley, 2008). The most common method assigns enzyme function from its sequence similarity to a characterized enzyme (Lee et al., 2007), sometimes allowing genome-scale metabolic reconstructions (Karp et al., 2016; Bordbar et al., 2014). However, similarity-based approaches often fail when the sequence identity drops below 60% (Radivojac et al., 2013) or when distant homologs are functionally divergent. Virtual screening used for integrative pathway mapping, even against comparative models, can predict substrates more accurately than homology-based transfer (Fan et al., 2009). In other cases, functional linking based on omics data, such as gene clusters, phylogenetic profiles, and gene expression profiles, can also guide functional prediction (Osterman and Overbeek, 2003; Overbeek et al., 1999). Methods that integrate sequence similarity and functional linking can improve predictions (Plata et al., 2012), as can approaches that incorporate modeling of metabolic flux with genomics-based metabolic reconstruction to identify missing enzymes (Karp et al., 2016; Bordbar et al., 2014; Monk et al., 2014). Several studies have combined structural information with metabolic reconstructions for genome-scale analysis (Zhang et al., 2009; Brunk et al., 2016; Chang et al., 2010). Still, most of the methods are limited by the biochemical knowledge available and the reactions that are mapped. Studies that deorphanize enzyme function (Irwin et al., 2005; Korczynska et al., 2014) or annotate new pathways (Zhao et al., 2013) will enhance the accuracy and applicability of these computational methods (Bordbar et al., 2014) as well as our integrative method. However, a key strength of our integrative approach is its ability to predict pathways that contain previously unknown biochemical reactions, and to assemble pathways de novo from simple and often newly predicted enzyme activities.

Integrative pathway mapping provides a flexible and general approach to functional annotation and pathway modeling. Because it generalizes functional annotation into the sampling of pathways consistent with any available input information, it can use more information than alternative methods and thus, at least in principle, produces more accurate, precise, and complete answers. For example, while there are numerous methods for predicting functions by combining information (Yamanishi et al., 2007; Ye et al., 2005; Plata et al., 2012; Green and Karp, 2007; Kharchenko et al., 2006; Smith et al., 2012; Zhu et al., 2012), the generality and flexibility of integrative pathway mapping allows us to combine structural information with other types of data in a most straightforward manner. If not all bacterial pathways have enough information from the genome context to infer the pathway enzymes, many do. Moreover, no single type of input information is essential, provided sufficient information is available from other sources. For example, potential pathway members in prokaryotes could also be obtained from regulon analysis based on predicting conserved binding sites for transcriptional regulators (Ravcheev et al., 2013; Rodionova et al., 2013). Other approaches for identifying candidate pathway members are especially needed for eukaryotes, because the relationship between genome neighborhood and pathway membership is significantly weaker in eukaryotes than in prokaryotes. For some pathways in eukaryotes, consideration of homology and biochemical function as well as direct experimental evidence, such as spatial co-localization by proteomics or chemical cross-linking, could be used to identify a set of potential protein members for pathway mapping. Thus, the integrative approach is at least in principle not limited to prokaryotic pathways. If docking struggles to prioritize the right substrates as top ranking hits, it often ranks the right ones well (Korczynska et al., 2014; Hall et al., 2010; Hermann et al., 2007); insisting that the product of one step feed into the next provides a surprisingly useful criterion not only for pathway membership and ordering, but also for re-prioritizing the correct substrate from the docking candidates. The integrative approach strengthens what would ordinarily be approximate answers by insisting on maximal possible consistency across the enzymes and across different types of information. Because of the generality of integrative pathway mapping, new sources of information can be incorporated, including knockout screens and known metabolic capabilities. While not all types of input for integrative pathway mapping can be obtained automatically (e.g. docking, experimental measurements), the mapping itself is entirely automated. With further development, the framework may be applicable on a larger scale, approaching complete genomes, but mapping topologies of networks will be more demanding as it will require more input information and larger computation.

Materials and methods

Key resources table

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Gene (Haemophilus influenzae)	UxuA	This paper, pNYCOMPSC-tagless HiUxuA vector	Uniprot:P44488	See Supplementary file 7. cloned into the C-terminal TEV cleavable 10x-Histag containing vector pNYCOMPS-LIC-TH10- ccdB (C-term) such that the tag is out of frame
Gene (H. influenzae)	GulD	This paper, pNYCOMPSC-tagless HiGulD vector	Uniprot:Q57517	See Supplementary file 7. cloned into the C-terminal TEV cleavable 10x-Histag containing vector pNYCOMPS-LIC-TH10- ccdB (C-term) such that the tag is out of frame
Gene (H. influenzae)	KdgK	This paper, HiKdgK-pSGC-His vector	Uniprot:P44482	See Supplementary file 7. cloned into the N-terminal TEV cleavable 6x-Histag containing vector pNIC28-Bsa4
Gene (H. influenzae)	UxuB	This paper, HiUxuB-pSGC-His vector	Uniprot:P44481	See Supplementary file 7. cloned into the N-terminal TEV cleavable 6x-Histag containing vector pNIC28-Bsa4
Gene (H. influenzae)	KdgA	This paper, HiKdgA-pSGC-His vector	Uniprot:P44480	See Supplementary file 7. cloned into the N-terminal TEV cleavable 6x-Histag containing vector pNIC28-Bsa4
Gene (H. influenzae)	GulP	This paper	Uniprot:P71336	See Supplementary files 8 and 9
Gene (H. influenzae)	GulQ	This paper	Uniprot:P44484	See Supplementary files 8 and 9
Gene (H. influenzae)	GulM	This paper	Uniprot:P44483	See Supplementary files 8 and 9
Gene (H. influenzae)	UxuR	This paper	Uniprot:P44487	See Supplementary files 8 and 9
Oligonucleotide (H. influenzae)	UxuA, UxuR, GulD, GulP, GulQ, GulM, KdgK, UxuB, KdgA, Hflu	This paper		See Supplementary file 8. qRT-PCR oligonucleotide sequences used for gene expression profiling
Strain, strain background (H. influenzae Rd KW20)	H. flu	https://www.atcc.org	ATCC 51907	Supplementary file 9. Genetic deletion mutants of the putative L-gulonate catabolism pathway in H. influenzae Rd KW20
Genetic reagent (H. influenzae)	∆GulP	This paper		Supplementary file 8. Genetic deletion mutants of the putative L-gulonate catabolism pathway in H. influenzae Rd KW20
Genetic reagent (H. influenzae)	∆GulD	This paper		Supplementary file 8. Genetic deletion mutants of the putative L-gulonate catabolism pathway in H. influenzae Rd KW20
Transfected construct (E. coli BL21 (DE3)	BL21 (DE3) E. coli containing the pRIL plasmid	Stratagene		Growth media contain 25 μg/mL Kanamycin or 100 μg/mL Carbomycin and 34 μg/mL Chloramphenicol
Commercial assay or kit	RNAprotect Bacteria Reagent	Qiagen	Cat No./ID: 76506
Commercial assay or kit	RNeasy Mini Kit	Qiagen	Cat No./ID: 74104
Commercial assay or kit	ProtoScript First Strand cDNA Synthesis Kit	New England BioLabs	Cat No./ID: E6300S
Chemical compound, drug	2-keto-3-deoxy-D- gluconate	Enzymatically synthesized	CAS: 17510-99-5	Enzymatic synthesis by D-mannonate dehydratase (Uniprot ID B0T0B1). Verified via 1H-NMR
Chemical compound, drug	2-keto-3-deoxy-D- gluconate-6- phosphate	Enzymatically synthesized	CAS: 884312-23-6	Enzymatic synthesis by D-mannonate dehydratase (Uniprot ID B0T0B1) and 1 μM 2-keto-3-deoxy-D-gluconate kinase (Uniprot ID A4XF21). Verified via 1H-NMR
Software, algorithm	Integrative Pathway Mapping	This paper	https://github.com/salilab/pathway_mapping	The source code for the IMP program, benchmark, input scripts files, and output files for the benchmark and the gulonate pathway calculations are available here(50)
Software, algorithm	IMP program	Russel D, et al, Putting the pieces together: integrative structure determination of macromolecular assemblies. PLoS Biology. 10(1):e1001244, 2012	http://integrativemodeling.org	Integrative modeling
Software, algorithm	MODELLER	B. Webb, A. Sali. Comparative Protein Structure Modeling Using Modeller. Current Protocols in Bioinformatics, John Wiley & Sons, Inc., 5.6.1- 5.6.32, 2014.	https://salilab.org/modeller/	Comparative modeling
Software, algorithm	DOCK3.6	Mysinger MM, Shoichet BK. Rapid context-dependent ligand desolvation in molecular docking. J Chem Inf Model. 50(9):1561-73, 2010.	http://dock.compbio.ucsf.edu/	Docking
Software, algorithm	Automated version DOCK3.6	Irwin JJ, et al. Automated Docking Screens: A Feasibility Study. J. Med. Chem. 52(18)5712–5720, 2009.	http://blaster.docking.org/	Docking
Software, algorithm	Similarity Ensemble Approach (SEA)	Keiser MJ, et al. Relating protein pharmacology by ligand chemistry. Nat Biotechnol. 25(2): 197-206, 2007.	http://sea.bkslab.org/	SEA chemo-informatic calculations
Software, algorithm	OpenEye Scientific Software	OpenEye Scientific Software I. OEChem. 2.0.2 ed2014.	https://www.eyesopen.com/	In silico chemical transformations
Software, algorithm	RDKit	Landrum G. RDKit: Open-source cheminformatics. Release_2016.03.1 ed2016	http://www.rdkit.org/	Chemical similarity calculations
Software, algorithm	EFI-EST	Gerlt JA, et al. Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks. Biochim. Biophys. Acta. 1854(8):1019- 1037, 2015.	http://efi.igb.illinois.edu/efi-est/index.php	Genome neighborhood networks
Software, algorithm	Pythoscape v1.0	Barber AE, Babbitt PC. Pythoscape: a framework for generation of large protein similarity networks. Bioinformatics. 28(21):2845- 2846, 2012.	http://www.rbvi.ucsf.edu/trac/Pythoscape	Sequence similarity networks
Software, algorithm	Cytoscape v3.4	Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13(11):2498-504, 2003.	http://www.cytoscape.org/	Network visualization

Share this article

Cite this article

Overview of integrative pathway mapping method.

Representation of alternative models obtained based on consistency with input information provided for the glycolysis benchmark pathway.

12 representative predictions of the L-gulonate TRAP-SBP catabolic pathway.

Catabolic pathway of H. influenzae Rd KW20.

Author details

Sara Calhoun

Contribution

Contributed equally with

Competing interests

Magdalena Korczynska

Contribution

Contributed equally with

Competing interests

Daniel J Wichelecki

Contribution

Competing interests

Brian San Francisco

Contribution

Competing interests

Suwen Zhao

Contribution

Competing interests

Dmitry A Rodionov

Contribution

Competing interests

Matthew W Vetting

Contribution

Competing interests

Nawar F Al-Obaidi

Contribution

Competing interests

Henry Lin

Contribution

Competing interests

Matthew J O'Meara

Contribution

Competing interests

David A Scott

Contribution

Competing interests

John H Morris

Contribution

Competing interests

Daniel Russel

Contribution

Competing interests

Steven C Almo

Contribution

Competing interests

Andrei L Osterman

Contribution

Competing interests

John A Gerlt

Contribution

For correspondence

Competing interests

Matthew P Jacobson

Contribution

For correspondence

Competing interests

Brian K Shoichet

Contribution

For correspondence

Competing interests

Andrej Sali

Contribution

For correspondence

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags