The connection between gene loss and the functional adaptation of retained proteins is still poorly understood. We apply phylogenomics and metabolic modeling to detect bacterial species that are evolving by gene loss, with the finding that Actinomycetaceae genomes from human cavities are undergoing sizable reductions, including loss of L-histidine and L-tryptophan biosynthesis. We observe that the dual-substrate phosphoribosyl isomerase A or priA gene, at which these pathways converge, appears to coevolve with the occurrence of trp and his genes. Characterization of a dozen PriA homologs shows that these enzymes adapt from bifunctionality in the largest genomes, to a monofunctional, yet not necessarily specialized, inefficient form in genomes undergoing reduction. These functional changes are accomplished via mutations, which result from relaxation of purifying selection, in residues structurally mapped after sequence and X-ray structural analyses. Our results show how gene loss can drive the evolution of substrate specificity from retained enzymes.https://doi.org/10.7554/eLife.22679.001
Genome dynamics, or the process by which an organism gains or loses genes, plays a fundamental role in bacterial evolution. Acquisition of new functions due to horizontal gene transfer (HGT) or genetic duplications is broadly documented (Wiedenbeck and Cohan, 2011; Blount et al., 2012). Gene loss has also been implicated in rapid bacterial adaptation after experimental evolution (Hottes et al., 2013), but this process has not yet been confirmed in natural populations. Phylogenomics involves the comparative analysis of the gene content of a set of phylogenetically related genomes to expose new insights into genome evolution and function, and this approach has been classically applied to study how gene gain is associated with functional divergence in bacteria (Treangen and Rocha, 2011). Here, we propose that bacterial phylogenomics can be similarly applied to study evolution by gene loss (Albalat and Cañestro, 2016), specifically where enzymes are evolving within bacterial species that are undergoing genome decay (Adams et al., 2014; Price and Wilson, 2014).
The current bias toward in-depth functional analysis of proteins from genomes that are undergoing gene gain by HGT versus gene loss by decay is likely due to two factors. First, in genomes that are undergoing decay, there is a relaxation in the selection pressure that increases mutation rates in functioning proteins as these proteins begin to contribute less to cell fitness (Wernegreen and Moran, 1999; McCutcheon and Moran, 2011). As a result, these proteins display higher-than-normal mutation rates, making in vitro analysis of protein function a challenge (Couñago et al., 2006). Second, there is only a brief window of opportunity to study the evolution of most proteins during genome decay in bacteria. This is because most proteins are monofunctional, and they are rapidly removed from the bacterial genome once they become dispensable due to gene loss. To overcome this limitation, we propose to use a bifunctional enzyme to study the evolution of substrate specificity after gene loss, as these enzymes may continue to operate when only one of their associated metabolic pathways becomes dispensable. We use genome-scale metabolic models to determine when each pathway is lost as well as when they become non-functional (Henry et al., 2010).
The phylum Actinobacteria, Gram-positive organisms with high (G+C)-content, are ubiquitous and show one of the highest levels of bacterial metabolic diversity (Barka et al., 2016). This phylum is known to display significant metabolic specialization, and phylogenomics has been previously applied to correlate genome dynamics with metabolic pathway evolution and enzyme specialization (Noda-García et al., 2013; Verduzco-Castro et al., 2016; Cruz-Morales et al., 2016). Moreover, within the deep-rooted family Actinomycetaceae, phylogenetic analyses have suggested the occurrence of genome decay (Zhao et al., 2014). Furthermore, we have observed that many actinobacterial species lack a trpF gene, while retaining a copy of the potentially bi-functional priA gene. The PriA enzyme is capable of operating in the L-histidine biosynthesis pathway as HisA, while also functioning in the L-tryptophan biosynthesis pathway as TrpF (Noda-García et al., 2013; Verduzco-Castro et al., 2016; Barona-Gómez and Hodgson, 2003). As such, we suggest PriA is an ideal candidate to study protein evolution during the process of genome decay.
The product of the priA gene, which is a hisA homolog, catalyzes two analogous isomerizations of structurally similar substrates: (i) the conversion of N-(5'-phosphoribosyl)-anthranilate (PRA) into 1-(O-carboxyphenylamino)-1'-deoxyribulose-5'-phosphate (CdRP) (TrpF or PRA isomerase activity); and (ii) the conversion of N-[(5-phosphoribosyl) formimino]-5-aminoimidazole-4-carboxamide ribonucleotide (ProFAR) into N-[(5-phosphoribulosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide (PRFAR) (HisA or ProFAR isomerase activity) (Barona-Gómez and Hodgson, 2003). Moreover, evolution of PriA in response to genome dynamics has lead to the appearance of the SubHisA and PriB subfamilies, which have been shown to have different substrate specificities (Noda-García et al., 2013; Verduzco-Castro et al., 2016) (Figure 1). While these enzyme subfamilies were respectively discovered using phylogenetics in the genera Corynebacterium and Streptomyces, no study has ever centered on the metabolic genes associated with priA in the deep-branching organisms belonging to the phylum Actinobacteria, such as those of the family Actinomycetaceae where the transition from HisA into PriA must have taken place (Noda-García et al., 2015; Plach et al., 2016).
Here we exploit the intrinsic features of PriA to explore the link between the evolution of enzyme function and gene loss within the family Actinomycetaceae, which includes many human oral cavity commensal and pathogenic organisms (Zhao et al., 2014; Yeung, 1999; Könönen and Wade, 2015). We classify this bacterial family into four major evolutionary lineages, including two specific for the genus Actinomyces. One of these lineages shows extensive gene loss, including his and trp biosynthetic genes. After the loss of constrictions imposed by the retention of biosynthetic pathways, we found that evolutionary patterns correlate with the sub-functionalization, yet not necessarily specialization, of PriA into two new subfamilies, which we named SubHisA2 and SubTrpF. We support this classification by comprehensive in vivo and in vitro biochemical characterization of a dozen PriA homologs from Actinomyces and closely related taxa. X-ray structural analysis and molecular docking simulations were further used to start investigating the evolution of substrate specificity by gene loss in structural grounds. Our results demonstrate that gene loss can drive functional protein divergence, and provide unprecedented insights into the evolution of enzyme substrate specificity in retained enzymes after gene loss in the bacterial genome.
To find evidence of gene loss in deep-branching organisms of the phylum Actinobacteria, specifically within genera belonging to the order Actinomycetales, we selected 133 representative organisms from 18 families with available genome sequences (Figure 2A; Figure 2—source data 1). We then aimed at resolving their taxonomic relationships using 35 single-copy proteins that are conserved among all 133 genomes analyzed (see Materials and methods and Figure 2A – Figure 2—source data 2). We concatenated these proteins to reconstruct their phylogeny, and supported the resulting tree by significant Bayesian posterior probabilities. The phylogenetic tree shows several long branches, which correspond to the families Actinomycetaceae, Micrococcaceae, Propionibacteriaceae and Coriobacteriaceae, and to the genus Tropheryma. The tree also includes a clade with the family Bifidobacteriaceae as the root of six different sister families, including the Actinomycetaceae (blue branch and grey box in Figure 2A).
As expected, all of the organisms contained in the rapidly evolving lineages trended towards smaller genomes and lower (G+C)-content (Figure 2B). The Actinomycetaceae genomes were characterized by particularly broad variances in genome size and (G+C)-content, with the variation being most apparent for organisms belonging to the genus Actinomyces. Representative organisms of this genus, e.g. A. sp. oral taxon 848 str. F0332, A. oris MG-1, A. neuii, and A. odontolyticus, are distributed throughout the Actinomycetaceae clade (blue dots in Figure 2B). Given these observations, we selected the genus Actinomyces as the ideal target for a deeper analysis of rapid evolution by gene loss.
We then carried out a phylogenomic analysis using the genome sequences of 33 organisms from the family Actinomycetaceae (Figure 3 - Figure 3—source data 1), from which 27 are classified as Actinomyces (Zhao et al., 2014; Yeung, 1999), including the model strain A. oris MG-1 sequenced in this study. The remaining sequences came from the genera Actinotignum, formerly Actinobaculum (Könönen and Wade, 2015), Trueperella, Varibaculum and Mobiluncus. As an out-group we used the genera Bifidobacterium, which included eight genome sequences. We identified a total of 205 single-copy proteins shared among all these 41 organisms, which were used for constructing a concatenated phylogenetic tree by Bayesian (Figure 3; Figure 3—source data 2), and maximum likelihood approaches (Figure 3—figure supplement 1). Based in this analysis, the family Actinomycetaceae separated into four evolutionary lineages contained in three sub-clades: Lineage I, which includes A. sp. oral taxon 848 str. F0332 (Org10); lineage II, which includes A. oris MG-1 (Org21); and lineages III and IV, which form a monophyletic group and include A. neuii (Org27) and A. odontolyticus (Org41), respectively. Remarkably, these lineages group depending on their mammalian hosts and human body niches from which they were isolated (Figure 3 - Figure 3—source data 1).
Our phylogenetic analysis also shows that 25 of the 27 Actinomyces species analyzed have a paraphyletic origin leading to lineages II and IV. These two lineages can be distinguished not only according to their genome size and (G+C)-content, but also to the number of coding sequences (CDS) and metabolic functions or subsystems (Figure 3—figure supplement 2). Specifically, as revealed by genome annotation using RAST (Aziz et al., 2008), lineage II, which has the highest (G+C)-content (68.32% on average) and the biggest genome size (3.04 Mbp on average), has the largest number of amino acid biosynthetic pathways (see next section). This observation contrasts with the results obtained for lineage IV, which shows reduced (G+C)-content (60.66% on average) and genome size (2.19 Mbp on average), as well as less amino acid biosynthetic pathways. Indeed, the genomic differences between lineages II and IV were found to be statistically significant, including the presence or absence of the his and trp biosynthetic genes (Figure 3; Figure 3—figure supplement 2—source data 1). We explore this observation in more detail by constructing metabolic models for all of the analyzed genomes in the following section.
In order to reduce the risk of overreaching conclusions based only in homology sequence searches, we constructed genome-scale metabolic models of all 33 organisms comprising this family, plus the nine outgroup Bifidobacterium species (see Materials and methods). Next, flux balance analysis was applied to predict the minimal nutrients required to support growth for each genome. Finally, after automated curation of the metabolic reconstructions (Satish Kumar et al., 2007), which includes not only homologous but also analogous enzymes, we classified each reaction in each model as: (i) essential for growth on predicted minimal media; (ii) functional but not essential; and (iii) nonfunctional. All model results, which represent the highest quality functional annotation available for metabolism, are provided as source data of Figure 4: model overview (Figure 4—source data 1), reaction content and classifications (Figure 4—source data 2) and predicted minimal media (Figure 4—source data 3).
The lineage II models were generally the largest with an average of 1019 gene-associated reactions, which is to be expected since the lineage II genomes are also the largest. These models also had the fewest predicted essential nutrients with an average of 19 nutrients required. This result indicates that most biosynthetic pathways for essential biomass precursors are complete in the lineage II models. The lineage I and IV models were substantially smaller with an average of 850 and 843 gene-associated reactions, respectively. Although similar in size, the lineage I models had more required nutrients (25 on average) compared with the lineage IV models (22 on average). Finally, the lineage III models were the smallest of all, with an average of 817 gene-associated reactions. Surprisingly, these models still had fewer required nutrients than the lineage I models (23 on average). These results provide a meaningful biochemical context in which biosynthetic enzymes are evolving.
To study the metabolic diversity of each lineage in more detail, we performed a comparative analysis of the gene-associated reactions of our models (Figure 4A–D). Given the large metabolic and genetic diversity, we used less stringent parameters than those used for our core genome analysis sustaining our phylogenomics of previous section (see Materials and methods). This comparative analysis revealed that the lineage II genomes were the least diverse, with a very large fraction of reactions present in all models, including those for amino acid biosynthesis (Figure 4—figure supplement 1). All other lineages were more diverse. Interestingly, a comparative analysis of our models found that all models across all lineages share a common conserved core of 695 reactions. When we similarly compute a conserved core for each individual lineage (Figure 4E), we find that the 89% of reactions in the conserved core for each lineage are contained in the conserved core across all lineages.
From these modeling results, we clearly see that lineages I, III and IV are all undergoing the process of gene loss, resulting in a reduction towards a common set of core metabolic pathways. This explains the rapid development of diversity within each lineage, as well as the variability in minimal required nutrients predicted by our models. We can also apply our models to study the gene loss process from a mechanistic perspective by looking for patterns in the presence and absence of genes and reactions for two specific pathways of interest: L-tryptophan and L-histidine biosynthesis. Our models predicted genomes in lineages I, III, and IV (but none from lineage II) that required these amino acids as a supplemental nutrient, indicating the loss of these biosynthetic pathways in these organisms. We also observed that the presence of the priA gene, which takes part in both L-tryptophan and L-histidine biosynthesis, closely tracked with the presence of these pathways in these genomes (Figure 5A). This observation suggests that gene losses could have an effect on the evolution of the retained PriA enzymes.
To bring down these observations at the enzyme level, we carried out comparable phylogenetic analyses of PriA (Figure 5), and we measured the evolutionary rate of its gene by estimating the dN/dS ratio (ω value) for each resulting clade (Table 1). The PriA phylogeny was complemented with an analysis of the occurrence of the his and trp biosynthetic genes, including priA for both pathways (Figure 5; Figure 5—source data 1). Excluding the out-group, our phylogenetic reconstructions show that PriAs from different lineages are grouped in three sub-clades, highlighted in purple, orange, and yellow boxes in Figure 5A, which have distinguishable selection pressures operating upon them. This analysis also shows that PriA coevolves with the presence or absence of the his and trp biosynthetic genes (Figure 5B).
The purple box denotes a paraphyletic clade that includes PriAs from lineage II, as well as the PriAs from the genus Bifidobacterium used as an out-group. The dN/dS value of this lineage, which retains the entire set of his and trp genes, is 0.0636, consistent with purifying selection. The orange and yellow boxes denote polyphyletic groups that include PriAs from lineages I, III, and IV. Interestingly, the included taxa within these lineages lost their extant his or trp genes differentially (Figure 4A), and their dN/dS values are 0.0901 and 0.1459, respectively, which is suggestive of relaxation of purifying selection. Moreover, the higher dN/dS values in the clade shown in yellow seem to be due to accumulation of nonsynonymous substitutions, in other words, higher values of dN that may relate to changes in enzyme specificity (Table 1).
Thus, on the basis of these evolutionary observations, we proposed three functional and testable hypotheses related to the emergence of novel PriA enzyme subfamilies in the bacterial family Actinomycetaceae (Figure 6A). In H1 (purple box) we assume that PriA homologs are conserved as enzymes with dual-substrate specificity, capable of converting both PRA and ProFAR substrates. In H2 (orange box) and H3 (yellow box) the PriA homologs are expected to be monofunctional isomerases, yet not necessarily specialized enzymes, capable of converting ProFAR or PRA as substrates, respectively. Moreover, given that relaxation of purifying selection is associated with the latter two hypothetical scenarios, H2 and H3, our model predicts monofunctional, yet not necessarily specialized enzymes capable of supporting growth. Representative enzymes of each hypothesis were selected for further biochemical characterization, as described.
Before evaluating the functional implications of our evolutionary hypotheses from previous section, we confirmed that the priA gene is functional in Actinomyces. For this purpose, we used allelic exchange to delete the priA gene from the chromosome of A. oris MG-1 (Org21) (Wu and Ton-That, 2010; Delisle et al., 1978), a model strain that belongs to lineage II and whose genome was sequenced as part of this study. Mutation of priA in this organism was confirmed by sequencing the entire genome of the resulting ΔpriA mutant strain (Supplementary file 1). Determination of the growth requirements of this strain, termed ΔpriA_Org21, showed that priA mutation leads to L-tryptophan auxotrophy, demonstrating the physiological relevance of PriA in this organism. Unexpectedly, however, the ΔpriA mutant remains prototrophic for L-histidine, which could not be explained on the basis of current data. Thus, it is tempting to speculate that this phenotype may find an explanation in the previously reported association between enzyme promiscuity and genome decay (Adams et al., 2014; Price and Wilson, 2014).
To biochemically evaluate the functional implications of our evolutionary hypotheses (Figure 6A), we characterized nine selected PriAs, both in vivo, by complementation assays using trpF and hisA Escherichia coli mutants; and in vitro, by estimation of their Michaelis Menten steady-state enzyme kinetic parameters, as we have previously done (Noda-García et al., 2013; Verduzco-Castro et al., 2016; Noda-García et al., 2010). The results of these experiments are included in Table 2. First, in vivo complementation assays using appropriate priA constructs, showed that PriA homologs from Org15, Org21, and Org22 (H1) were able to rescue growth of both HisA and TrpF deficient strains. Second, priA homologs from Org34 and Org36 (H2) complemented the HisA activity and, to a lesser extent, the TrpF activity. Third, those priA homologs from Org10, Org13, Org39 and Org41 (H3) were able to complement the TrpF activity but not the HisA activity.
The priA homologs were then heterologously expressed and purified to homogeneity in E. coli (see Materials and methods). Only five enzymes out of nine were found to be soluble and could be purified as needed, which agrees with the high mutation rate encountered in the previous section. Fortunately, we obtained Michaelis Menten enzyme kinetics parameters for representative enzymes of all three evolutionary hypotheses, namely, three enzymes belonging to H1 and one enzyme each for H2 and H3, with the following results (Figure 6B and Table 2). First, enzymes from Org15, Org21, and Org22 (H1) showed dual-substrate specificity but also poor catalytic efficiencies, namely, kcat/KMProFAR from 0.01 to 0.1 μM−1s−1 and kcat/KMPRA around 0.01 μM−1s−1. Second, only ProFAR isomerase activity could be detected in vitro using pure enzyme from Org36 (H2), with a catalytic efficiency of kcat/KMProFAR of 0.002 μM−1s−1, but not PRA isomerase activity, as suggested by our highly sensitive in vivo complementation assay. Third, PRA isomerase activity as the sole activity present in H3 was confirmed in the enzyme purified from Org42, with a kcat/KMPRA of 0.02 μM−1s−1.
The obtained enzyme kinetics parameters suggest that mutations that accumulate during relaxation of purifying selection, which make these enzymes difficult to work with, affect the turnover (kcat). In the case of the H1 enzymes, the poor turnovers are compensated for by relatively high substrate affinities (KM), mainly for ProFAR. However, this does not seem to be the case for the enzymes belonging to H2 and H3, which have poor KM parameters not only for the substrate of the missing activity but also for the substrates they are active against, ProFAR and PRA, respectively. Therefore, PriA homologs from Actinomyces have poor catalytic efficiencies when compared with bona fide PriAs from its closely related genus Bifidobacterium (Table 2). This suggests that enzyme evolution from bifunctionality to monofunctionality under relaxation of purifying selection does not necessarily express itself in the same way as recorded during purifying or positive selection, where specialization and enzyme proficiency come together.
The case of the in vivo PRA isomerase activity detected for the enzyme from Org36, which could not be confirmed in vitro, may be related to the different resolutions of our enzyme assays. For instance, the detection limits for the PRA and ProFAR isomerase assay used in the present study are 0.0001 μM−1s−1 and 0.001 μM−1s−1, respectively (Noda-García et al., 2013; Verduzco-Castro et al., 2016; Noda-García et al., 2010). However, despite the poor catalytic efficiency of all Actinomyces enzymes investigated, these detection limits guarantee that our enzyme parameters are in agreement between them and with our hypotheses. Based on these results the family related to H1, which has both activities, is referred to as PriA, whereas the latter two enzyme subfamilies, related to H2 and H3, were renamed as SubHisA2 and SubTrpF, respectively. These names, together with the name of the organism from which the enzymes were obtained, are used in Table 2 and in the following sections.
To potentially identify mutations in active-site residues that may affect kcat and KM parameters, we attempted to elucidate the structure of the five PriA homologs that we were able to in vitro characterize. However, we were only able to crystallize and solve the structure of PriA_Org15 (H1) at atomic resolution of 1.05 Å (PDB: 4 × 2R, Figure 7; Figure 7—source data 1). To compare this structure with SubHisA2_Org36 (H2) and SubTrpF_Org41 (H3), we opted for the construction of structural homology models. Since the ability of PriA to accept both ProFAR and PRA as substrates requires productive conformations, we also explored these interactions using molecular docking. This was complemented with detailed structure-based multiple sequence alignments taking into account all available PriA functional and structural data (Figure 7B). This combined approach allowed us to identify mutations that may be driving the evolution of PriA into SubHisA2 and SubTrpF enzyme subfamilies.
Changes of conserved residues from PriA (H1) into SubHisA2 (H2) enzymes include Ile47Leu and Ser79Thr. Previous independent mutation of these two residues, even into similar amino acids, in SubHisA from Corynebacterium abolished the PRA isomerase activity of this monofunctional enzyme (Noda-García et al., 2013). Analogously, the SubHisA2_Org36 has a change of Ser79 into Thr79 (Figure 7B). In this mutation, the methyl group of the threonine residue may affect the contact between PRA and the hydroxyl group common to these residues (Figure 7A), thus abolishing PRA isomerase activity. This effect agrees with the estimated binding affinities for ProFAR (−9.5 kcal/mol) and PRA (−9.2 kcal/mol) obtained after molecular docking (Supplementary file 2). The energy-minimized docking model of the productively bound PRA, in agreement with the kinetic parameters from the preceding section, indicates that the catalytic residue Asp11 does not interact with the 2′-hydroxyl group from the substrate. A precedent for this contact is found in previous X-ray structural and mutagenesis analysis of bona fide PriA enzymes (Noda-García et al., 2010; Due et al., 2011).
Comparison of PriA (H1) with SubTrpF (H3) revealed the mutations Gly126Cys and Trp139Gly. In PriA, Gly126 faces the active site near the catalytic residue Asp128. The introduction of the Cys side-chain in SubTrpF could influence the positioning of Arg137 with respect to Asp128, obstructing the accommodation of ProFAR, as this region interacts with a large phosphosugar moiety that is absent from PRA (Figure 7B). Furthermore, Trp139, which is catalytically important for conversion of ProFAR by PriA, is mutated into several different amino acid residues in SubTrpFs. Trp139 is important for the correct positioning of the catalytic residues present in loop 5, and for substrate binding through stacking interactions (Verduzco-Castro et al., 2016; Due et al., 2011). Indeed, the indole group of Trp139 in PriAs can form a hydrogen bond with Asp128, stabilizing the knot-like conformation observed during ProFAR binding. Thus, mutation of this residue in SubTrpF is in agreement with the loss of ProFAR isomerase activity. Arg83 is also interesting as it is differentially missing from SubTrpF, and/or the fragment preceding it contains a two-residue insertion (Figure 7B). Arg83 interacts with the second phosphate group of ProFAR, allowing its correct position in the substrate-binding pocket of PriA. Overall, these modifications in key residues disfavor the ProFAR binding affinities, a result that is in agreement with the enzyme kinetic parameters and the estimated binding affinities for ProFAR (−9.5 kcal/mol) and PRA (−9.7 kcal/mol) obtained after molecular docking (Supplementary file 2).
Although further research is needed to confirm the exact mutations and their roles leading to SubTrpF and SubHisA2 sub-families, our results provide a promising first step towards deciphering at the atomic level how relaxation of purifying selection influences the evolution of substrate specificity. At this point in time, when PriA, SubTrpF and SubHisA2 sequences and structures are still scarce, the effects of genetic drift, i.e. mutations related to taxonomic distance rather than functional divergence (as previously shown for the evolution of PriB from PriA [Verduzco-Castro et al., 2016]) cannot be ruled out. An extra factor potentially hampering sequence and structural analysis is the higher-than-normal mutation rates of these protein sub-families, which translates into lack of sequence conservation and disordered regions in X-ray crystal structures. Our structural data, including the estimates for molecular binding affinities, can therefore only be used to support other biochemical and evolutionary evidence.
Our study highlights the use of phylogenomics and metabolic models to identify and investigate gene loss in bacteria. Our results indicated that the distinctive reactions retained in each Actinomyces genome reflect the preservation of some full biosynthetic pathways over others, conferring a capacity to grow on different sets of environmental nutrients. This result in turn implies an exposure of these genomes to a diverse range of environmental conditions and selection pressures, while the phylogenetic proximity of these functionally diverse genomes speaks to a strong capacity for rapid adaptation to the diverse conditions present in the human body. The process of gene loss, associated with relaxation of purifying selection, is the key driver of this adaptation strategy. Thus, metabolic diversity in complex systems as the human microbiome might be characterized after reconstruction of evolutionary trajectories, which may reflect different bacterial functions and ecological sub-niches. The pattern of reaction conservation seen in our metabolic modeling analysis exemplifies a likely signature for gene loss, which could be used to identify these phenomena among other genome families. Remarkably, in this context, enzyme specialization does not necessarily means catalytic proficiency.
Our study of this gene loss process exposed evolutionary patterns of PriA in L-tryptophan and L-histidine biosynthesis pathways, with the potential to unveil the underpinning mechanisms driving the evolution of substrate specificity of retained enzymes. Because multifunctional enzymes may have more than one constraint operating on them, tracking functional evolution promptly after selection is relaxed during genome decay might be done more readily than with monofunctional enzymes. As shown here, only partial selection may be released in the retained bifunctional enzyme PriA. Indeed, the predicted metabolic phenotypes unveiled by flux balance analysis did correlate better with the evolutionary patterns revealed by metabolic gene occurrence and PriA phylogenetic reconstructions than they did with the natural history told by the species tree. To confirm this sort of evolutionary behavior, other instances of well-known multifunctional proteins, such as moonlighting proteins, may be investigated.
The occurrence of SubHisA2 in Actinomyces, together with the appearance of SubHisA in Corynebacterium, demonstrates that subfunctionalization of PriA leading into HisA-like enzymes has occurred at least twice. Such phenotypic plasticity is a reflection of the intrinsic enzymatic proficiency of PriA upon two related but topologically dissimilar substrates; but, more interestingly, the evolutionary histories behind these independent subfunctionalization events responded to somehow contrasting evolutionary mechanisms. Whereas SubHisA is the result of positive selection after the acquisition of an entire trp operon by HGT (Noda-García et al., 2013), SubHisA2 responded to the loss of trp genes, and it evolved under relaxation of purifying selection. Consequently, SubHisA has drastic mutations in its catalytic active site, which have been shown to be responsible for its inability to catalyze PRA, whilst SubHisA2 shows some residual PRA isomerase activity, congruent with the observation that its active-site architecture is almost completely conserved.
The subfunctionalization of PriA into SubTrpF, in contrast, has been documented only here. This functional shift had to involve ‘non-proficient’ enzyme specialization, since the ancestral activity of PriA is ProFAR isomerase (Plach et al., 2016). Thus, the appearance of SubTrpF with substitutions in its catalytic active-site could be discussed based on previous knowledge about PriA. These mutations actually resulted in the elimination of the ancestral ProFAR activity, which is remarkable because the driving force behind this process relates to the relaxation of purifying selection. In agreement, a recent study of PriA sequences obtained from a diverse metagenome, complemented by some of the SubTrpF sequences studied here, classified this enzyme subfamily at the transition from HisA into PriA (Noda-García et al., 2015). Since Actinomyces undergoes interspecies recombination with protein functional implications (Do et al., 2008), such a mechanism may provide a means to explain the sequence heterogeneity found in these Actinomyces PriA homologs.
Our study, therefore, provides experimental evidence that gene loss can drive functional protein divergence. It also shows that, despite the fact that the retained enzymes possess low catalytic activities, they contribute to the maintenance of metabolism, and therefore, to fitness. Taken together, our evolutionary observations backed with metabolic modeling, biochemical and structural data, suggest multiple selection types associated with ecological micro-niches, e.g. environmental cues provided by the human body. Thus, enzyme subfamilies are the result of processes involving different selection types upon proteins with more than one function. Although further examples showing metabolic-driven evolutionary histories need to be identified, our study provides a strategy for the in-depth use of genome sequences for protein and bacterial evolutionary studies to understand enzyme function.
The genomes of the genus Bifidobacterium and the family Actinomyceatceae were obtained from NCBI (NCBI accession numbers are provided as Figure 3—source data 1). The genomes were annotated by using RAST (Aziz et al., 2008). We identified core orthologous genes using BBHs (Tatusov et al., 1997) with a defined e-value of 0.001. The sequences were aligned with MUSCLE 3.8.31 (Edgar, 2004) and edited with GBLOCKS (Castresana, 2000). We concatenated all the orthologous groups for phylogenomic analysis. The phylogenetic analyses were carried out using MrBayes v.3.2.1 (32) and maximum likelihood analysis using RAxML v.8 (33). For MrBayes, we used a mixed model, and for the maximum likelihood analysis, we used the generalized time reversible (GTR) model. Branch support was measured as the posterior probability of clades in the consensus tree for Bayesian analysis; and with 1000 bootstrapping replicates in the maximum likelihood analysis. To calculate the nonsynonymous (dN) and synonymous (ds) substitution rates between PriA and homologous subfamilies, we aligned all the sequences by codon using RevTrans 1.4 Server (Wernersson and Pedersen, 2003). To calculate the dN/ds ratio we used codeml in the PAML four package (Yang, 2007). GC content, genome size, CDS content, and number of subsystems between the lineages were compared by using the T-test in the package R. All the boxplots were done with R.
The A. oris MG-1 strain (Delisle et al., 1978) was sequenced using an in-house Illumina MiSeq sequencing platform. We used Trimmomatic (Bolger et al., 2014) to filter the reads and Velvet v1.2.10 (Zerbino and Birney, 2008) to assemble the reads. The Whole Genome Shotgun (WGS) A. oris MG-1 project has been deposited at GenBank under the project accession [PRJNA327886].
We applied the DOE Systems-biology Knowledgebase (KBase) to construct draft genome-scale metabolic models. The model reconstruction process was optimized as previously (Satish Kumar et al., 2007), and comprised of three steps: (i) genome annotation by RAST (Aziz et al., 2008); (ii) reconstruction of a draft model using the ModelSEED approach (Henry et al., 2010); and (iii) gapfilling of the model to permit growth and plug holes in mostly complete pathways (Dreyfuss et al., 2013). In the gap-filling process, we identified the minimal set of reactions that could be added to each model to permit biomass production in a medium containing every transportable metabolite. We also favored the addition of reactions that would permit more gene-associated reactions in each model to carry flux.
Once models were built, we applied flux balance analysis (FBA) (Orth et al., 2010) to predict minimal feasible media and classify reactions using a six step process: (i) set the biomass flux to a nonzero value; (ii) minimize the number of active exchange reactions to identify the minimal set of external nutrients that must be provided to permit growth; (iii) constrain exchange fluxes so that only the minimal exchanges are allowed to function; (iv) minimize and maximize each reaction flux to classify each reaction during growth on minimal media (Mahadevan and Schilling, 2003); (v) maximize biomass flux on minimal media and fix the biomass flux at its maximum value; and (vi) minimize the sum of all fluxes in the model to produce the simplest flux profile possible (e.g. removing all flux loops). Reactions with only positive or negative fluxes are classified as essential; reactions with only zero flux values are classified as nonfunctional; and reactions with zero and non-zero flux values are classified as functional.
For construction of the overall model per lineage, we identified all reactions that were associated with genes (i.e. not gapfilled) in at least 75% of the models included in the lineage, using a permissive e-value of 0.01. These reactions formed the basis of our lineage model. Then we applied the same gapfilling algorithm used with our genome models to permit the lineage model to grow. Finally, we applied our FBA pipeline to predict minimal media and classify reactions in the lineage model. All the models, associated genomes, minimal media predictions, reaction classifications, and flux predictions generated in this study are presented using the KBase Narrative Interface and are accessible at https://narrative.kbase.us/narrative/ws.17193.obj.1. See also Figure 4—source data 1, Figure 4—source data 2 and Figure 4—source data 3.
The priA genes from Org15, Org10, and Org41 were synthesized by GeneArt (Thermo Fisher Scientific, USA). Additionally, priA genes from Org13, Org22, Org34, Org36, and Org39 were synthesized by GenScript (GenScript, USA). Codons were optimized for E. coli heterologous expression. The priA homologs from A. oris MG-1, B. longum, B. gallicum and B. adolescentis were PCR cloned from our genomic DNA collection. Oligonucleotide sequences of primers used in this study are included in Supplementary file 3. All genes were inserted into pET22b, pET28a (Novagen) for expression and protein purification, and pASK for complementation assays, by using the NdeI and HindIII restriction sites (Noda-García et al., 2015). The in vivo trpF and hisA complementation assays, and in vitro determination of the Michaelis-Menten steady-state enzyme kinetics parameters for both PRA and ProFAR as substrates, were done as previously (Noda-García et al., 2013; Verduzco-Castro et al., 2016; Noda-García et al., 2010). Lack of enzyme activity in vitro was confirmed using active-site saturation conditions, as before (Noda-García et al., 2013; Verduzco-Castro et al., 2016).
To create a ΔpriA mutation in A. oris MG1 1.5 Kbp fragments upstream and downstream of this organism were amplified by PCR (Supplementary file 3). The upstream fragment was digested with EcoRI and NdeI, the downstream fragment with NdeI and XbaI. The upstream and downstream fragments were ligated together in a single step. The fragment was cloned into pCWU3 precut with EcoRI and XbaI after digestion with appropriate enzymes. The generated plasmid was then introduced into A. oris MG-1 (Org21) by electroporation. Corresponding in-frame deletion mutants were selected by using mCherry fluorescence as a counter-selectable marker and resistance to kanamycin (Wu and Ton-That, 2010). The deletion mutant was confirmed by PCR and by sequencing of the entire genome of the resulting mutant strain.
PriA_Org15 was expressed and produced in BL21 Magic cells bearing the plasmid pMCSG68_PriA_Org15. The protein was purified by immobilized metal-affinity chromatography (IMAC) followed by His6-tag cleavage using recombinant His-tagged TEV protease. A second IMAC step was used to remove the protease, the uncut protein, and the affinity tag. Concentrated protein (37 mg ml−1) was crystallized by sitting-drop vapor-diffusion technique in 96-well CrystalQuick plates (Greiner Bio-One, USA). The crystals appeared at 289 K in conditions consisting of 0.2 M Li2SO4, 0.1 M CAPS:NaOH pH 10.5, and 1.2 M NaH2PO4/0.8 M K2HPO4. Prior to data collection crystals were cryoprotected in 2.4 M K2HPO4 and subsequently flash-cooled. Diffraction data were collected at 100 K. Native datasets were collected at 19-ID equipped with an ADSC quantum Q315r CCD detector at 0.979 Å wavelength. The images were processed by using the HKL3000 software suite (Minor et al., 2006). Molecular replacement was carried out by using the coordinates of PriA from M. tuberculosis (Due et al., 2011) used as a search probe in Phaser (McCoy et al., 2007). The initial model was then improved by the automatic rebuilding protocol in Arp/wArp, and further modified by iterations of manual rebuilding in COOT (Emsley and Cowtan, 2004) and fully anisotropic crystallographic refinement in PHENIX (Adams et al., 2010) with hydrogen atoms in riding positions. The PriA_Org15 model comprises residues Ser-2-Arg137 and Gly143-Ala247, 305 water molecules, 4 phosphate ions, and 1 CAPS moiety. The mFo-DFc difference map reveals two strong positive peaks (near Asp51 and Leu230) that could not be unambiguously assigned. The quality of the refined models was verified using the Molprobity server (Chen et al., 2010). Data collection statistics and the refinement results are provided as Figure 7—source data 1.
T-coffe package was used for all multiple sequence alignments (Notredame et al., 2000). Protein structural homology models of SubHisA_Org36 and SubTrpF_Org41 were based on the crystal structure of PriA from PriA_Org15 (PDB:4X2R; this study). A standard modeling strategy using Robetta and Rosetta 3.5 (47) was adopted. Molecular models of PRA and ProFAR were built using Molden (Schaftenaar and Noordik, 2000), and optimal atomic configuration of both substrates was obtained using Gaussian 09 (Gaussian Inc., Wallingford CT, USA) through a quantic geometry optimization using a self-consistent field at the Hartree-Fock 6–31G* level. Polar hydrogen atoms and Gasteiger-Marsili empirical atomic partial charges were added using AutoDockTools (Morris et al., 2009). An extensive configuration sampling of PRA and ProFAR binding biophysical interactions with PriA catalytic site was performed with Autodock Vina (Trott and Olson, 2010). Results were merged, refined, clustered, and energy sorted to produce a set of complex configuration predictions.
PHENIX: a comprehensive Python-based system for macromolecular structure solutionActa Crystallographica Section D Biological Crystallography 66:213–221.https://doi.org/10.1107/S0907444909052925
Taxonomy, physiology, and natural products of actinobacteriaMicrobiology and Molecular Biology Reviews 80:1–43.https://doi.org/10.1128/MMBR.00019-15
Trimmomatic: a flexible trimmer for Illumina sequence dataBioinformatics 30:2114–2120.https://doi.org/10.1093/bioinformatics/btu170
Selection of conserved blocks from multiple alignments for their use in phylogenetic analysisMolecular Biology and Evolution 17:540–552.https://doi.org/10.1093/oxfordjournals.molbev.a026334
MolProbity: all-atom structure validation for macromolecular crystallographyActa Crystallographica Section D Biological Crystallography 66:12–21.https://doi.org/10.1107/S0907444909042073
High-throughput generation, optimization and analysis of genome-scale metabolic modelsNature Biotechnology 28:977–982.https://doi.org/10.1038/nbt.1672
Actinomyces and related organisms in human infectionsClinical Microbiology Reviews 28:419–442.https://doi.org/10.1128/CMR.00100-14
HKL-3000: the integration of data reduction and structure solution--from diffraction images to an initial model in minutesActa Crystallographica Section D Biological Crystallography 62:859–866.https://doi.org/10.1107/S0907444906019949
AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibilityJournal of Computational Chemistry 30:2785–2791.https://doi.org/10.1002/jcc.21256
Identification and analysis of residues contained on beta --> alpha loops of the dual-substrate (beta alpha)8 phosphoribosyl isomerase A specific for its phosphoribosyl anthranilate isomerase activityProtein Science : A Publication of the Protein Society 19:535–543.https://doi.org/10.1002/pro.331
Evolution of substrate specificity in a recipient's enzyme following horizontal gene transferMolecular Biology and Evolution 30:2024–2034.https://doi.org/10.1093/molbev/mst115
T-Coffee: a novel method for fast and accurate multiple sequence alignmentJournal of Molecular Biology 302:205–217.https://doi.org/10.1006/jmbi.2000.4042
Molden: a pre- and post-processing program for molecular and electronic structuresJournal of Computer-Aided Molecular Design 14:123–134.https://doi.org/10.1023/A:1008193805436
AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreadingJournal of Computational Chemistry 31:455–461.https://doi.org/10.1002/jcc.21334
Evidence for genetic drift in endosymbionts (Buchnera): analyses of protein-coding genesMolecular Biology and Evolution 16:83–97.https://doi.org/10.1093/oxfordjournals.molbev.a026040
RevTrans: multiple alignment of coding DNA from aligned amino acid sequencesNucleic Acids Research 31:3537–3539.https://doi.org/10.1093/nar/gkg609
Allelic exchange in Actinomyces oris with mCherry fluorescence counterselectionApplied and Environmental Microbiology 76:5987–5989.https://doi.org/10.1128/AEM.00811-10
PAML 4: phylogenetic analysis by maximum likelihoodMolecular Biology and Evolution 24:1586–1591.https://doi.org/10.1093/molbev/msm088
Phylogenomics and evolutionary dynamics of the family ActinomycetaceaeGenome Biology and Evolution 6:2625–2633.https://doi.org/10.1093/gbe/evu211
Alfonso ValenciaReviewing Editor; Barcelona Supercomputing Center - BSC, Spain
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your work entitled "Evolution of substrate specificity in a retained enzyme driven by gene loss" for further consideration at eLife. Your article has been favorably evaluated by Patricia Wittkopp (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors.
There are some issues that need to be addressed before acceptance, as outlined below:
The section on biochemical data requires a more clear version since there are questions about the significance and interpretation of the values reported. The interpretation of the parallel decay of both Trp and His pathways is not sufficiently convincing in the current version of the paper.
The interpretation of the mutations in the x-ray structure and models, together with the analysis of the MSA, are considered important but weak elements of the paper. The use of better methods and a more systematic approach will be required.
A new version should justify clearly the need of building complete metabolic models for the questions addressed in the paper.
The basic proposal of the paper is interesting and well positioned. The study of PriA dual enzyme acting in two pathways (His and Trp) as a model of the changes in response to a large genomic delection of complete pathways, is an interesting and original idea.
The work presented include:
First the basic phylogenetic analysis of the protein family with the distribution of the aa pathways and the PriA gene. (The analysis of the larger family presented in the first figure seems irrelevant for the rest of the paper). The results are interesting and demonstrate the existence of a potential relation between the absence of the enzymes of the two pathways and the distribution of the PriA gene.
The importance of the pathways is analyzed with metabolic models of the different genomes. The exercise is interesting but it is debatable if it is really necessary for the conclusions obtained. ("Overall, we see from the modeling analysis that lineages I, III and IV are all undergoing the process of gene loss, with different genomes in these lineages losing different sets of metabolic genes." would not be enough with knowing that enzymes of the key His and Trp pathways are missing? What else do we learn that is important for the logic of the study?)
in vitro biochemical assays of some of the sequences of the key groups are presented to demonstrate the lack of restrictions of the enzymes of the SubHis and SubTrp groups. The results are partially consistent with the interpretation of the authors but the data suggest that the actual situation might be far more complex of what the model suggest. Indeed, it seems difficult to explain the poor catalytic efficiencies of the enzymes of the subHis and subTrp groups in the context of the adaptations to the corresponding missing pathways.
The analysis of the structure of the enzyme based on a single x-ray obtained for this paper and homology models of the other prototypic sequences. It is a correct analysis but very speculative. It provides very little solid evidence for the interpretation of the mutations.
The analysis of the variation at the sequence level based on the MSA alignmentof members of the family. In this case it is unclear how some specific positions are selected as key differences between SubTrp and SubHis. Some of the differences are not consistent with the conservations in the groups and it is unclear if they are conserved in other sequences of the family (for example, the T for S substitution in subHisA2_Org36 is not present in subHisA2_Org34).
Additionally, the analysis is not systematic and none of the various existing methods for the analysis of positions specific of subfamilies are used.
In summary, a very interesting idea and a potentially good model to explore the evolution of the function of an enzyme in similar genomes subject to specific gene losses. A great effort to combine different type of approaches from genomics to modelling, including biochemistry, phylogenetics and protein structures. To conclude: some not so optimal interpretation of the results and general lack of sufficiently convincing support for the proposed model.
This Manuscript presents an imposing body of work showing how gene loss can drive the evolution of substrate specificity from retained enzymes. The authors did a thorough work, using multiple approaches to elaborate and support their hypotheses. More in detail:
1) Genome sequencing of the A. oris MG-1 strain.
2. Phylogenomic analysis of the Actinomycetaceae family, using 35 single-copy conserved proteins in 133 organisms.
3) Phylogenomic analysis 0f 205 single-copy proteins conserved in 41 organisms (33 from the Actinomycetaceae family, and 8 from the Bifidobacterium genus, used as an outgrup).
4) Metabolic model reconstruction and flux balance analysis on 33 Actinomycetaceae.
5) Molecular evolution analysis of the PriA gene within Actinomycetaceae.
6) Biochemical analysis of PriA enzymes.
7) Crystallographic analysis of PriA.
8) Structural modelling, analysis, and molecular docking to study the evolution of SubHisA2 and SubTrpF.
Overall, I found this work outstanding, and I think this is the kind of manuscript every evolutionary biologist would like to read. I really enjoyed it. I think the authors did a great job in designing the experiments, and in my opinion the methods, the results, the hypotheses and the conclusions are presented in a very clear and accurate form. I could not find any substantive concern about this work.
This paper presents an interesting analysis of two phenomena that are under-studied: gene loss, and generalist-to-specialist evolution. In general, we know a lot on gene gain (by duplication or HGT) and relatively little on gene loss, and, everyone assumes that generalists evolving to specialists is a dominating trend in evolution, whereas it could may well be that the opposite trend, highlighted here, is equally common. Overall, the scope of the work from phylogenetic to crystal structures is impressive, and the evolutionary trend and unraveled mechanisms of gene loss are well supported.
On the critical side, I must admit that I'm not so convinced that the effects of specialization and general decay of activity can be distinguished in this case. By and large, both activities (Trp and His reactions) drop down and largely in parallel (Table 2). The detection limit for then in vitro assays is not specified (i.e., what n.d. values represent) but the residual, other activity may not be so far from the one detected and assigned as 'specialized'. The structural argumentation is weak – it's not that the authors did not make an effort to unravel the effects of mutations, but it's sometimes hard to understand what mutations are doing even within active-sites let alone in second-shell, and especially, when these enzyme variants are so crippled. In general, orthologous enzyme vary in activity, often over 3 orders, and even between relatively close species. So, I think that it's hard to say whether, and which part of, the change in activities can be attributed to gene loss and concomitant specialisation (to either Trp or His) and which part to an overall relief of selection (i.e., for both reactions). Finally, since these genome sequences are sporadic (unlike say hundreds of E. coli strains that were sequenced), some mutations may relate to drift (polymorphism) and may not represent a long-term "wild-type" sequence. Previous cases analysed by this group are far more clear-cut in this respect (the effect of specialization). These caveats should be explicitly addressed.https://doi.org/10.7554/eLife.22679.033
- Ana Lilia Juárez-Vázquez
- Ernesto A Verduzco-Castro
- Lianet Noda-García
- Sofía Medina-Ruíz
- Francisco Barona-Gómez
- Janaka N Edirisinghe
- Christopher S Henry
- Karolina Michalska
- Gyorgy Babnigg
- Michael Endres
- Andrzej Joachimiak
- Chenggang Wu
- Hung Ton-That
- Julián Santoyo-Flores
- Mauricio Carrillo-Tripp
- Mauricio Carrillo-Tripp
- Andrzej Joachimiak
- Christopher S Henry
- Francisco Barona-Gómez
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We acknowledge Nelly Selem-Mójica, Víctor Villa-Moreno, José-Luis Steffani-Vallejo and Christian E. Martínez-Guerrero for bioinformatics support. We thank Sean Rovito and Angélica Cibrián-Jaramillo for useful comments and evolutionary discussions. We thank members of the Structural Biology Center at Argonne National Laboratory for data collection support. This work was supported by Conacyt Mexico, via grants 132376 to MCT and 179290 to FBG, as well as a scholarships to AJV. And by The National Institutes of Health, grant GM094585 to AJ, the US Department of Energy, under contract DE-AC02-06CH11357, and the National Science Foundation, grant 1611952 to CSH, and the National Institute of Dental and Craniofacial Research of the National Institutes of Health, grant DE017382, to HTT.
- Alfonso Valencia, Reviewing Editor, Barcelona Supercomputing Center - BSC, Spain
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.