Introduction

The rhizosphere serves as a hotspot for a diversity of interactions spanning from the secretion of organic compounds by plant roots to their uptake by the adjacent soil microbial community1,2. These interactions form a complex network of metabolic exchanges whose structure and function has a considerable impact on plant health3. Targeted secretion of exudates from plant roots’ into the environment is fundamental to the recruitment of specific microbes and the assembly of a plant-selected community (i.e., the rhizobiome)46. Each plant has a unique profile of exudates guiding the formation of a specialized rhizobiome that is adapted to support its mineral absorption79, secret plant growth supporting compounds1012, and provide protection against soil-borne microorganisms that are detrimental to its health1315. A comprehensive understanding of the dynamics within the rhizosphere, considering both plant-microbe and microbe-microbe interactions, can guide the targeted assembly and maintenance of plant-beneficial soil microbial systems1619. The development of such microbiome-based, plant-beneficial strategies presents ecologically sound alternatives to conventional, chemical-based solutions in supporting plant health and productivity.20,21.

‘Omics data in general and specifically metagenomics data analyses can potentially provide keys for unraveling the black box of plant-microbe and microbe-microbe interactions in complex ecosystems such as the soil2225. Sequence-based analyses are, however, typically limited in terms of functional interpretation of community dynamics26. Constraint based modeling (CBM) is an approach that allows for the simulation of bacterial-metabolic activity in a given environment based on the constraints imposed by the annotated microbial genomes22,27. This approach has long been used for studying the physiology and growth of single cells, represented as Genome Scale Metabolic Models (GSMM), under varying conditions28,29. Applying CBM over a GSMM can be used to assess the uptake and secretion of metabolites in the environment under study27. Accordingly, when CBM is applied over multiple GSMMs, metabolite exchange profiles (secretion into and consumption from the environment) become interconnected. This sheds light on conditions supporting the growth of different bacterial groups within the community as well as their functional potential in the trophic network formed16,30.

Advancement of sequencing technologies alongside the development of automatic pipelines for GSMM construction31,32 has promoted an increased use of CBM for the modeling of communities with growing complexity2224,26,33,34. The relevance of CBM to the study of microbiomes has been demonstrated in a variety of ecosystems and recent works have shown that CBM-based predictions can guide the development of strategies for microbiome management3437. An accurate representation of microbial metabolic networks depends on the origin of the genomes analyzed. To date, most studies attempting to model the microbiome of specific ecosystems by GSMM represent native species using corresponding genome sequences from public depositories, a process which is usually referred to as 16S-based genome imputation26,36,38. Genome recovery, or genome-resolved metagenomes, often referred to as Metagenome Assembled Genomes (MAGs) allows one to obtain full genomes directly from metagenomes39,40. Constructing GSMMs based on MAGs derived from a specific biological sample26 or directly from a native community33, is fundamentally superior to using 16S-based genome imputation, for this approach enables a genuine view of the metabolic activities carried out in situ. Such an in silico representation of a native community (with respect to its environment) can be used to decrypt the myriad interconnected uptake and secretion exchange fluxes transpiring within the root-associated microbiome, spanning from root exudates to altered organic forms.

The current study describes the recovery of 395 unique MAGs from metagenomes constructed for the native rhizosphere community of apple rootstocks cultivated in orchard soil affected by apple replant disease (ARD)41,42. Soils were amended with Brassicaceae seed meal, or were not amended, supporting the development of either disease-suppressive or disease-conducive root microbiomes, respectively41,42. MAGs were recovered from metagenomics data collected from these apple rhizosphere microbiomes. GSMMs constructed for the MAGs provide an in-silico representation of highly abundant species in the native rhizosphere community. CBM simulations were then conducted in a rhizosphere-like environment, where microbial uptake-secretion fluxes were connected to form a directional trophic network. The aims of this study were twofold: first, to provide a general framework for delineating inter-species interactions occurring in the rhizosphere environment (Fig. 1) and second, to characterize the metabolic roles specific groups of bacteria fulfill in seed meal-amended (disease-suppressive) vs non-amended (disease-conducive) apple rhizobiome communities.

A framework for interpreting and characterizing the network of metabolic interactions of root associated bacteria.

Going from obscure rhizosphere (upper circle) to the elucidated rhizosphere (lower circle): Microbial genomic DNA extracted from rhizosphere soil serves the construction of a rhizosphere-derived metagenome and the recovery of rhizosphere community Metagenome Assembled Genomes (MAGs). For each MAG, a Genome Scale Metabolic Model (GSMM) is built, representing a specific species in the microbial community. Apple-root exudate profiles based on metabolomics data are used for the construction of a simulation environment. Via Constraint Based Modelling (CBM), applied on GSMMs, interactions are characterized in the simulated root environment, yielding predictive information regarding potential trophic exchanges within the native rhizosphere microbial community.

Apple root exudates adopted from the works of Leisso et al. 48,49, included in the rhizosphere environment.

Results & Discussion

1. Assembly of a collection of MAGs representing a native microbial community from a soil agroecosystem

Metagenomic sequencing obtained from the rhizosphere of apple rootstocks grown in orchard soil with a documented history of replant disease resulted in a total of approximately 2 billion quality reads (after filtration) at a length of 150 bp, as described in Berihu et al42. Independent metagenomics assemblies of six different treatments yielded 1.4–2 million contigs longer than 2 kbp42. Here, assemblies were binned using MetaWRAP40 into 296-433 high quality MAGs for each of the six treatments (Supp. Data 1; completion and contamination thresholds were set to 90/5, respectively43). De-replication (the process of combining highly similar genomes into a single representative genome) requiring 99% average nucleotide identity (ANI) of the overall 2233 MAGs yielded a collection of 395 unique MAGs (Fig. 2). Across samples, 30-36% of the raw reads were mapped to the MAG collection (Supp. Table 1), in comparison to 61–71% mapped to the non-binned contigs42. Using GTDB-Tk44, taxonomic annotations were assigned at the phylum, order, genus and species level for 395, 394, 237 and three of the de-replicated MAGs, respectively, reflecting the genuine diversity of the rhizosphere community, which include many uncharacterized species45. Estimates of completion and contamination, total bin length and taxonomic affiliation for the MAG collections derived from each of the six assemblies, as well as the de-replicated MAGs, are provided in Supp. Data 1.

Phylogenomic surveys of the genomic collection assembled from sequence data derived from apple rhizosphere samples.

A. A phylogenomic tree constructed from 395 de-replicated MAGs extracted from the metagenomes. The tree is based on concatenated marker proteins according to GTDB-Tk44. B. Relative abundance of bacterial taxa at the phylum level as inferred from different phylogenomic classification approaches applied for the same samples: Binned-SG (MAGs derived from shotgun metagenomics); Non-binned SG (contigs derived from shotgun metagenomics); 16S rRNA amplicon derived from the same samples. Taxonomic classifications of rRNA amplicon data and non-binned-SG sequences are taken from Somera et al.41 and Berihu et al.42, respectively. Shotgun metagenomics data from Berihu et. al was used here for construction of MAGs catalogue. “Treatments 1 – 6” relate to six specific combinations of rootstock and soil amendment as detailed in Supp. Table 1. Relative abundance was calculated as the average value in five replicates conducted for each treatment.

As in previous reports1,46, Proteobacteria, Acidobacteria, Actinobacteria and Bacteroidetes were identified as the dominant phyla in the apple rhizosphere (Fig. 2A). Overall, the taxonomic distribution of the MAG collection corresponded with the profile reported for the same samples using alternative taxonomic classification approaches such as 16S rRNA amplicon sequencing and gene-based taxonomic annotations of the non-binned shotgun contigs (Fig. 2B). At the genus level, MAGs were classified into 143 genera in comparison to approximately 3000 genera that were identified for the same data based on gene-centric approaches42 and approximately 1000 genera based on amplicon sequencing41 of the same data.

The functional capabilities of the bacterial genomes in the apple rhizosphere were initially assessed based on KEGG functional annotations of their gene catalogue (Supp. Data 2, Supp. Fig. 1). The ten most frequent functional categories across the MAG collection were involved in primary metabolism, e.g., carbohydrate metabolism, and the biosynthesis of essential cellular building blocks such as amino acids and vitamins. Specialized functional categories included those associated with autotrophic nutrition such as carbon fixation and metabolism of nitrogen, sulfur and methane. Functional diversity was found to exist also when considering ubiquitous functions. For instance, though all bacteria are in need of the full set of amino acids, most genomes lack the full set of relevant biosynthesis pathways. The prevalence of biosynthetic pathways across MAGs (requiring at least a single relevant enzyme) ranges from 99.5% (e.g. glycine; missing in only two genomes) to 21% (e.g. tyrosine; detected in only 83 MAGs). Notably, the diversity of metabolic pathway completeness regarding amino acids and other essential cellular components suggests that the majority of bacterial soil species rely on an external supply of at least some of their obligatory nutritional demands.

2. Construction of a simulation system for exploring environment-dependent metabolic performances and growth of rhizosphere bacteria

Categorical classifications of discrete gene-entities, such as pathway completeness analyses, have several inherent limitations as an approach for the contextualization and functional interpretation of genomic information. First, categorical classifications may underestimate the completeness of robust pathways with multiple redundant routes resulting in low pathway completeness. Second, pathway completeness analyses do not take into consideration the directionality and continuity of a biosynthetic process whose full conductance requires the availability of a specific environmental resource together with the required successive series of genes/reactions. Although functional potential can be inferred based on the static set of genes and enzymes present, actual metabolic performances of bacterial species in soil are dynamic and reflect multiple factors, including the availability of different nutritional sources. Such sources can be environmental inputs like root exudates or downstream exchange metabolites secreted by cohabiting bacteria.

To better understand environment-dependent metabolic activity occurring in a native rhizosphere, a set of 395 GSMMs was constructed for the entire collection of the rhizosphere-bacterial community MAGs. All models were systematically subjected to validity and quality tests using MEMOTE47, leaving a total of 243 GSMMs whose stoichiometric consistency was confirmed (Supp. Data 3). On average, GSMMs included 1924 reactions, from which 203 were exchange reactions (specific reactions carrying out extracellular import and secretion of metabolites) and 1312 metabolites (Supp. Data 4). Altogether, the GSMM set held 5152 unique metabolic reactions, 597 exchange reactions, and 2671 different compounds. The distribution of key model attributes (reactions, metabolites and exchanges) across phyla is shown in Supp. Fig. 2. Model features were scalable with those reported by Basile et al26 and Heinken et al34. Additionally, comparison of GSMM scales indicated that the metabolic coverage (i.e., the number of reactions, which denote the potential of executing a metabolic function) of our data is within the same order of magnitude as described in recent large-scale automatic reconstructions26,34.

Next, species-specific rich (optimal) and poor (suboptimal) media were defined for each model in order to broadly assess GSMM growth capacity. A rich environment was defined as a medium containing all metabolites for which the model encodes an exchange reaction. A poor environment was defined as the minimal set of compounds enabling growth (see Methods). Essentially, poor environments contained species-specific carbon, nitrogen and phosphorous sources together with other trace elements. In addition to the two automatic media, we aimed to design a realistic simulating environment to explore the impact of the root exudates on the native community. To this end, metabolomics data from a set of studies which characterized the root exudates of Geneva 935 (G935) or Malling 26 (M26) rootstock cultivars48,49 - related to the G210 and M26 rootstocks whose rhizobiome was characterized here, were used to specify an array of apple root-derived compounds (Table 1). The list of secreted compounds was consistent with other reported profiles of plant root-exudates1,50.

The growth of each of the 243 GSMMs was simulated in each of the three species-specific environments (rich, poor, poor + exudates; exudates were added to corresponding poor media to ensure the exudates have a feasible effect). As expected, growth performances were higher on the poor medium supplemented with exudates in comparison to growth on the poor medium alone but lower than the growth in rich medium (Supp. Fig. 3). Notably, GSMM growth patterns were not phylogenetically conserved and were inconsistent between related taxa. Moreover, ranking of models’ growth rate were inconsistent in the three different media (i.e., some models’ growth rates were markedly affected by the simulated media whereas others did not; Supp. Fig. 3). This inconsistency indicates that the effect of exudates on community members is selective and differs between species (i.e., exudates increase the growth rates of some species more than others), as was previously reported1,5,51. The number of active exchange fluxes in each medium corresponds with the respective growth performances displaying noticably higher number of potentially active fluxes in the rich enviroenment (also when applying loopless FVA) (Supp. Fig. 4). Overall, Simulations confirmed the existence of a feasible solution space for all the 243 models as well as their capacity to predict growth in the respective environemnt (Supp. Data 5).

As a next step towards conducting simulations in a genuine natural-like environment, we aimed to define a single “rhizosphere environment” in which growth simulations for all models would take place. Unlike the species-specific root media (poor medium + exudates) which support growth of all models by artificially including multiple carbon sources that are derived from the automatic specifications of the poor medium, including such that are not provided by the root, this simulation environment was based on the root exudates (Table. 1) as the sole carbon sources. By avoiding the inclusion of non-exudate organic metabolites, the true-to-source rhizosphere environment was designed to reveal the hierarchical directionality of the trophic exchanges in soil, as rich media often mask various trophic interactions taking place in native communities52. Additionally, the rhizosphere environment also included an array of inorganic compounds used by the 243 GSMMs, which includes trace metals, ferric, phosphoric, and sulfuric compounds. Overall, the rhizosphere environment was composed of 60 inorganic compounds together with the 33 root exudates (Supp. Table 2). The rhizosphere environment supported the growth of only a subset of the GSMMs that were capable of using plant exudates (Supp. Fig. 3).

3. Simulating growth succession and hierarchical trophic exchanges in the rhizosphere community

To reflect the indirect effect of the root on the native community, (i.e., to capture the effect of root-supported bacteria on the growth of further community members), we constructed the Microbial Community Succession Module (MCSM), a CBM-based algorithm aimed at predicting community-level trophic successions. Unlike tools designed for modelling microbial interactions53,54, MCSM bypasses the need for defining a community objective function as the growth of each species is simulated individually. Trophic interactions are then inferred by the extent to which compounds secreted by bacteria could support the growth of other community members. The module iteratively grows the GSMMs in a defined environment, sums up their individual secretion profiles, and updates the initial simulation environment with those secreted compounds (Fig. 3A). Applying this algorithm to a microbiome in its native environment allows delineating the potential metabolic dependencies and interactions between bacterial species in a native community.

Microbial Community Succession Module (MCSM), and characterization of trophic dynamics in the community along iterations.

A. An illustration of the iterative microbial community growth module, representing the growth of community members along iterations starting in the “Rhizosphere environment” and updated with microbial secretion outputs. B. Growth rate characterization of GSMMs in the community along iterations. Each column in the heat-map represents a different GSMM (only models which have grown in the rhizosphere environment are presented); growth rate is indicated by the color bar. Iterations are represented by rows. Blank spaces indicate models not growing at that specific iteration. N is the total number of species that grew after each iteration. C. Distribution of GSMMs growing along iterations at the phylum level. D. Distribution of organic compounds secreted along iterations, classified into biochemical groups. Root exudates bar (far left) represents the classification of organic compounds in the initial “Rhizosphere environment”. Numbers on top of bars in both C, and D (designated by N) denotes the number of new entries in a specific iteration, with respect to the previous iteration.

Application of MCSM over the “rhizosphere environment” (i.e., 1st iteration, Root exudates and “inorganic compounds”, Supp. Table 2) supported the growth of 27 GSMMs (Supp. Fig. 3, Fig. 3B). Then, the initial environment was updated with 145 additional compounds predicted to be secreted by the growing community members. The 2nd iteration supported the growth of 33 additional species whose growth was supported by compounds predicted to be secreted by bacteria that grew in the 1st iteration. Following the 2nd iteration, 25 new secreted compounds were added to the rhizosphere environment. The 3rd iteration supported the growth of 11 additional GSMMs, with one additional compound secreted. After the 3rd iteration, the updated environment did not support the growth of any new species. Overall, iterative growth simulations resulted in the successive growth of 71 species (Fig. 3B; iterations 1-3).

To enlarge the array of growing species, we tested the effect of the addition of organic phosphorous sources. Organic phosphorous is typically a limiting factor in soil55 and its utilization varies greatly between microbial species (i.e., different P sources were shown to have a selective effect on different microbial groups56). The initial rhizosphere environment contained only inorganic phosphorous. During 1st -3rd MCSM simulations, nine organic P compounds were secreted to the simulation medium, which was updated accordingly. At the beginning of the 4th iteration, 31 additional organic P compounds were identified by screening the species-specific poor medium and were added to the medium (Supp. Table 2; organic Phosphorous compounds). The additional organic P compounds supported the growth of nine additional GSMMs (Fig. 3B) and led to the secretion of 13 new compounds, which were added to the environment. The 5th iteration supported the growth of four additional GSMMs and two new compounds were secreted. Final simulations in the cumulative rhizosphere environment were composed of all secreted compounds and led to the same secretion and growth profile as the previous iteration. Therefore, no further growth iterations were conducted.

Overall, the successive iterations connected 84 out of 243 native members of the apple rhizosphere GSMM community via trophic exchanges. The inability of the remaining bacteria to grow, despite being part of the native root microbiome, possibly reflects the selectiveness of the root environment, which fully supports the nutritional demands of only part of the soil species, whereas specific compounds that might be essential to other species are less abundant45. It is important to note that the specific exudate profile used here represent a snapshot of the root metabolome as root secretion-profiles are highly dynamic, reflecting both environmental and plant developmental conditions. A possible complementary explanation to the observed selective growth might be the partiality of our simulation platform, which examined only plant-bacteria and bacteria-bacteria interactions while ignoring other critical components of the rhizosphere system such as fungi, archaea, protists and mesofauna, as well as less abundant bacterial species, components all known to metabolically interact57. Finally, the MAG collection, while relatively substantial, represents only part of the microbial community. Accordingly, the iterative growth simulations represent a subset of the overall hierarchical-trophic exchanges in the root environment, necessarily reflecting the partiality of the dataset.

In terms of the phylogenetic distribution of the models, 27 bacterial species grew on the 1st iteration (in which root exudates served as the sole organic sources). These bacteria represented 14 of the 17 phyla included in the initial model collection (consisting of 243 GSMMs) and maintained a distribution frequency similar to the original community. As in the full GSMM data set (Community bar, Fig. 3C), most of the species which grew in the 1st iteration belonged to the phyla Acidobacteriota, Proteobacteria, and Bacteroidota. This result concurred with findings from the work of Zhalnina et al, which reported that bacteria assigned to these phyla are the primary beneficiaries of root exudates1. Species from three out of the 17 phyla that did not grow in the first iteration - Elusimicrobiota, Chlamydiota, and Fibrobacterota, did grow on the 2nd iteration (Fig. 3C). Members of these phyla are known for their specialized metabolic dependencies. Such is the case for example with members of the Elusimicrobiota phylum, which include mostly uncultured species whose nutritional preferences are likely to be selective58.

At the order level, bacteria classified as Sphingomonadales (class Alphaproteobacteria), a group known to include typical inhabitants of the root environment59, grew in the initial Root environment. In comparison, other root-inhabiting groups including the orders Rhizobiales and Burkholderiales59, did not grow in the first iteration. Rhizobiales and Burkholderiales did, however, grow in the second and third iterations, respectively, indicating that in the simulations, the growth of these groups was dependent on exchange metabolites secreted by other community members (Supp. Fig. 5).

Overall, 158 organic compounds were secreted throughout the MCSM simulation (from which 12 compounds overlapped with the original exudate medium). These compounds varied in their distribution and were mapped into 12 biochemical categories (Fig. 3D). Whereas plant secretions are a source of various organic compounds, microbial secretions provide a source of multiple vitamins and co-factors not secreted by the plant. Microbial-secreted compounds included siderophores (staphyloferrin, salmochelin, pyoverdine, and enterochelin), vitamins (pyridoxine, pantothenate, and thiamin), and coenzymes (coenzyme A, flavin adenine dinucleotide, and flavin mononucleotide) – all known to be exchange compounds in microbial communities60,61. In addition, microbial secretions included 11 amino acids (arginine, lysine, threonine, alanine, serine, phenylalanine, tyrosine, leucine, glutamate, isoleucine, and methionine), also known as a common exchange currency in microbial communities62. Some microbial-secreted compounds, such as phenols and alkaloids, were reported to be produced by plants as secondary metabolites63,64. Additional information regarding mean uptake and secretion degrees of compounds classified to biochemical groups is found in Supp. Fig. 6.

Conceptually, the rhizosphere microbiota can be classified into two trophic groups: primary exudate consumers, comprising microbial species that are direct beneficiaries from the root exudates, and secondary consumers, comprising microbial species whose growth may be provided directly via the uptake of metabolites secreted by other members of the soil microbial community. In the iterative MCSM simulations, compounds secreted by some of the primary consumers largely sustained the growth of secondary consumers, which were not able to grow otherwise. The full information on the secretion profiles and models’ growths is provided in Supp. Data 6.

To validate the ability of MCSM to capture trophic dependencies and succession, we further tested whether it can track the well-documented example of cellulose degradation - a multi-step process conducted by several bacterial strains that go through the conversion of cellulose and its oligosaccharide derivatives into ethanol, acetate and glucose, which are all eventually oxidized to CO265. Here, the simulation followed the trophic interactions in an environment provided with cellulose oligosaccharides (4 and 6 glucose units) on the 1st iteration (Supp. Table 3). The formed trophic successions detected along iterations captured the reported multi-step process (Supp. Fig. 7).

4. Associating trophic exchanges with soil health

The MAG collection analyzed in this study was constructed from shotgun libraries associated with apple rootstocks cultivated in orchard soil with a documented history of apple replant disease (ARD) and healthy/recovered (seed meal-amended) soils, providing a model system for disease-conducive vs disease-suppressive rhizosphere communities41. Briefly, rhizobiome communities obtained from apple rootstocks grown in replant orchard soil leading to symptomatic growth (non-amended samples) were termed ‘sick’, whereas samples in which disease symptoms were ameliorated following an established soil amendment treatment66, were termed ‘healthy‘. Both sick and healthy plants were characterized by distinct differences in the structure and function of their rhizosphere microbial communities in the respective soil samples15,41,42,66. In order to correlate microbial metabolic interactions with soil performance, GSMMs were classified into one of three functional categories based on differential abundance (DA) patterns of their respective MAGs: predominantly associated with “healthy” soil (H), predominantly associated with “sick” soil (S) and none-associated (NA) (Supp. Data 7).

The functionally classified GSMMs (H, S, NA) were consolidated into a community network of metabolic interactions by linking their potential uptake and secretion exchange profiles (as predicted along growth iterations in Fig. 3). The network was built as a directed bipartite graph, in which the 84 feasible GSMM nodes and the 203 metabolite nodes (27 root exudates, 146 microbial-secreted compounds, and 30 additional organic-P compounds) were connected by 9773 directed edges, representing the metabolic exchanges of organic compounds in the native apple rhizosphere community (Fig. 4A). Further information regarding node degrees is found in Supp. Data 8.

Trophic interactions based on community exchange fluxes predicted along iterations in the simulated rhizosphere environment.

A. Network representation of potential metabolite exchanges between rhizosphere community members. Edges in the network are directional; arrowhead from a grey node (metabolite) pointing towards a colorful node (GSMM) indicates uptake; arrowhead from a colorful node (GSMM) pointing towards a grey node (metabolite) indicates secretion. Node colors correspond to differential abundance classification of GSMMs in the different plots; H, S, NA are Healthy, Sick, Not-Associated, respectively. Metabolites found in the center of the network are of a higher connectivity degree (i.e., are involved in more exchanges). Only organic compounds are included in the network. Grey rectangles illustrate a zoom-in to specific 5 and 3 partite sub-networks (C, D). B. Pie chart distribution of GSMMs classified according to differential abundance of reads mapped to the respective MAGs; N is the total number of GSMMs in the network. C, D. Examples of specific sub-network motifs derived from the community network; PM (a three component sub-network motif, upper); PMM (a five component sub-network motif, lower), respectively. The full list of PM and PMM sub-network motifs is found in Supplementary data 9.

The directionality of the network enabled its untangling into sub-network motifs stemming from a root exudate to exchange interactions, and ending with an unconsumed end-metabolite. Two types of sub-networks were detected (Fig. 4C): 3-component (PM; Plant-Microbe) plant exudate – microbe – microbial secreted metabolite; and 5-component (PMM; Plant-Microbe-Microbe) plant exudate – microbe – intermediate microbial secreted metabolite – microbe – microbial secreted metabolite. Overall, the network included 45,972 unique PM paths and 571,605 unique PMM paths. Participation of GSMMs in PM paths ranged from 272 to 896 occurrences (Supp. Fig. 8A). GSSM participation in PMM paths ranged from 398 to 50,628 in the first microbe position (primary exudate consumer) and 1,388 to 19,738 occurrences in the second microbe position (secondary consumer). Frequency of GSMMs in the first position in PMM sub-network motifs was negatively correlated with the frequency of presence in second positions, possibly indicating species-specific preferences for a specific position/trophic level in the defined environment (Pearson=-0.279; p-value=0.009, Supp. Fig. 8B, C, D).

In order to explore the trophic preferences of bacteria associated with the different rhizosphere soil systems, the frequency of healthy (H), sick (S) or non-associated (NA) GSMMs in the PM and PMM sub-networks was compared (Supp. Data 9). GSMMs classified as S initiated a significantly higher number of PMM sub-networks (located in the first position) than GSMMs classified as NA and H (Fig. 5A). H classified PMM paths (first position) initiated a significantly higher number of sub-networks with GSMMs classified as NA compared to S-classified GSMMs, but no more than H-classified GSMMs (second position). Other PMM types did not show a significant effect at the second position. The higher number of trophic interactions formed by the S-classified primary exudate consumers in the PMM sub-network motifs suggests that non-beneficial bacteria may have a broader spectrum in terms of their utilization potential of root-secreted carbon sources compared to plant-beneficial bacteria. This might shed light on the dynamics of ARD, in which S-classified bacteria become increasingly dominant following long-term utilization of apple-root exudates, resulting in diminished capacity of the rhizosphere microbiome to suppress soil-borne pathogens66,67.

Characterization of PM and PMM sub-network motifs’ features associated with differentially abundant (DA) GSMMs.

A. Count distribution of PMM sub-networks determined by GSMMs in the first microbial position classified as either healthy (H), sick (S), or non-associated (NA). Boxes extend from first quantile to the third quantile, middle line represents the median; dots outside whiskers indicate outliers. Distinction of groups was determined using ANOVA followed by the Tukey post-hoc test. Asterisks indicate significance of test (*** <= 0.005 ** <= 0.01 * <= 0.05) B. Bar plot indicating the number of exudates significantly associated with H or S-classified PM sub-networks (Hypergeometric test; FDR <= 0.05; green: healthy-H, red: sick-S). C. Bar plots indicate the number of secreted compounds in PM sub-networks, which are significantly associated with H-classified (upper, colored green), or S-classified (lower, colored red) (Hypergeometric test; FDR <= 0.05). Compound name identifiers were taken from the BIGG database (X-axes). Venn diagram on the right represents the intersection of secreted compounds derived from both S and H classified PM sub-networks.

In order to predict exchanges with potential to support/suppress dysbiosis, the frequency of DA GSMM types (i.e., H or S) associated with metabolites (either consumed or secreted) in the PM paths was assessed (Supp. Data 10). Considering consumed metabolites (root exudates), three and six compounds were found to be significantly more prevalent in H and S-classified PM paths, respectively (Fig. 5B). Notably, the S-classified root exudates included compounds reported to support dysbiosis and ARD progression. For example, the S-classified compounds gallic acid and caffeic acid (3,4-dihidroxy-trans-cinnamate) are phenylpropanoids – phenylalanine intermediate phenolic compounds secreted from plant roots following exposure to replant pathogens68. Though secretion of these compounds is considered a defense response, it is hypothesized that high levels of phenolic compounds can have autotoxic effects, potentially exacerbating ARD. Additionally, it was shown that genes associated with the production of caffeic acid were upregulated in ARD-infected apple roots, relative to those grown in γ-irradiated ARD soil69,70, and that root and soil extracts from replant-diseased trees inhibited apple seedling growth and resulted in increased seedling root production of caffeic acid71.

As to the microbial-secreted compounds, a total of 79 unique compounds were found to be significantly overrepresented in either S (42 compounds) or H (41 compounds) classified PM paths (Fig. 5C). Several secreted compounds classified as healthy exchanges (H) were reported to be potentially associated with beneficial functions. For instance, the compounds L-Sorbose (EX_srb L_e) and Phenylacetaladehyde (EX_pacald_e), both over-represented in H paths (Fig. 5C), have been shown to inhibit the growth of fungal pathogens associated with replant disease72,73. Phenylacetaladehyde has also been reported to have nematicidal qualities74. Combining both exudate uptake data and metabolite secretion data, the full H-classified PM path 4-Hydroxybenzoate; GSMM_091; catechol (Fig. 4C; the consumed exudate, the GSMM, and the secreted compound, respectively) provides an exemplary model for how the proposed framework can be used to guide the design of strategies which support specific, advantageous exchanges within the rhizobiome. The root exudate 4-Hydroxybenzoate is metabolized by GSMM_091 (class Verrucomicrobiae, order Pedosphaerales) to catechol. Catechol is a precursor of a number of catecholamines, a group of compounds which was recently shown to increase apple tolerance to ARD symptoms when added to orchard42,75. This analysis (PM; Fig 4C), leads to formulating the testable prediction that 4-Hydroxybenzoate can serve as a selective enhancer of catecholamine synthesizing bacteria associated with reduced ARD symptoms, and therefore serve as a potential source for indigenously produced beneficial compounds.

Conclusions

In this study, we present a framework combining metagenomics analyses with CBM, which can be used to gain a deeper understanding of the functionality, dynamics and division of labor among rhizosphere bacteria, and link their environment-dependent metabolism to biological significance. This exploratory framework aims to illuminate the black box of interactions occurring in the rhizospheres of crop plants and is based on the work of Berihu et al. 2022, in which a gene-centric analysis of metagenomics data from apple rhizospheres was conducted42.

We recovered high-quality sets of environment-specific MAGs, constructed the corresponding GSMMs, and simulated community-level metabolic interactions. By including authentic apple root exudates in the models, we were able to begin untangling the highly complex plant-bacterial and bacterial-bacterial interactions occurring in the rhizosphere environment. More specifically, we used the framework to investigate a microbial community via examining its hierarchical secretion-uptake exchanges along multiple iterations (Fig. 3). These analyses, which linked community-derived secretion profiles with the growth of other community members, demonstrated the successive, trophic-dependent nature of microbial communities. These interactions were elucidated via construction of a community-exchange network (Fig. 4). Possible connections between root exudates, differentially abundant bacteria, their secreted end-products, and soil health were explored using the data derived from this network. From these analyses, we were able to associate different metabolic functionalities with diseased or healthy systems, and formulate new hypotheses regarding the general function of differentially abundant bacteria in the community.

The framework we present is currently conceptual. Dealing with a highly complex system such as the rhizobiome inevitably comes with limitations. These limitations include the usage of automatic GSMM reconstruction, inherent caveats of CBM and the use of single-species GSMMs, the lack of transcriptomic and spatial-chemo-physical data, and the exclusion of competition over all its forms. Furthermore, a portion of the metabolomics data used in this framework was taken from a different source (different rootstock genotype), possibly introducing further bias to the analyses. This potential factor is due to the inherent discrepancy between the conditions from which genomics and metabolomics data were collected1,4. Also not considered in this framework is the role of eukaryotes in the microbial-metabolic interplay. Moreover, the use of an automatic GSMM reconstruction tool (CarveMe31), though increasingly used for depicting phenotypic landscapes, is generally less accurate than manual curation of metabolic models32. This approach typically neglects specialized functions involving secondary metabolism16 and introduces additional biases such as the overestimation of auxotrophies76,77. Nevertheless, manual curation is practically non-realistic for hundreds of MAGs, an expected outcome considering the volume of sequencing projects nowadays. As the primary motivation of this framework is the development of a tool capable of transforming high-throughput, low-cost genomic information into testable predictions, the use of automatic metabolic network reconstruction tools was favored, despite their inherent limitations, in pursuit of addressing the necessity of pipelines systematically analyzing metagenomics data.

For these reasons, among others, the framework presented here is not intended to be used as a stand-alone tool for determining microbial function. The framework presented is designed to be used as a platform to generate educated hypotheses regarding bacterial function in a specific environment in conjunction with actual carbon substrates available in the particular ecosystem under study. The hypotheses generated provide a starting point for experimental testing required to gain actual, targeted and feasible applicable insights38,42. While recognizing its limitations, this framework is in fact highly versatile and can be used for the characterization of a variety of microbial communities and environments. Given a set of MAGs derived from a specific environment and environmental metabolomics data, this computational framework provides a generic simulation platform for a wide and diverse range of future applications.

In the current study, the root environment was represented by a single pool of resources (metabolites). As genuine root environments are highly dynamic and responsive to stimuli, a single environment can represent at best a temporary snapshot of the conditions. Conductance of simulations with several sets of resource pools (e.g., representing temporal variations in exudation profile) can add insights on their effect on trophic interactions and community dynamics. In parallel, confirming predictions made in various environments will support an iterative process that will strengthen the predictive power of the framework and improve its accuracy as a tool for generating testable hypotheses. Similarly, complementing the genomics-based approaches done here with additional layers of ’omics information (mainly transcriptomics & metabolomics) can further constrain the solution space, deflate the number of potential metabolic routes and yield more accurate predictions of GSMMs’ performances33.

To summarize, we have constructed a framework enabling the analysis of interactions between microbes and between microbes and their hosts in their natural environment. Where recent studies begin to apply GSMM reconstruction and analysis starting from MAGs33,78, this work applies the MAGs to GSMMs approach to conduct large-scale CBM analysis over high-quality MAGs derived from a native rhizosphere and explore the complex network of interactions in light of the functioning of the respective agro-ecosystem.

The application of this framework to the apple rhizobiome yielded a wealth of preliminary knowledge about the metabolic interactions occurring within it, including novel information on putative functions performed by bacteria in healthy vs. replant-diseased soil systems, and potential metabolic routes to control these functions. Overall, this framework aims to advance efforts seeking to unravel the intricate world of microbial interactions in complex environments including the plant rhizosphere. The framework is provided as a three stage-detailed pipeline in https://github.com/FreilichLab/Trophic_interactions_predicting_framework.

Methods

Recovery of Metagenomics Assembled Genomes MAGs from metagenomics data constructed for apple rhizosphere microbiomes

High coverage shotgun metagenomics sequence data were obtained from microbial DNA extracted from the rhizosphere of apple rootstocks cultivated in soil from a replant diseased orchard42. The experimental design included sampling of six different soil/apple rootstock treatments with five biological replicates each, as described in Somera et al. 202141. Two different apple rootstocks (M26, ARD susceptible; G210, ARD tolerant) were grown in three different treatments: 1) orchard soil amended with Brassica napus seed meal, 2) orchard soil amended with Brassica juncea/Sinapis alba seed meal (BjSa) and 3) no-treatment control soil (NTC) (see Supp. Table 1). Microbial DNA was extracted from rhizosphere soil and metagenomics data were assembled as described in42. In each assembly contigs were binned to recover MAGs using MetaWRAP pipeline (v1.3.1), which utilizes several independent binners40. The MAGs recovered by the different binners were collectively processed with the Bin_refinement module of metaWRAP, producing an output of a refined bin collection. A count table was constructed by mapping raw reads data of each assembly (e.g., BjSa; G210) to the bins, using BWA-MEM mapping software (version 0.7.17) with default parameters. Differential abundance (DA) of the reads associated with the respective bins in each assembly across the respective replicates was determined using the edgeR function implemented in R, requiring FDR adapted p value< 0.05. Read mapping information is shown in Supp. Table 1. Based on DA, MAGs were classified either as associated with healthy soil (H; BjSa differentially abundant), sick soil (S; NTC differentially abundant), or not associated with either soil type (NA; not differentially abundant at any treatment site).

Gene calling and annotation was performed with the Annotate_bins module of MetaWRAP. Pathway completeness was determined with KEGG Decoder v 1.0.8.279 based on the KO annotations extracted from Annotate_bins assignments. The quality of the genomes was determined with CheckM80. For phylogenomic analyses and taxonomic classification of each bacterial and archaeal genome, we searched for and aligned 120 bacterial marker genes of the MAGs using the identity and align commands of GTDB-Tk v1.5.044. MAGs were dereplicated using dRep v2.3.281 using the default settings and MAGs from the six assemblies were clustered into a single non-redundant set. Phylogenomic trees were rooted by randomly selecting a genome from the sister lineage to the genus as determined from the topology of the bacterial and archaeal GTDB R06-RS202 reference trees. Closely related GTDB taxa identified with the “classify_wf” workflow were filtered using the taxa-filter option during the alignment step. Overall, a set of 395 high-quality genomes (≥90% completeness, <5% contamination) was used for downstream analyses.

Genome Scale Metabolic Model (GSMM) reconstruction, analysis and characterization of the MAG collection

GSMMs were constructed for each of the 395 MAGs using CarveMe v 1.5.1 31, a python-based tool for GSMM reconstruction. Installation and usage of CarveMe was done as suggested in the original CarveMe webpage (https://carveme.readthedocs.io/en/latest/). The solver used for GSMM reconstruction is Cplex (v. 12.8.0.0). All GSMMs were drafted without gap filling as it might mask metabolic co-dependencies52. Stoichiometric consistency of all GSMMs was systematically assessed via the standardized MEMOTE test suite, a tool for GSMM quality and completion assessment47. GSMMs not stoichiometrically balanced were filtered out, as they might produce infeasible simulation results.

Analyses and simulations of GSMMs, as well as retrieval of model attributes (reactions, metabolites and exchanges, etc.), were conducted via the vast array of methods found in COBRApy82, a python coding language package for analyzing constraint-based reconstructions. For each GSMM, initial growth simulations took place in three different model-specific environments: rich medium, poor medium and poor medium + exudates. The rich medium was composed of all the exchange reactions (i.e., exchange reaction for specific compounds) a model holds, gathered by the “exchanges” attribute found in each model. The poor medium was composed of the minimal set of compounds required for a specific GSMM/species to grow at a fixed rate. This set of compounds was identified using the minimal_medium module from COBRApy (minimize components=True, growth rate of 0.1 biomass increase hour-1). Poor medium + exudates was defined as the poor medium with the addition of an array of apple root exudates. These compounds were retrieved from two metabolomics studies characterizing the exudates of apple rootstocks grown in Lane Mountain Sand (Valley, WA)48,49. Exudate compounds were aligned with the BIGG database83 in order to format them for use in COBRApy. For each media type, GSMM growth rates were calculated by solving each model using the summary method in COBRApy, which utilizes Flux Balance Analysis (FBA) for maximizing biomass.

Construction of a common root environment medium and application of the Microbial Community Succession Module (MCSM)

In order to simulate the dynamics of the rhizosphere community in the root environment, a fourth growth medium representing a natural-like environment was defined and termed the “rhizosphere environment”. The rhizosphere environment was composed of two arrays of compounds: 1) the exudates (as described above) and 2) inorganic compounds essential for sustaining bacterial growth. This array was determined according to the minimal set of compounds identified for each GSMM (also described above). Rhizosphere environment components are provided in Supp. Table 2. Both sets of compounds were then consolidated into one array in which further simulations were conducted.

The MCSM (which is the first module out of three comprising this computational framework) simulates the growth of a microbial community by iteratively growing the GSMMs in the community and adding compounds “secreted” by the growing species to the simulation-environment (i.e. the medium), thus enriching the medium/environment and supporting further growth. Unlike Flux Balance Analysis (FBA), which is used for gathering an arbitrary solution regarding non-optimized fluxes, the MCSM uses Flux Variability Analysis (FVA) to determine exchange fluxes84. FVA gathers the full range of exchange fluxes (including both secretion and uptake) that satisfy the objective function of a GSMM (i.e., biomass increase). The FVA fraction of optimum was set to 0.9 (sustaining the objective function at 90% optimality, allowing a less restricted secretion profile). Secretion compounds added to the updated medium in each iteration were set to be given in optimal fluxes in next iteration, to ensure a metabolic effect based on the presence of specific metabolites in the environment, rather than their quantity. Flux boundaries of updated medium components were set to 1000 mmol/ gDW hour (for a specific exchange compound; gDW gram Dry Weight).

MCSM was initially simulated in the rhizosphere environment medium. After each growth iteration, GSMM growth-values and the compounds secreted by the species growing were collected, where the latter were added to the medium for the next iteration as described above. Growth iterations continued until no new compounds were secreted and no additional GSMMs had grown. After the 3rd iteration, a set of organic phosphorous compounds (containing both carbon and phosphorous) was added to the environment. These compounds were gathered from the pool of model-specific minimal compounds selected for use in the poor medium. Information regarding the chemical formula of these compounds was gathered with the formula attribute of each compound object in COBRApy. Along MCSM iterations, secreted metabolites were classified into biochemical categories based on BRITE annotations85 or, in the absence of classification, manually. MCSM was further applied to inspect the framework’s ability to tracing cellulose degradation using cellulose medium (Supp. Table 3). The tutorial for the MCSM stage of the framework workflow is found at the GitHub page, along with the GSMMs and media files. Instructions for conducting the analysis are in the README.md file https://github.com/FreilichLab/Trophic_interactions_predicting_framework.

Construction of the exchange network and its untangling for screening sub-network motifs

For each GSMM, a directed bipartite network representing all potential metabolic exchanges occurring within the rhizosphere community was constructed based on uptake and secretion data derived from MCSM iterations. Edges in the network were connected between GSMM nodes and metabolite nodes, with edge directionality indicating either secretion or uptake of a metabolite by a specific GSMM. The network was constructed with the networkx package, a python language package for the exploration and analysis of networks and network algorithms. Network-specific topography was obtained using the Kamada-kawai layout86. Information regarding the degree and connectivity of the different node types was acquired from the graph object (G). Code for the network construction module in the framework is found at the project’s GitHub page under the name NETWORK.py.

Untangling the exchange network into individual paths (i.e. sub-network motifs) was done using the all_shortest_paths function in networkx86, applied over the exchanges network. Briefly, the algorithm screens for all possible shortest paths within the network, specifically screening for paths starting with an exudate node and ending secreted metabolite nodes (secreted by bacterial species but not consumed). This algorithm yielded two types of paths: 1) plant-microbe paths (PM) in which node positions one, two, and three represented exudate, microbe and secreted metabolite, respectively, and 2) plant-microbe-microbe paths (PMM). PMM paths (length of five nodes) were constructed based on PM paths (length of three nodes), in which positions four and five displayed unique (i.e. not found in PM paths) microbe and metabolite nodes, respectively. Code for the network untangling module in the framework is found at the project’s GitHub page under the name PATHS.py.

Associating PM and PMM sub-network motifs features with soil health

Sub-network motifs (PM and PMM) were functionally classified as associated with healthy soil (H), sick soil (S), or not-associated with either soil type (NA) based on differential abundance of the corresponding MAGs (PM) or MAGs combination (PMM) in the sub-network. Next, the GSMMs in both PM and PMM sub-networks were characterized according functional classification. For PMMs, the distribution of counts of classified sub-networks, at the different positions, was compared using the ANOVA test, followed by a Tukey test to significantly distinguish the groups. GSMM classifications were further projected on uptake and secreted metabolites in the pathway motifs. For simplification, the analysis focused only on PM paths because PMM paths incorporate PM paths, and exchanges within a PMM path do not directly reflect the effect of an exudate on the secreted end-product (but over the intermediate compound). On that account, start/end metabolites in PM paths were associated with H/S/NA paths based on one-sided hypergeometric test, comparing the frequency of each compound in a functionally characterized path type (either H or S) vs. its frequency in NA classified paths and the reciprocal data set.

Data availability

A reproducible workflow for central computational analyses was deposited in https://github.com/FreilichLab/Trophic_interactions_predicting_framework, including all GSMMs. Raw metagenome sequences used in this study have been deposited in NCBI under BioProject accession number PRJNA779554 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA779554). MAGs FASTA sequences generated and used in this work are located in https://github.com/FreilichLab/Ginatt_MAGs.

Acknowledgements

We would like to express our gratitude to Asaf Sadeh and Roni Gafni for their advice as part of the Neve Yaar Stats and R support forum, and to Ariana Basile for her advice regarding the conductance of quality control tests over the genome scale metabolic models. We would also like to thank Maya Bar Yehuda for graphical design.

This work was funded by the United States-Israel Binational Agricultural Research and Development Fund (BARD) [grant number US-5390-21].

Supplementary information

Metabolic pathway completion analysis of community de-replicated MAGs.

The presented cluster map links the MAGs with different metabolic pathways based on completion values of a specific metabolic pathway. Values are concluded based on the ratio between the count of EC numbers present in a MAG, which correspond to a specific pathway (derived from annotated genes), and the total EC numbers in that specific KEGG pathway. Left axis represent the KEGG KO of a set of specific functions, which are indicated on the right axis. Top and bottom color bars on the upper axis represent the taxonomic annotation of the MAGs (phylum and class, respectively), which are indicated on the bottom axis.

Distribution of GSMM attributes at the phylum level.

Subplots of reactions, metabolites and exchanges presenting the distribution of the respective attributes for each group of bacteria, affiliated with a specific phylum.

Bacterial growth rates in different environments.

Each row represents the growth rate (biomass increase hour-1) of species in a specialized medium; from top to bottom: poor medium, poor medium supplemented with root exudates, root environment medium and rich medium. Each column represents a GSMM. Models are sorted by their growth score in the root environment medium. Poor, Rich, and Poor + exudates media are model specific, thus sustain growth of all models. The horizontal color bar on top of the plot represents the phyla of the corresponding GSMM. Actual growth values are provided in Supp. Data 5.

FVA performances of GSMMs in different environments (Supp.

Fig. 3, Supp. Data 5). A. Distribution of potentially active exchange reactions (non-zero minimum FVA flux) in the different environments. Solid line inside each violin indicates the interquartile range (IQR). White point in IQR indicates the median value. Whiskers extending from the IQR indicate the range within 1.5 times the IQR from the quartiles. Violin width at a given value represents the density of data points at that value. B. Loopless FVA scores compared to regular FVA for models in the 3 different environments. Bars indicate the count of active fluxes (non-zero minimum FVA flux). Only a subset of models was used for this analysis.

Phylogenetic distribution of models growing along iterations at the order level.

Bars represent the taxonomic distribution of the community (i.e., overall models that grew along iteration, with respect to the corresponding iteration). Colors are indicative of different orders. Orders with insufficient nomenclature were classified as “Others”.

Uptake and secretion degrees of biochemically-classified metabolite nodes, as inferred from the communal interaction network.

Values indicate mean amount of links between community GSMMs and nodes classified to the corresponding category. Biochemical classification was conducted both based on BRITE database and manually.

Application of MCSM over the process of cellulose decomposition as described by Kato et al1. 5-partite network exhibiting the uptake of cellulose oligomers (4 and 6 units of connected D-glucose) by primary decomposers, through secretion of intermediate compounds and their metabolization by secondary decomposers to CO2. Distribution of phyla of primary and secondary decomposers is denoted by pie charts. Though MAGs were notconstructed for the original species as in Kato et al., among the primary consumers, species corresponding to the Acidobacteria (Acidobacteriales)2, Actinobacteria3, Bacteriodetes4, Proteobacteria (Xanthomonadales)5 and Verrucobacteria6 groups are found to be capable of degrading cellulose compounds via enzymatic mechanisms.

Community GSMM-centric characterization of plant-microbe PM) and plant-microbe-microbe (PMM) motifs. A, B, C. Frequency of community GSMMs in) PM, PMM (first/second position) in subnetwork motifs, respectively. D. correlation of GSMMs position frequency in PMM subnetworks; Pearson, P value=0.009.

Summary table of the mapping of reads to MAGs across treatments. Treatment types are fully descriebd in Berihu et al7.

Table of compounds used throughout iterations in the MCSM

Initial medium compounds used in MCSM Cellulose degradation process