Recurrent neural networks enable design of multifunctional synthetic human gut microbiome dynamics

  1. Mayank Baranwal
  2. Ryan L Clark
  3. Jaron Thompson
  4. Zeyu Sun
  5. Alfred O Hero  Is a corresponding author
  6. Ophelia S Venturelli  Is a corresponding author
  1. Department of Systems and Control Engineering, Indian Institute of Technology, India
  2. Division of Data & Decision Sciences, Tata Consultancy Services Research, India
  3. Department of Biochemistry, University of Wisconsin-Madison, United States
  4. Department of Chemical & Biological Engineering, University of Wisconsin-Madison, United States
  5. Department of Electrical Engineering & Computer Science, University of Michigan, United States
  6. Department of Biomedical Engineering, University of Michigan, United States
  7. Department of Statistics, University of Michigan, United States
  8. Department of Bacteriology, University of Wisconsin-Madison, United States

Abstract

Predicting the dynamics and functions of microbiomes constructed from the bottom up is a key challenge in exploiting them to our benefit. Current models based on ecological theory fail to capture complex community behaviors arising from higher order interactions, and do not scale well with increasing community complexity or with the consideration of multiple functions. We develop and apply a long short-term memory (LSTM) framework to advance our understanding of community assembly and health-relevant metabolite production using a synthetic human gut community. A mainstay of recurrent neural networks, the LSTM learns a high-dimensional, data-driven, non-linear dynamical system model. We show that the LSTM model can outperform the widely used generalized Lotka-Volterra model based on ecological theory. We build methods to decipher microbe-microbe and microbe-metabolite interactions from an otherwise black-box model. These methods highlight that Actinobacteria, Firmicutes, and Proteobacteria are significant drivers of metabolite production, whereas Bacteroides shape community dynamics. We use the LSTM model to navigate a large multidimensional functional landscape to design communities with unique health-relevant metabolite profiles and temporal behaviors. In sum, the accuracy of the LSTM model can be exploited for experimental planning and to guide the design of synthetic microbiomes with target dynamic functions.

Editor's evaluation

The ultimate goal of this work is to apply machine learning to learn from experimental data on temporal dynamics and functions of microbial communities to predict their future behavior and design new communities with desired functions. Using a significant amount of experimental data, the authors suggest a method that outperforms the state-of-the-art approach. The work is of broad interest to those working on microbiome prediction and design.

https://doi.org/10.7554/eLife.73870.sa0

Introduction

Microbial communities perform chemical and physical transformations to shape the properties of nearly every environment on Earth from driving biogeochemical cycles to mediating human health and disease. These functions performed by microbial communities are shaped by a multitude of abiotic and biotic interactions and vary as a function of space and time. The complex dynamics of microbial communities are influenced by pairwise and higher order interactions, wherein interactions between pairs of species can be modified by other community members (Sanchez-Gorostiaga et al., 2019; Mickalide and Kuehn, 2019; Hsu et al., 2019). In addition, the interactions between community members can change as a function of time as the community continuously reacts to and modifies its environment (Hart et al., 2019). Therefore, flexible modeling frameworks that can capture the complex and temporally changing interactions that determine the dynamic behaviors of microbiomes are needed. These predictive modeling frameworks could be used to guide the design of interventions to precisely manipulate community-level functions to our benefit.

The generalized Lotka-Volterra (gLV) model has been widely used to predict community dynamics and deduce pairwise microbial interactions shaping community assembly (MacArthur, 1970). For example, the gLV model has been used to predict the assembly of tens of species based on absolute abundance measurements of lower species richness (i.e. number of species) communities (Venturelli et al., 2018; Mounier et al., 2008; Clark et al., 2021). The parameters of the gLV model can be efficiently inferred based on properly collected absolute abundance measurements and can provide insight into significant microbial interactions shaping community assembly (Bucci et al., 2016). However, this model does not represent higher order interactions or microbial community functions beyond species growth. To capture such microbial community functions, composite gLV models have been developed to predict a community-level functional activity based on species abundance at an endpoint (Clark et al., 2021; Stein et al., 2018). However, these approaches have been limited to the prediction of a single community-level function at a single time point. Therefore, new modeling frameworks are needed to capture temporal changes in multiple community-level functions, such as tailoring the metabolite profile of the human gut microbiome (Fischbach and Sonnenburg, 2011).
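The structure of the gLV model can be sketched in a few lines. The following is a minimal illustration with an invented three-species parameter set (growth rates and interaction coefficients are not values from any study), integrated by forward Euler:

```python
import numpy as np

def glv_step(x, mu, A, dt):
    """One forward-Euler step of the generalized Lotka-Volterra model:
    dx_i/dt = x_i * (mu_i + sum_j A_ij * x_j)."""
    return np.maximum(x + dt * x * (mu + A @ x), 0.0)

def simulate_glv(x0, mu, A, t_end=48.0, dt=0.01):
    """Integrate species abundances from 0 to t_end hours."""
    x = np.array(x0, dtype=float)
    for _ in range(int(round(t_end / dt))):
        x = glv_step(x, mu, A, dt)
    return x

# Toy 3-species community (illustrative parameters only)
mu = np.array([0.4, 0.3, 0.5])           # intrinsic growth rates (1/hr)
A = np.array([[-1.0,  0.2, -0.1],        # A_ii < 0: self-limitation
              [-0.3, -1.0,  0.1],        # off-diagonal: pairwise interactions
              [ 0.1, -0.2, -1.0]])
x_final = simulate_glv([0.01, 0.01, 0.01], mu, A)
```

With stabilizing self-limitation terms on the diagonal, the trajectory settles toward the fixed point where `mu + A @ x = 0`, which is why the model's behavior is fully determined by growth rates and pairwise coefficients.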

Neural network architectures, such as recurrent neural networks (RNNs), are universal function approximators (Dambre et al., 2012; Schäfer and Zimmermann, 2006) that enable greater flexibility compared to gLV models for modeling dynamical systems. However, neural network based models often require significantly more model parameters, which poses additional challenges to model fitting and generalizability. A particular RNN model architecture called long short-term memory (LSTM) addresses challenges associated with training on sequential data by incorporating gating mechanisms that learn to regulate the influence of information from previous instances in the sequence (Lipton et al., 2015). From their initial successes in speech recognition (Graves et al., 2005) and computer vision (Byeon et al., 2015), LSTMs have recently been applied to modeling biological data such as subcellular localization of proteins (Sonderby et al., 2015) and prediction of biological age from activity collected from wearable devices (Rahman and Adjeroh, 2019). Related to microbiomes, deep learning frameworks have been applied to predict gut microbiome metabolites based on community composition data (Le et al., 2020), final community composition based on microbial interactions (Larsen et al., 2012) and end-point community composition based on the presence/absence of species (Michel-Mata et al., 2021). In addition, RNN architectures have been used to model phytoplankton (Jeong et al., 2001) and macroinvertebrate (Chon et al., 2001) community dynamics. Despite achieving reasonable prediction performance, previous efforts at modeling ecological system dynamics using RNNs are typically limited to a handful of organisms (<10), have provided limited model interpretation and have not been leveraged to predict temporal changes in community behaviors. In addition, RNN architectures have not been used for bottom-up community design, which could be exploited for applications in bioremediation, bioprocessing, agriculture, and human health (Leggieri et al., 2021; Lawson et al., 2019; Clark et al., 2021).

Here, we apply LSTMs to model time dependent changes in species abundance and the production of key health-relevant metabolites by a diverse 25-member synthetic human gut community. LSTMs are a good model for microbiomes because (1) LSTMs are a natural choice for a neural network based model of time-series data (Goodfellow et al., 2016); (2) LSTMs are highly flexible models that can capture complex interaction networks that are often neglected in ecological models; (3) LSTMs can be modified to capture additional system variables such as environmental factors (e.g. metabolites). In addition, LSTMs have some advantages over traditional RNNs because they can capture long-term dependencies. LSTMs have additional parameters that adjust the effects of earlier time points on the predictions at later time points in a time-series. We use the trained LSTM model to elucidate significant microbe-microbe and microbe-metabolite interactions.

The flexibility and accuracy of the LSTM model enabled systematic integration into our experimental planning process in two stages. First, the LSTM was fit to data from a previous study with low temporal resolution involving a moderate number of synthetic microbial communities (Clark et al., 2021). The distribution of LSTM metabolite predictions was then used to identify sparse sub-communities in the tails of the distribution, communities that we refer to as ‘corner cases’. A second experiment was then performed that expanded the training data for the LSTM in the vicinity of these corner cases with higher time resolution. The LSTM-guided two-stage experimental planning procedure substantially reduced the number of experiments compared to random sampling of the functional landscape with temporal resolution in a single stage experiment. Therefore, the LSTM analysis enabled our main findings on dynamical behaviors of communities and identified the key species critical for community assembly and metabolite profiles. Compared to the gLV model, the proposed LSTM framework provides a better fit to the experimental data, captures higher order interactions and provides higher accuracy predictions of species abundance and metabolite concentrations. In addition, our approach preserves model interpretability through a suitably developed gradient-based framework and locally interpretable model-agnostic explanations (LIME) (Ribeiro et al., 2016a). Using our time-series data of species abundance and metabolite concentrations, we demonstrate that the temporal behaviors of the communities cluster into distinct groups based on the presence and absence of sets of species. Our results highlight that LSTM models are powerful tools for predicting and designing the dynamic behaviors of microbial communities.

Results

LSTM outperforms the generalized Lotka Volterra ecological model

Our first objective was to compare the predictive performance of the LSTM model to a commonly used ecological modeling approach. The gLV model is a widely used ecological model consisting of a coupled set of ordinary differential equations that captures the growth dynamics of members of a community based on their intrinsic growth rate and interactions between all pairs of community members (Venturelli et al., 2018). Therefore, the gLV model is not suited to capture higher order interactions among species or changes in inter-species interactions resulting from variation in the environment. By contrast, the LSTM modeling framework is flexible and can capture complex relationships between species as well as time-dependent changes in inter-species interactions. To evaluate the strengths and limitations of these modeling frameworks, we characterized the performance of the gLV and LSTM models in learning the behavior of a ground truth model that included pairwise and third-order interactions between species (Methods).

Our ground truth model is based on a gLV model of a 25-member synthetic gut community from a previous study (Clark et al., 2021). To perturb our ground truth model with higher order interactions, we add third-order interaction terms with either mild or moderate parameter magnitudes (Methods). Using this model, we simulate sub-communities that vary in the number of species. Of all the randomly simulated communities, those containing six or fewer species are used to train both the gLV and LSTM models (624 training communities), while the remaining communities (3299 test communities with ≥10 species) are used as a hold-out test set. The 624 training communities include 25 monospecies, 300 unique pairwise communities, 100 unique three-member communities, 100 unique five-member communities, and 99 unique six-member communities. The simulated data span 48 hr sampled at 8 hr intervals, reflecting an experimentally feasible periodic sampling interval.
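One way to construct such third-order perturbations is to draw an interaction tensor whose entries are capped at a fraction (25% for 'mild', 50% for 'moderate') of the largest pairwise coefficient magnitude. The sketch below uses invented parameters on a small community; the study itself uses 25 species:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # illustrative community size (the study uses 25 species)

A = rng.normal(0.0, 0.3, size=(n, n))    # pairwise interaction matrix
np.fill_diagonal(A, -1.0)                # self-limitation on the diagonal
mu = rng.uniform(0.2, 0.6, size=n)       # intrinsic growth rates

def third_order_tensor(A, fraction, rng):
    """Tensor B_ijk: effect of the species pair (j, k) on species i, with
    magnitudes capped at `fraction` of the largest pairwise coefficient."""
    cap = fraction * np.abs(A).max()
    return rng.uniform(-cap, cap, size=(A.shape[0],) * 3)

B_mild = third_order_tensor(A, 0.25, rng)      # 0.50 for "moderate"

def glv_rhs_third_order(x, mu, A, B):
    """dx_i/dt = x_i * (mu_i + sum_j A_ij x_j + sum_jk B_ijk x_j x_k)."""
    return x * (mu + A @ x + np.einsum('ijk,j,k->i', B, x, x))
```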

Recall that we restrict our attention to simpler (fewer species) communities for training to determine if the behavior of lower order communities can be used to predict higher order communities. Further, pairwise inter-species interactions are easier to decipher in lower order communities due to potential co-variation among parameters (correlations between parameters) as a consequence of model structure or methods of data collection. A similar training/test partitioning was used to generate predictive models of complex community behaviors (Venturelli et al., 2018; Clark et al., 2021; Hromada et al., 2021).

The prediction performance of the trained gLV and LSTM models on the hold-out test set are similar for the ground truth model containing only pairwise interactions (Pearson R2 of 0.89 and 0.85 for gLV and LSTM models, respectively) (Figure 1b, c left). For the ground truth model with mild third-order interactions (interaction coefficients that do not exceed 25% of the maximum of the absolute values of the coefficients for the second-order interactions), the performance of the LSTM model is substantially better than the gLV model, with an R2 of 0.85 compared to 0.52 for the gLV model (Figure 1b, c, middle). In addition, the LSTM model performs significantly better than the gLV model for higher magnitude (moderate) third-order perturbations (third-order interaction coefficients that do not exceed 50% of the maximum of the absolute values of the coefficients for second-order interactions) (Figure 1b, c, right).

Comparison of generalized Lotka Volterra (gLV) and Long Short Term Memory (LSTM) model prediction performance of species abundance in a 25-member microbial community in response to third-order perturbations of varying magnitude.

For both models, training data consists of low species richness communities (≤6 species, N=82,475, Pearson correlation p-value <0.0001). (a) & (d): Data was generated using a gLV model that captures monospecies growth and pairwise interactions. Scatter plots of true versus predicted species abundance at t=48 hr using the gLV and LSTM models, respectively. X represents a vector of species abundances. (b) & (e) Scatter plot of true versus predicted species abundance of the gLV and LSTM models, respectively, when the simulated data is subjected to low magnitude (mild) third-order interactions. (c) & (f) Scatter plot of true versus predicted species abundance of gLV and LSTM models, respectively, when the simulated data is further subjected to moderately large third-order interactions. (g) Scatter plot of true versus predicted species abundance for the LSTM model. The training set included a set of higher richness communities (50 each of 11 and 19 member communities). All predictions are forecasted from the species abundance at time 0.

This in silico analysis highlights the advantages of adopting more expressive neural network models over severely constrained ecological models such as gLV. In addition, a key advantage of the proposed LSTM model over the gLV model is the reduced training time. The gLV equations are coupled nonlinear ordinary differential equations, and thus training gLV models requires substantial computational time (nearly 5–6 hr), whereas the LSTM models can be trained in minutes on the same platform. Therefore, the LSTM approach is highly suited for real-time training and planning of experiments. Note that both the composite and the LSTM models require tuning of hyperparameters for optimal performance. The details of the computational implementation are provided in the Methods section.

To further leverage this in silico experimental approach, we aimed to identify which types of datasets are required for building predictive models of high richness community behaviors depending on the nature of their underlying interactions. In further analyzing our results, we observed a crescent-shaped prediction profile, representing an inherent bias, which we hypothesized was due to the training data containing only communities with ≤6 species (Figure 1c). To test this hypothesis, we augmented the training set with 100 communities enriched with a larger number of species (randomly sampled 11 and 19-member communities). Using this enriched training set, the LSTM model accurately predicts the community dynamics of the hold-out set with an R2 of 0.95 (Figure 1d). In sum, the LSTM has difficulty predicting the behavior of high richness communities when the training data consists of only low richness communities. However, adding a moderate number of high richness communities to the training set eliminates the prediction bias and improves the prediction performance of the LSTM.

LSTM accurately predicts experimentally measured microbial community assembly

After validating our methods using the ground truth modeling approach described above, we evaluated the ability of the LSTM to capture the dynamics of experimentally characterized synthetic human gut microbial communities. We tested the effectiveness of the LSTM on time-resolved species abundance data from a previous study of a well-characterized twelve-member synthetic human gut community (Venturelli et al., 2018). The experimental data consists of species abundance sampled approximately every 12 hr. A total of 175 microbial communities with sizes varying from 2 to 12 species were used to train and evaluate the LSTM model. Of the 175 microbial communities, 102 microbial communities were selected randomly to constitute the training set, while the remaining 73 microbial communities constituted the hold-out test set (Supplementary file 1). This train/test split was similar to that used to train a gLV model in the previous study (Venturelli et al., 2018). The previous study represented perturbations in cell densities and nutrient availability by diluting the community 20-fold every 24 hr into fresh media (i.e. passaging of the communities) (Figure 2a). The sequential dilutions of the communities are external perturbations that introduce further complexity into model training.
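The passaging scheme can be sketched as continuous growth punctuated by instantaneous 20-fold dilutions every 24 hr. The snippet below uses simple logistic growth as a stand-in for the community dynamics; all parameter values are illustrative:

```python
import numpy as np

def grow(x, r, k, dt):
    """Illustrative logistic growth step (stand-in for community dynamics)."""
    return x + dt * r * x * (1.0 - x / k)

def passage(x0, r=0.5, k=1.0, t_end=72.0, dt=0.1,
            dilution=20.0, period=24.0):
    """Simulate growth with a 1/20 dilution into fresh media every 24 hr."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    steps = int(round(t_end / dt))
    steps_per_passage = int(round(period / dt))
    for step in range(1, steps + 1):
        x = grow(x, r, k, dt)
        if step % steps_per_passage == 0 and step < steps:
            x = x / dilution          # 20-fold dilution at 24 and 48 hr
        traj.append(x.copy())
    return np.array(traj)

traj = passage([0.05, 0.02])  # two species at illustrative starting densities
```

Each dilution resets the community to low density, so abundances repeatedly crash and regrow; a dynamical model must either represent these events explicitly (as a mechanistic model would) or, as with the LSTM here, absorb them implicitly because the procedure is identical across all communities.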

Figure 2 with 1 supplement see all
The LSTM model can predict the temporal changes in species abundance in a 12-member synthetic human gut community in response to periodic dilution (passaging).

(a) Proposed LSTM modeling methodology for the dynamic prediction of species abundance in a microbial community. The initial abundance information is an input to the first LSTM cell, the output of which is trained to predict abundance at the next time point. Consequently, the predicted abundance becomes an input to another LSTM cell with shared weights to predict the abundance at the subsequent time point. The process is repeated until measurements at all time points are available. X represents a vector of species abundances. Thus, all predictions are forecasted from the abundance at time 0. (b) Scatter plot of measured (true) and predicted species abundance of a 12-member synthetic human gut community at 12 hr (N=876, p-value = 2.44e-257). (c) Scatter plot of measured (true) and predicted abundance at 24 hr (p-value = 6.51e-257). (d) Scatter plot of measured (true) and predicted abundance at 36 hr (p-value = 7.42e-257). (e) Scatter plot of measured (true) and predicted abundance at 48 hr (p-value = 1.66e-227). (f) Scatter plot of measured (true) and predicted abundance at 60 hr (p-value = 3.39e-227).

We trained an LSTM network to predict species abundances at various time points given the information of initial species abundance. We found that a total of five LSTM units can predict species abundance at different time points (12, 24, 36, 48, and 60 hr) based on the initial species abundance. The output of each LSTM unit is used as an input to the next unit. However, during training the input to each LSTM unit is randomly chosen between the output of the previous LSTM unit and the measured abundance at that time point (randomized teacher forcing), which eliminates temporal bias in the prediction of end-point species abundances. We did not model the passaging perturbations explicitly, since the experimental procedure was consistent across all communities. This also highlights the advantage of using black-box approaches, such as the LSTM network, where physical parameters such as dilution do not need to be explicitly modeled. Here, each LSTM unit consists of a single hidden layer comprising 2048 hidden units with ReLU activation. The details on hyperparameter tuning, learning rates, and choice of optimizer are provided in the Methods section.
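The rollout with randomized teacher forcing can be sketched as follows. This is a minimal numpy illustration with untrained weights and a small hidden size, not the authors' implementation (which uses 2048 hidden units, a ReLU output layer, and end-to-end training):

```python
import numpy as np

rng = np.random.default_rng(1)
n_species, hidden = 12, 16  # 12-member community; hidden size illustrative

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shared LSTM weights, reused at every time step (one unrolled cell per step)
W = rng.normal(0, 0.1, size=(4 * hidden, n_species + hidden))
b = np.zeros(4 * hidden)
W_out = rng.normal(0, 0.1, size=(n_species, hidden))

def lstm_cell(x, h, c):
    """Standard LSTM cell: input (i), forget (f), output (o) gates and
    candidate state (g) computed from the input and previous hidden state."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def rollout(x0, true_seq, p_teacher_forcing=0.5):
    """Predict abundances at successive time points from the initial abundance.
    During training, the input at each step is randomly either the previous
    prediction or the measured abundance (randomized teacher forcing)."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    x, preds = np.array(x0, dtype=float), []
    for x_true in true_seq:
        h, c = lstm_cell(x, h, c)
        preds.append(W_out @ h)
        use_truth = rng.random() < p_teacher_forcing
        x = x_true if use_truth else preds[-1]
    return np.array(preds)

x0 = rng.random(n_species)
true_seq = rng.random((5, n_species))  # measurements at 12-60 hr (toy data)
preds = rollout(x0, true_seq)
```

At inference time `p_teacher_forcing` is set to 0, so every prediction is forecasted from the abundance at time 0 alone, matching the evaluation protocol described above.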

Despite the passaging perturbations and variation in the sampling times, the LSTM accurately predicts (Pearson R2-scores of 0.74, 0.73, 0.74, 0.70, and 0.69 at time points 12, 24, 36, 48, and 60 hr, respectively) not only the end-point species abundance, but also the abundances at intermediate time points on hold-out test sets (Figure 2b-f). These results demonstrate that the LSTM model can accurately predict the temporal changes in species abundance of multi-species communities in the presence of external perturbations. Representative communities that were accurately or poorly predicted by the LSTM are shown in (Figure 2—figure supplement 1).

LSTM enables end-point design of multifunctional synthetic human gut microbiomes

The chemical transformations (i.e. functions) performed by the community are the key design variables for microbiome engineering goals, as evidenced by their major impacts on human health (Sharon et al., 2014). Thus, we explored prediction of microbial community functions by applying the LSTM framework to design health-relevant metabolite profiles using synthetic human gut communities.

A core function of gut microbiota is to transform complex dietary substrates into fermentation end products such as the beneficial metabolite butyrate, which is a major determinant of gut homeostasis (Litvak et al., 2018). In a previous study, we designed butyrate-producing synthetic human gut microbiomes from a set of 25 prevalent and diverse human gut bacteria using a composite gLV and statistical model (Clark et al., 2021). While the composite model approach was successful in predicting butyrate concentration, designing community-level profiles of multiple metabolites adds substantial complexity, and the composite modeling approach offers limited flexibility for this task. Thus, we leveraged the accuracy and flexibility of LSTM models to design the metabolite profiles of synthetic human gut microbiomes. We focused on the fermentation products butyrate, acetate, succinate, and lactate, which play important roles in the gut microbiome’s impact on host physiology and interactions with constituent community members (Fischbach and Sonnenburg, 2011).

We used the species abundance and metabolite concentrations from our previous work (Clark et al., 2021) to train an initial LSTM model. This model uses a feed-forward network (FFN) at the output of the final LSTM unit that maps the endpoint species abundance (a 25-dimensional vector) to the concentrations of the four metabolites (Figure 3a). The entire neural network model comprising LSTM units and a feed-forward network is learned in an end-to-end manner during the training process (i.e. all the network weights are trained simultaneously). Cross-validation of this model (Model M1, Supplementary file 1) on a set of hold-out community observations shows good agreement between the model predictions and experimental measurements for metabolite concentrations and microbial species abundances (Figure 3—figure supplement 1). Thus, we used this model to design high species richness (i.e. >10 species) communities with tailored metabolite profiles (Figure 3a).

Figure 3 with 7 supplements see all
LSTM-guided design and interpretability of community-level metabolite production profiles (a) Schematic of model-training and design of communities with tailored metabolite outputs.

(b) Heatmap of butyrate and lactate concentrations of all possible communities predicted by the LSTM model M1. Grey points indicate communities chosen via k-means clustering to span metabolite design space. Colored boxes indicate ‘corner’ regions defined by 95th percentile values on each axis with points of the corresponding color indicating designed communities within that ‘corner’. Insets show heat maps of acetate and succinate concentrations for all communities within the corresponding boxes on the main figure. Boxes on the inset indicate ‘corners’ defined by 95th percentile values on each axis with colored points corresponding to the same points indicated on the main plot. (c) Cross-validation accuracy of LSTM model trained and validated on a random 90/10 split of all community observations (model M2), evaluated as Pearson correlation R2 for the correlation of predicted versus measured for each variable (all p-values <0.05, N and p-value for each test reported in Supplementary file 1). Dashed line indicates R2=0.5, which is used as a cutoff for including a variable in the subsequent network diagrams. (d) and (e) Network representation of median LIME explanations of the LSTM model M2 from (c) for prediction of each metabolite concentration (d) or species abundance (e) by the presence of each species. Edge widths are proportional to the median LIME explanation across all communities from (b) used to train the model in units of concentration (for (d)) or normalized to the species’ self-impact (for (e)). Only explanations for those variables where the cross-validated predictions had R2>0.5 are shown. Networks were simplified by using lower thresholds for edge width (5 mM for (d), 0.2 for (e)). Red and blue edges indicate positive and negative contributions, respectively.

We first used the LSTM model M1 to simulate every possible combination of >10 species (26,434,916 total communities). The simulated communities separate into two regions: one centered around a dense ellipse of high butyrate concentration characterized by communities containing the butyrate-producing species Anaerostipes caccae (AC) and a second dense ellipse of communities that produce low levels of butyrate and lacked AC (Figure 3b). This bimodality due to the presence/absence of AC is consistent with our previous finding that AC is the strongest driver of butyrate production in this system (Clark et al., 2021). In addition, the strong negative correlation between lactate and butyrate in the AC+ cluster of communities (R2=0.72, p<0.001, N=14,198,086) is consistent with the ability of AC to transform lactate into butyrate (Clark et al., 2021). These results demonstrate that the LSTM model can capture the major microbial drivers of metabolite production as well as the correlations between different metabolites.
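The number of candidate communities follows directly from the combinatorics of choosing more than 10 of the 25 species:

```python
from math import comb

# Communities containing more than 10 of the 25 species
n_communities = sum(comb(25, k) for k in range(11, 26))
print(n_communities)  # 26434916, matching the number of simulated communities
```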

We used our simulated metabolite production landscape to plan informative experiments for testing the predictive capabilities of our model. First, we designed a set of ‘distributed’ communities that spanned the range of typical metabolite concentrations predicted by our model. To this end, we selected 100 communities closest to the centroids of 100 clusters determined using k-means clustering of the four-dimensional metabolite space. Second, we designed a set of communities to test our model’s ability to predict extreme shifts in metabolite outputs. To do so, we identified four ‘corners’ of the distribution in the lactate and butyrate space (Figure 3b). We next examined the relationship between acetate and succinate within each of these corners and found that the distributions varied depending on the given corner (Figure 3b, inset). The total carbon concentration in the fermentation end products across all predicted communities displayed a narrow distribution (mean 316 mM, standard deviation 20 mM, Figure 3—figure supplement 2). The production of the four metabolites is coupled due to the structure of metabolic networks and fundamental stoichiometric constraints (Oliphant and Allen-Vercoe, 2019). Therefore, the model learned the inherent ‘trade-off’ relationships between these fermentation products based on the patterns in our data. We chose a final set of 80 ‘corner’ communities for experimental validation (five communities from each combination of maximizing or minimizing each metabolite, Methods).
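The community-selection step for the ‘distributed’ set can be sketched as k-means clustering of the predicted four-dimensional metabolite profiles, followed by picking the community whose profile lies nearest each centroid. The snippet below uses a minimal Lloyd's algorithm and random stand-in profiles in place of the LSTM predictions:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm: returns centroids and point assignments."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Predicted profiles over (butyrate, acetate, succinate, lactate);
# random stand-in for the LSTM-predicted metabolite landscape
rng = np.random.default_rng(2)
profiles = rng.random((5000, 4))

centroids, labels = kmeans(profiles, k=100)
# Pick the community whose predicted profile is nearest each centroid
d = np.linalg.norm(profiles[:, None, :] - centroids[None, :, :], axis=2)
chosen = np.unique(d.argmin(axis=0))
```

Selecting centroid-nearest points rather than random communities spreads the experimental budget across the full predicted metabolite space instead of oversampling its dense center.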

By experimentally characterizing the endpoint community composition and metabolite concentrations of the 180 designed communities, we found that the LSTM model M1 accurately predicted the rank order of metabolite concentrations and microbial species abundances. The LSTM model substantially outperformed the composite model (gLV and regression, model from previous work [Clark et al., 2021]) trained on the same data for the majority (59%) of output variables (Figure 3—figure supplement 3a). Additionally, replacing the regression module of the composite model with either a Random Forest Regressor or a Feed Forward Network did not improve the metabolite prediction accuracy beyond that of the LSTM (Figure 3—figure supplement 3a). One of the key limitations of the composite models is that the metabolite variables are a function of the endpoint species abundance, but the species abundances are not a function of the metabolite concentrations. By contrast, the LSTM model can capture such feedbacks between metabolites and species. Notably, the LSTM model prediction accuracy for the metabolites was similar for both the ‘distributed’ and ‘corner’ communities (Figure 3—figure supplement 3b–e). These results indicate that our model is useful for designing communities with a broad range of metabolite profiles that includes those at the extremes of the metabolite distributions.

To determine if the LSTM model could separate groups of communities with extreme behaviors, we treated the ‘corners’ as classes and quantified the classification accuracy of our model. The model accurately classified the communities when considering only butyrate and lactate concentrations. However, the model had poorer separation when acetate and succinate were also considered in defining the classes (Figure 3—figure supplement 3f). The misclassification rate was higher for small Euclidean distances between classes and decreased with the Euclidean distance (Figure 3—figure supplement 3g). This implies that the insufficient variation in concentrations due to fundamental stoichiometric constraints limited our ability to define 16 distinct classes that maximized/minimized each metabolite. While model M1 accurately predicted metabolite concentrations and the majority of species abundances, several individual species abundances were poorly predicted (R2 = 0–0.6, Figure 3—figure supplement 3a). Thus, we used the dataset to improve the LSTM model. To this end, we combined the new observations with the original observations and randomly partitioned the data into 90% for training and 10% for cross-validation. The resulting model (M2, Supplementary file 1) was substantially more predictive of species abundances (R2>0.5 for all but five species: FP, RI, CA, BA, and CH; Figure 3c).

Using local interpretable model-agnostic explanations to decipher interactions

One of the commonly noted limitations of machine learning models is their lack of interpretability for extracting biological information about a system. Fortunately, generally applicable tools have been developed to aid in model interpretation. Thus, we sought to use such methods to decipher key relationships among variables within the LSTM to deepen our biological understanding of the system. We used local interpretable model-agnostic explanations (LIME) (Ribeiro et al., 2016b), to quantify the impact of each species’ presence on each metabolite and species in each of the sub-communities used to train model M2. We used the median impact of each species presence on each metabolite or species across all training instances to generate networks that revealed microbe-metabolite (Figure 3d) and microbe-microbe (Figure 3e) interactions. In general, these networks represent broad design principles for the community metabolic outputs by indicating which species have the most consistent and strong impacts on each metabolite and species abundance across a wide range of sub-communities. For instance, the metabolite network highlights Anaerostipes caccae (AC) as having the largest positive effect on butyrate production with an additional positive contribution from EL and a negative contribution from DP, consistent with the previous composite gLV model of butyrate production by this community (Clark et al., 2021).
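The core idea behind LIME can be sketched as follows: perturb the binary species-presence vector, query the black-box model at the perturbed points, and fit a proximity-weighted linear surrogate whose coefficients serve as the local explanations. The `black_box` function below is an invented stand-in for the trained LSTM, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(3)
n_species = 25

def black_box(z):
    """Stand-in for the trained LSTM: maps a species presence (0/1) vector
    to a predicted metabolite concentration. Illustrative only."""
    w = np.sin(np.arange(n_species))            # arbitrary fixed effects
    return z @ w + 2.0 * z[..., 0] * z[..., 1]  # includes an interaction term

def lime_explain(z0, model, n_samples=2000, kernel_width=4.0):
    """Local linear surrogate around community z0 (LIME-style).
    Returns one coefficient per species: its local impact on the output."""
    # Perturb by randomly flipping species in/out of the community
    Z = rng.integers(0, 2, size=(n_samples, len(z0))).astype(float)
    y = model(Z)
    # Weight samples by proximity (Hamming distance) to z0
    dist = np.abs(Z - z0).sum(axis=1)
    prox = np.exp(-(dist / kernel_width) ** 2)
    # Weighted least squares with an intercept column
    Zb = np.hstack([Z, np.ones((n_samples, 1))])
    sw = np.sqrt(prox)
    coef, *_ = np.linalg.lstsq(Zb * sw[:, None], y * sw, rcond=None)
    return coef[:-1]  # drop the intercept

z0 = np.ones(n_species)  # explain the full 25-member community
impacts = lime_explain(z0, black_box)
```

Taking the median of such per-community coefficients across all training sub-communities, as described above, yields the edge weights of the microbe-metabolite and microbe-microbe networks.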

In addition, the number of microbial species impacting each metabolite in these networks trended with the number of microbial species that individually produced or consumed each metabolite (Figure 3—figure supplement 4). For example, butyrate displayed the fewest edges (3) and was produced by the lowest number of individual species (4). By contrast, acetate had the most edges (6) and was produced by the largest number of individual species (19). The inferred microbe-metabolite network consisted of diverse species including Proteobacteria (DP), Actinobacteria (BA, BP, EL), Firmicutes (AC, ER, DL) and one member of Bacteroidetes (PC), but excluded members of Bacteroides. Therefore, while Bacteroides exhibited high abundance in many of the communities, they did not substantially impact the measured metabolite profiles but instead modulated species growth and thus community assembly (Figure 3e). We explored the consistency of LIME explanations for the full 25-member community in response to random partitions of the training data to provide insights into the sensitivity of the LIME explanations given the training data (Figure 3—figure supplement 5, Figure 3—figure supplement 6). These results demonstrated that the direction of the strongest LIME explanations of the full community were consistent in sign despite variations in magnitude. One exception is for the species Roseburia intestinalis (RI), which had high variability across different test/train splits. This is consistent with previous observations that RI has substantial growth variability across experimental communities (Clark et al., 2021). In sum, these results demonstrate that in general the LIME explanations were robust to variations in the training data.

The LIME explanations of inter-species interactions exhibited a statistically significant correlation with their corresponding inter-species interaction parameters from a previously parameterized gLV model of this system (Clark et al., 2021; Figure 3—figure supplement 7a). The sign of the interaction was consistent in 80% of the interactions with substantial magnitude (>0.05 in both the LIME explanations and gLV parameters) (Figure 3—figure supplement 7b). This consistency with previous observations suggests that the LSTM model was able to capture similar broad trends in inter-species relationships as gLV (interpreted through the average LIME explanation across all observed communities). The LSTM model captured more nuanced context-specific behaviors (interpreted as the LIME explanation for one specific community context) than the mathematically restricted gLV model, which substantially improved the predictive capability of the LSTM model. These results demonstrate that the LSTM framework is useful for developing high accuracy predictive models for the design of precise community-level metabolite profiles. Our approach also preserves the ability to decipher different types of interactions in the LSTM model that are explicitly encoded in less accurate and flexible ecological models such as gLV.

Sensitivity of LSTM model prediction accuracy highlights poorly understood species and pairwise interactions

Identification of species that limit prediction performance could guide selection of informative experiments to deepen our understanding of the behaviors of poorly predicted communities. Therefore, we evaluated the sensitivity of the LSTM model (model M2) prediction accuracy to species presence/absence and the amount of training data. High sensitivity of model prediction performance to the number of training communities indicates that collection of additional experimental data would continue to improve the model. Additionally, identifying poorly understood communities will guide machine learning-informed planning of experiments. To evaluate the model’s sensitivity to the size of the training dataset, we computed the hold-out prediction performance (R2) as a function of the size of the training set by sub-sampling the data (Figure 4a). We used 20-fold cross-validation to predict metabolite concentrations and species abundance. Our results show that the ability to improve prediction accuracy as a function of the size of the training data set was limited by the variance in individual species abundance in the training set (Figure 4—figure supplement 1). For instance, certain species with low variance (e.g. FP, EL, DP, RI) in abundance in the training set displayed low sensitivity to the amount of training data and were poorly predicted by the model. The high sensitivity of specific metabolites (e.g. lactate) and species (e.g. AC, BH) to the amount of training data indicates that further data collection would likely improve the model’s prediction performance.
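The subsampling procedure — draw random training fractions, run k-fold cross-validation on each draw, and average the resulting R2 — can be sketched as follows. A least-squares linear model stands in for the LSTM here, and the function name, fractions, and toy data are hypothetical; the text above uses 50–100% fractions, 30 trials, and 20 folds.

```python
import numpy as np

def learning_curve_r2(X, y, fit, predict, fracs=(0.5, 0.75, 1.0),
                      n_trials=10, k=5, seed=0):
    """Mean cross-validated R^2 (coefficient of determination) as a
    function of the fraction of training data retained."""
    rng = np.random.default_rng(seed)
    curve = {}
    for frac in fracs:
        scores = []
        for _ in range(n_trials):
            idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
            Xs, ys = X[idx], y[idx]
            preds = np.empty_like(ys)
            # k-fold cross-validation: each fold is predicted by a model
            # trained on the remaining folds.
            for fold in np.array_split(rng.permutation(len(Xs)), k):
                train = np.setdiff1d(np.arange(len(Xs)), fold)
                model = fit(Xs[train], ys[train])
                preds[fold] = predict(model, Xs[fold])
            ss_res = ((ys - preds) ** 2).sum()
            scores.append(1.0 - ss_res / ((ys - ys.mean()) ** 2).sum())
        curve[frac] = float(np.mean(scores))
    return curve

# Toy stand-in for the LSTM: a linear model fit by least squares.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 3))
y = X @ np.array([1.0, 2.0, 3.0])
curve = learning_curve_r2(
    X, y,
    fit=lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0],
    predict=lambda b, X: X @ b)
```

A curve that is still rising at the largest fraction indicates, as in the text, that collecting more data would likely improve prediction performance.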

Figure 4 with 1 supplement see all
Hold-out prediction performance on sub-communities provides information about poorly understood species and interactions between species.

(a) Sensitivity of metabolite prediction performance (R2) to the amount of training data. Training datasets were randomly subsampled 30 times using 50–100% of the total dataset in increments of 10%. Each subsampled training set was subject to 20-fold cross-validation to assess prediction performance. Line plot of the mean prediction performance over the 30 trials for each percentage of the data. Error bars denote 1 s.d. from the mean. (b) Schematic scatter plot representing how communities containing species A and B define a poorly predicted subsample of the full sample set. (c) Heatmap of prediction performance (R2) of acetate for each subset of communities containing a given species (diagonal elements) or pair of species (off-diagonal elements). (d) Heatmap of prediction performance for acetate, butyrate, lactate, and succinate. A sample subset containing a given species or pair of species included all communities in which the species were initially present. Predictions for each community were determined using 20-fold cross validation so that for each model the predicted samples were excluded from the training samples. N and p-values are reported in Supplementary file 1.

To determine how pairwise combinations of species impacted model prediction performance, we used 20-fold cross-validation to evaluate the prediction performance (R2) on subsets of the total dataset, where subsets were selected based on the presence of individual species or pairs of species (Figure 4b). Using this approach, we identified individual species and species pairs that had the greatest impact on the prediction performance of metabolite concentrations. Sample subsets with poor prediction performance highlight individual species and species pairs whose presence reduced the model’s ability to accurately predict metabolite concentrations. Although the subsets were smaller than the total data set (n=761), calculation of prediction performance was not limited by small sample sizes, where the number of communities in each subset ranged from n=77 to n=478.

The interaction network shown in Figure 3d shows the impact of individual species on each metabolite, but does not provide information about whether the effect is due to individual species or pairwise interactions. To determine whether pairwise interactions influence metabolite concentrations, we quantified how prediction performance changed in response to the presence of individual species and pairs of species. Specifically, if prediction performance taken over a subset of communities containing a given species pair was markedly different than prediction performance for the subsets corresponding to the individual species, this implies that the given pairwise interaction impacts metabolite production. Using equation 5 (Methods), we found that the prediction performance of lactate and butyrate were the least sensitive to species pairs (average decrease in prediction performance for subsets with species pairs of 0.72% and 1.10% compared to corresponding single species subsets). However, the prediction performance of acetate and succinate were the most sensitive to the presence of species pairs (increase in prediction performance of 6.68% for acetate and a decrease of 2.951% for succinate). This difference in prediction performance suggests that pairwise interactions influence the production of acetate and succinate, while the production of lactate and butyrate are primarily driven by the action of single species. The sensitivity of acetate and succinate to pairwise interactions is consistent with the inferred interaction network shown in Figure 3d, which highlights multiple species-metabolite interactions for acetate and succinate and sparse and strong species-metabolite interactions for butyrate and lactate.
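The subset-based comparison can be sketched as follows: compute R2 over the communities containing a given species pair and compare it with the single-species subsets. The exact criterion is equation 5 in the Methods; the simple difference used here, and the toy data, are illustrative stand-ins.

```python
import numpy as np

def subset_r2(presence, y_true, y_pred, species):
    """R^2 over the subset of communities whose composition contains all
    species in `species` (columns of the 0/1 presence matrix)."""
    mask = presence[:, species].all(axis=1)
    yt, yp = y_true[mask], y_pred[mask]
    return 1.0 - ((yt - yp) ** 2).sum() / ((yt - yt.mean()) ** 2).sum()

def pair_sensitivity(presence, y_true, y_pred, i, j):
    """Change in subset R^2 for the (i, j) pair relative to the mean of the
    single-species subsets; a large change suggests the pairwise
    interaction is poorly captured by the model."""
    r_pair = subset_r2(presence, y_true, y_pred, [i, j])
    r_single = 0.5 * (subset_r2(presence, y_true, y_pred, [i])
                      + subset_r2(presence, y_true, y_pred, [j]))
    return r_pair - r_single

# Toy data: predictions are much noisier when species 0 and 1 co-occur,
# mimicking a pairwise interaction the model fails to capture.
rng = np.random.default_rng(0)
presence = rng.integers(0, 2, size=(400, 5))
y_true = rng.normal(0.0, 1.0, size=400)
noise_sd = np.where(presence[:, [0, 1]].all(axis=1), 1.0, 0.05)
y_pred = y_true + rng.normal(0.0, noise_sd)
delta = pair_sensitivity(presence, y_true, y_pred, 0, 1)
```

Here `delta` is strongly negative: the pair subset is predicted far worse than either single-species subset, flagging the pair for further experimental characterization.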

Pairs of certain Bacteroides and butyrate producers including BY-RI, BU-RI, and BY-AC resulted in reduced prediction performance of acetate. This suggests that interactions between specific Bacteroides and butyrate producers were important for acetate transformations, which is consistent with the conversion of acetate into butyrate. Based on the LIME analysis in Figure 3d, AC, DP, and BP had the largest impact on lactate. Thus, the hold-out prediction performance for lactate was primarily impacted by specific pairs that include these species. In sum, these results demonstrate how the LSTM model can be used to identify informative experiments for investigating poorly understood species and interactions between species, where collection of more data would likely improve model prediction performance.

Time-resolved measurements of communities reveal design rules for qualitatively distinct metabolite dynamics

We next leveraged the LSTM model’s dynamic capabilities to understand the temporal changes in metabolite concentrations and community assembly of the 25-member synthetic gut microbiome. To this end, we chose a representative subset of 95 of the 180 communities from Figure 3b (Figure 5—figure supplement 1a): 60 communities for training and 34 for validation, plus the full 25-member community. We experimentally characterized species abundance and metabolite concentrations every 16 hr during community assembly (Figure 5a). We analyzed the dynamic behaviors of these communities using a clustering technique to extract high-level design rules of species presence/absence that determined qualitatively distinct temporal metabolite trajectories (i.e. broad trends consistent across a set of communities) and exploited the LSTM framework to identify context-specific impacts of individual species on metabolite production (i.e. a more fine-tuned case-by-case analysis).

Figure 5 with 3 supplements see all
Community metabolite trajectories cluster into qualitatively distinct groups which can be classified based on presence and absence of key microbial species.

(a) Schematic of experiment and network representing a minimal spanning tree across the 95 communities where weights (indicated by edge length) are equal to the Euclidean distance between the metabolite trajectories for each community. Node colors indicate clusters determined as described in the Materials and methods. Red node with black outline annotated with ‘25’ represents the 25-member community. Annotations indicate the most specific microbial species presence/absence rules that describe most data points in the cluster of the corresponding color as determined by a decision tree classifier (Materials and methods). Communities that deviate from the rules for their cluster are indicated with a border matching the color of the closest cluster whose rules they do follow. Network visualization generated using the draw_kamada_kawai function in networkx (v2.1) for Python 3. (b–g) Temporal changes in metabolite concentrations for communities within each cluster (indicated by sub-plot border color), with individual communities denoted by transparent lines. Solid lines and shaded regions represent the mean ±1 s.d. of all communities in the cluster. (h) Schematic of LSTM model training and computation of gradients to evaluate impact of species abundance on metabolite concentrations in a specific community context. (i) Heatmap of model M3 prediction accuracy for four metabolites in the 34 validation communities at each time point (Pearson correlation R2, N=34 for all tests). (j) Heatmap of the gradient analysis of model M3 as described in (h) for the full 25-species community. N and p-values are reported in Supplementary file 1.

The temporal trajectories of species abundance and metabolite concentrations showed a wide range of qualitatively distinct trends across the 95 communities (Figure 5b–g). For example, some metabolite concentrations monotonically increased (e.g. butyrate in Figure 5b, c, e and g), monotonically decreased (e.g. lactate in Figure 5b, c) or exhibited biphasic dynamics (e.g. acetate in Figure 5c). To determine if there were communities with similar temporal changes in metabolite concentrations, we clustered communities using a minimal spanning tree (Grygorash et al., 2006) on the Euclidean distance between the metabolite trajectories of each pair of communities (Figure 5a). The resulting six clusters exhibited high quantitative within-cluster similarity and qualitatively distinct metabolite trajectories (Figure 5b–g). Clusters 4 and 5, which contained the largest number of communities, had a high fraction of ‘distributed’ communities (Figure 3b). Clusters with a smaller number of communities contained a higher percentage of ‘corner’ communities (Figure 5—figure supplement 1b,c). Therefore, the LSTM model informed by endpoint measurements of species abundance and metabolite concentrations elucidated ‘corner’ communities with qualitatively distinct temporal behaviors. These communities were unlikely to be discovered via random sampling of sub-communities due to the high density of points towards the center of the distribution and low density of communities in the tails of the distribution (Figure 3b). Additionally, some ‘corner’ communities that were similar in metabolite profiles when considering the endpoint measurement separated into different clusters when considering the dynamic data (e.g. Clusters 2 and 3, which have similar metabolite profiles at 48 hr but qualitatively distinct dynamics) (Figure 5b).
This demonstrates that using a community design approach to explore the extremes of system behaviors with a limited time resolution enabled the identification of new features when the communities with extreme functions were characterized with higher time resolution.
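The minimal-spanning-tree clustering step can be sketched with scipy. Cutting the tree's longest edges to form clusters is one common variant of this approach (the exact cluster-extraction procedure used in the study is described in Materials and methods), and the toy trajectories below are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_clusters(trajectories, n_clusters):
    """Cluster flattened metabolite trajectories by building a minimum
    spanning tree on pairwise Euclidean distances and removing its
    (n_clusters - 1) longest edges, leaving n_clusters components."""
    D = squareform(pdist(trajectories))          # pairwise Euclidean distances
    mst = minimum_spanning_tree(D).toarray()
    edge_weights = np.sort(mst[mst > 0])
    if n_clusters > 1:
        cut = edge_weights[-(n_clusters - 1)]
        mst[mst >= cut] = 0.0                    # sever the longest edges
    _, labels = connected_components(mst, directed=False)
    return labels

# Toy trajectories (communities x flattened time-series): two groups with
# clearly distinct dynamics.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.3, size=(12, 8))
group_b = rng.normal(5.0, 0.3, size=(12, 8))
labels = mst_clusters(np.vstack([group_a, group_b]), n_clusters=2)
```

Because the MST contains exactly one edge bridging the two well-separated groups, cutting the single longest edge recovers the two clusters exactly.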

To identify general patterns in species presence/absence of these communities that could explain the temporal behaviors of each cluster, we used a decision tree analysis to identify an interpretable classification scheme (Figure 5—figure supplement 1d). Using this approach, the large clusters were separated by relatively simple classification rules (i.e. AC+ for Cluster 4 and AC- for Cluster 5), whereas the smaller clusters had more complex classification rules involving larger combinations of species (3–7 species), all involving AC, DP, and DL (Figure 5a). The influential role of DP was corroborated by a previous study showing that DP substantially inhibits butyrate production (Clark et al., 2021). In addition, the inferred microbe-metabolite networks based on the LSTM model M2 demonstrated that the presence of DL was linked to higher acetate and lower succinate production (Figure 3d), consistent with its key role in shaping metabolite dynamics in this system. The variation in the number of communities across clusters is consistent with previous observations that species-rich microbial communities tend towards similar behavior(s) (e.g. Clusters 4 and 5 contained many communities). By contrast, more complex species presence/absence design rules are required to identify communities that deviate from this typical behavior (e.g. Clusters 1–3 and 6 contained few communities) (Clark et al., 2021).
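A decision tree over species presence/absence features yields exactly this kind of interpretable rule set. The sketch below uses scikit-learn on a hypothetical presence/absence matrix in which cluster membership is driven entirely by one species, mimicking the simple AC+/AC- rule reported for Clusters 4 and 5; the species names and data are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical presence/absence matrix (communities x species); cluster
# membership here depends only on species "AC" (column 0).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(80, 5))
labels = np.where(X[:, 0] == 1, "cluster4", "cluster5")

# Keep the tree shallow so the learned rules remain interpretable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
rules = export_text(tree, feature_names=["AC", "DP", "DL", "BU", "PC"])
```

`export_text` prints the learned splits (here a single split on "AC"), directly exposing presence/absence classification rules of the kind described in the text.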

Using LSTM with higher time-resolution to interpret contextual interactions

While our clustering analysis identified general design rules for metabolite trajectories, there remained unexplained within-cluster variation. Thus, we used the LSTM framework to identify effects beyond these general species presence/absence rules that determine the precise metabolite trajectory of a given community. Simultaneous prediction of species abundance and the concentrations of all four metabolites at all time points necessitates specific modifications to the LSTM architecture shown in Figure 2a. In particular, we consider a 29-dimensional input vector whose first 25 components correspond to species abundances, while the remaining four components correspond to metabolite concentrations (Figure 5h). The 29-dimensional feature vector is normalized so that each component has zero mean and unit variance; this feature scaling prevents high-abundance species from dominating the model. The output of each LSTM unit is fed into the input block of the subsequent LSTM unit to advance the model forward in time. We concatenate instantaneous species abundances with metabolite concentrations because predicting metabolite concentrations at multiple time points requires a time-series model (based on ODEs or, here, the LSTM), and the future trajectory of metabolite concentrations depends on both the current species abundances and the current metabolite concentrations. The LSTM model trained on the 60 training communities (model M3) displayed good prediction performance on the metabolite concentrations of the 34 validation communities plus the full 25-species community (Figure 5i).
The prediction accuracy of species abundance was lower than metabolite concentrations, presumably due to the limited number of training set observations of each species (Figure 5—figure supplement 2).
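The modified architecture can be sketched in plain numpy: a standard LSTM cell whose 29-dimensional readout is fed back as the next input to roll the model forward in time. The weights below are random placeholders (in the actual model they are learned by backpropagation) and the hidden size of 64 is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMRollout:
    """Minimal sketch of the recurrent architecture described above: a
    29-dim state (25 species abundances + 4 metabolite concentrations),
    normalized to zero mean / unit variance, with each step's output fed
    back as the next step's input."""

    def __init__(self, dim=29, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.Wx = rng.normal(0, s, (4 * hidden, dim))    # input-to-gate weights
        self.Wh = rng.normal(0, s, (4 * hidden, hidden)) # hidden-to-gate weights
        self.b = np.zeros(4 * hidden)
        self.Wy = rng.normal(0, s, (dim, hidden))        # readout to 29-dim state
        self.hidden = hidden

    def step(self, x, h, c):
        # Standard LSTM cell: input (i), forget (f), cell (g), output (o) gates.
        z = self.Wx @ x + self.Wh @ h + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return self.Wy @ h, h, c

    def rollout(self, x0, n_steps):
        """Predict the trajectory starting from the normalized state x0."""
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        traj, x = [x0], x0
        for _ in range(n_steps):
            x, h, c = self.step(x, h, c)
            traj.append(x)
        return np.stack(traj)

model = LSTMRollout()
traj = model.rollout(np.zeros(29), n_steps=3)  # states at t0..t3
```

The key design choice mirrored here is that one network jointly predicts species and metabolites, so the recurrence can capture their interdependence at every step.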

We used a gradient-based sensitivity analysis of the LSTM model M3 to provide biological insights into the contributions of individual species to the temporal changes in metabolite concentrations (Figure 5h and j, Methods). This method involves computing partial derivatives of output variables of interest with respect to input variables, which are readily available through a single backpropagation pass (LeCun et al., 1988; Peurifoy et al., 2018). As an example case, we applied this analysis to the full 25-species community, which was grouped into Cluster 4 with the design rule ‘AC+’ (Figure 5a). Consistent with this design rule, we observed strong sensitivity gradients between the abundance of AC and the concentrations of butyrate, acetate, and lactate, in line with our biological understanding of the system (Clark et al., 2021). Beyond the ‘AC+’ design rule, there was a strong sensitivity gradient between DL and acetate and succinate, consistent with the inferred networks based on the LSTM model M2 that used endpoint measurements (Figure 3d). Further, the contributions of certain species to metabolite production varied as a function of time. For instance, at the initial time point, species abundances were similar and thus the contribution of individual species to metabolite production was uniform. However, interactions between species during community assembly enhanced the contribution of specific metabolite driver species such as AC. In addition, the contributions of individual species such as PC and BA to succinate production peaked at 32 hr and then decreased by 48 hr, highlighting that the effects of these species were maximized at intermediate time points. In sum, the gradient-based method identified the quantitative contributions of each species to the temporal changes in metabolite concentrations for a representative 25-member community, identifying context-specific behaviors beyond the previously identified broader design rules.
These two complementary approaches are useful for identifying design rules governing metabolite dynamics. The clustering method can identify broad design rules for species presence/absence, and the LSTM gradient analysis can uncover fine-tuned quantitative contributions of species to the temporal changes in community-level functions.
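The sensitivity analysis can be sketched as follows. In the trained LSTM these partial derivatives come from a single backpropagation pass; central finite differences, used here so the sketch works for any black-box predictor, approximate the same quantities. The toy predict function and its coefficients are hypothetical.

```python
import numpy as np

def sensitivity_matrix(predict, x0, out_dim, eps=1e-4):
    """Central finite-difference approximation of d(output_j)/d(input_i),
    evaluated at the state x0. predict maps an input state to an output
    vector (e.g. metabolite concentrations at a chosen time point).
    Returns a (len(x0), out_dim) sensitivity matrix."""
    grads = np.zeros((x0.size, out_dim))
    for i in range(x0.size):
        xp, xm = x0.copy(), x0.copy()
        xp[i] += eps
        xm[i] -= eps
        grads[i] = (predict(xp) - predict(xm)) / (2 * eps)
    return grads

# Toy model: output 0 depends strongly on input 0 (an "AC"-like driver
# species), weakly on input 1; output 1 depends on input 2.
f = lambda x: np.array([3.0 * np.tanh(x[0]) + 0.1 * x[1], 0.5 * x[2]])
S = sensitivity_matrix(f, np.zeros(4), out_dim=2)
```

Large entries of `S` flag driver species for a given metabolite in a specific community context, exactly the per-community, per-time-point reading used for Figure 5j.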

To directly evaluate the performance of the gLV and LSTM models, we trained a discretized version of the gLV model (approximate gLV model) on the same dataset using the same algorithm as the LSTM. The approximate gLV model was augmented with a two-layer feed-forward neural network, with a hidden dimension equivalent to that used in the LSTM model, to enable metabolite predictions (Figure 5—figure supplement 3a, b). The approximate gLV model enables the computation of gradients via the backpropagation algorithm, which is also used to train the LSTM. By contrast, computation of gradients of the continuous-time gLV model requires numerical integration. The approximate gLV model does not perform as well as the LSTM model at species abundance predictions using the same data used to train LSTM model M3 (Figure 5—figure supplement 3c, d). In addition, the LSTM outperforms the approximate gLV augmented with the feed-forward network at metabolite predictions (Figure 5—figure supplement 3e, f). In sum, the LSTM outperforms the discretized gLV model trained with the same algorithm, highlighting the power of the LSTM model in accurately predicting the temporal changes in microbiome composition and metabolite concentrations.
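The Euler-discretized gLV update underlying the approximate gLV model is easy to state explicitly (the two-layer feed-forward metabolite head is omitted here); the growth rates and interaction matrix below are toy values.

```python
import numpy as np

def glv_step(x, r, A, dt=0.1):
    """One Euler step of the discretized gLV model:
    x_{t+1} = x_t + dt * x_t * (r + A @ x_t), clipped at zero.
    r: intrinsic growth rates; A: pairwise interaction matrix."""
    return np.maximum(x + dt * x * (r + A @ x), 0.0)

def glv_rollout(x0, r, A, n_steps, dt=0.1):
    """Roll the discrete dynamics forward from initial abundances x0."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(glv_step(traj[-1], r, A, dt))
    return np.stack(traj)

# Toy two-species example with no inter-species interactions: each species
# grows logistically to a carrying capacity of 1.
r = np.array([1.0, 1.0])
A = np.array([[-1.0, 0.0], [0.0, -1.0]])
traj = glv_rollout(np.array([0.1, 0.2]), r, A, n_steps=200)
```

Because each step is a differentiable function of the parameters, gradients flow through the rollout by backpropagation, which is what lets this discretized form share a training algorithm with the LSTM.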

Discussion

The LSTM modeling framework trained on species abundance and metabolite concentrations accurately predicted multiple health-relevant functions of complex synthetic human gut communities. This model is powerful for designing communities with target metabolite profiles. Microbial communities continuously impact metabolites by releasing or consuming them. Therefore, by modeling both microbial growth and the metabolites they produce/consume together, the LSTM captured the interconnections between these variables. Due to its flexibility, the LSTM model outperforms the widely used gLV model in the presence of higher-order interactions. We leveraged the computational efficiency of the LSTM model to predict the metabolite profiles of tens of millions of communities. We used these model predictions to identify sparsely represented ‘corner case’ communities that maximized/minimized community-level production of four health-relevant metabolites. In the absence of a predictive model, these infrequent communities would have been difficult to discover among the vast metabolite profile landscape of possible communities.

Beyond the model’s predictive capabilities, we showed that biological information, including significant microbe-metabolite and microbe-microbe interactions, can be extracted from LSTM models. These biological insights could enable the discovery of key species and interactions driving community functions of interest. Further, this could inform the design of microbial communities from the bottom-up or interventions to manipulate community-level behaviors. For example, the inferred microbe-metabolite network highlighted AC as a major ecological driver of several metabolites including butyrate, acetate and lactate in our system. In addition, this microbe-metabolite network did not include species of the highly abundant genus Bacteroides but instead featured members of Firmicutes (AC, ER, DL), Actinobacteria (BA, BP, EL), Proteobacteria (DP) and Bacteroidetes (PC). Notably, Bacteroides displayed numerous interactions in the microbe-microbe interaction network, suggesting that they played a key role in the growth of constituent community members as opposed to the production of specific metabolites. Therefore, our model suggests that Bacteroides influence broad ecosystem functions such as community growth dynamics whereas species highlighted in the microbe-metabolite network contribute to specialized functions such as the production of specific measured metabolites (Rivett and Bell, 2018). Accordingly, the microbe-metabolite interaction network could be used to identify key species to target for manipulating the dynamics of specific metabolites.

We performed time-resolved measurements of metabolite production and species abundance using a set of designed communities and demonstrated that communities tend towards a typical dynamic behavior (i.e. Clusters 4 and 5). Therefore, random sampling of sub-communities from the 25-member system would likely exhibit behaviors similar to Clusters 4 and 5. We used the LSTM model to identify ‘corner case’ communities that displayed metabolite concentrations near the tails of the metabolite distributions at the endpoint. The model allowed us to identify unique sub-clusters with disparate dynamic behaviors. We demonstrated that the endpoint model predictions were confirmatory (Figure 3c) and also led to new discoveries when additional measurements were made in the time dimension. Specifically, certain ‘corner case’ communities identified based on prediction of a single time-point displayed distinct dynamic trajectories. For instance, Clusters 2 and 3 based on the decision tree classifier displayed similar end-point metabolite concentrations (Figure 5c, d). However, lactate decreased immediately over time in Cluster 2 communities but remained high until approximately 30 hr and then decreased in Cluster 3 communities. The design rule for Cluster 3 included the presence of lactate producers BU and DL (Figure 3—figure supplement 4), suggesting that these individual species’ lactate producing capabilities enabled the community to maintain a high lactate concentration for an extended period of time in the context of the Cluster 3 communities. While we focused on the production of four health-relevant metabolites produced by gut microbiota, a wide range of health-relevant compounds are produced by gut bacteria. Therefore, communities that cluster together based on dynamic trends in the four measured metabolites could separate into new clusters based on the temporal patterns of other compounds produced or degraded by the communities.

Time-resolved measurements were required to reveal the different dynamic behaviors of communities in Clusters 2 and 3 to improve our understanding and the design of community functions. The ability to resolve differences in the dynamic trajectories of communities requires time sampling when the system behavior is changing as a function of time as opposed to time sampling once the system has reached a steady-state (i.e. saturated as a function of time). The time to reach steady-state varied across different communities and metabolites of interest. For instance, lactate reached steady-state at an earlier time point (12 hr) in Cluster 4 communities whereas communities in Cluster 3 approached steady-state at a later time point (48 hr). Therefore, model-guided experimental planning could be used to identify the optimal sampling times to resolve differences in community dynamic behaviors. Achieving a highly predictive LSTM model required substantially less training data than a previous study that approximated the behavior of mechanistic biological systems models with RNNs (Figure 2; Wang et al., 2019). While the performance of any data-driven algorithm improves with the quantity and quality of available data, we demonstrate that the LSTM can translate learning on lower-order communities to accurately predict the behavior of higher-order communities given a limited and informative training set that is experimentally feasible. For synthetic microbial communities, the quality of the training set depends on the frequency of time-series measurements within periods in which the system displays rich dynamic behaviors (i.e. excitation of the dynamic modes of the system), the range of initial species richness, representation of each community member in the training data and sufficient variation in species abundances or metabolite concentrations (Figure 4—figure supplement 1). 
The dynamic behaviors of the synthetic communities characterized in vitro will likely differ substantially from their behaviors in new environments such as the mammalian gut. However, communities in sub-clusters whose behaviors deviated substantially from the typical community behaviors (e.g. Clusters 2 and 3 versus Clusters 4 and 5) may be more likely than random to display unique dynamic behaviors in vivo. Future work will investigate whether the in vitro dynamic behavior cluster patterns can be used as prior information to guide the design of informative communities in new environments for building predictive models.

The current implementation of the LSTM model lacks uncertainty quantification for individual predictions, which could be used to guide experimental design (Radivojević et al., 2020). Recent progress in Bayesian recurrent neural networks has led to the emergence of Bayesian LSTMs (Fortunato et al., 2017; Li et al., 2021), which provide uncertainty quantification for each prediction in the form of a posterior variance or posterior confidence interval. However, the implementation and training of such Bayesian neural networks is currently significantly more difficult than training the LSTM model developed here. In addition, we benchmarked the performance of the LSTM against a widely used gLV model, which has been demonstrated to accurately predict community assembly in communities with up to 25 species (Venturelli et al., 2018; Clark et al., 2021). The gLV model has been modified mathematically to capture more complex system behaviors (McPeek, 2017). However, implementation of these gLV models to represent the behaviors of microbiomes with a large number of interacting species poses major computational challenges.

While our current approach treated microbiome species composition as the sole set of design variables in a constant environmental background, microbiomes in reality are impacted by differences in the physicochemical composition of their environment (Thompson et al., 2017). Given sufficient observations of community behavior under varied environmental contexts (e.g. presence/absence of certain nutrients), our LSTM approach could be further leveraged to design complementary species and environmental compositions for desired microbiome functional dynamics. Further, we can leverage the wealth of biological information stored in the sequenced genomes of the constituent organisms. Integrating methods such as genome scale models (Magnúsdóttir et al., 2017) with our LSTM framework could leverage genomic information to enable predictions when the genomes of the organisms are varied (i.e. alternative strains of the same species with disparate metabolic capabilities). In this case, introducing variables representing the presence/absence of specific metabolic reactions would potentially enable the model to predict the impact of a species with a varied set of metabolic reactions on a given set of functions without new experimental observations. Integrating this information into the model could thus enable a mapping between genome information and community-level functions.

While previous approaches have used machine learning methods to predict microbiome functions based on microbiome species composition (Le et al., 2020; Larsen et al., 2012; Thompson et al., 2019), our approach is a major step forward in predicting the future temporal trajectory of microbiome functions from an initial species composition. The dynamic nature of our approach enables the design of optimal initial community compositions or interventions to steer a community to a desired future state. The flexibility of our approach to various time resolutions could be especially useful in scenarios where a microbiome may display undesired transients on the path from an initial state to a desired final state. For instance, in treatment of gut microbiome dysbiosis, it is important to ensure that any transient states of the microbiome are not harmful to the host (e.g. pathogen blooms or overproduction of toxic metabolites) as the system approaches a desired healthy state (Xiao et al., 2020). However, because predictions with increased time resolution require more data for model training, the ability of our approach to predict system behaviors based on initial and final observations is useful for scenarios where transient states may be less important, such as in bioprocesses where the concentration of products at the time of harvest is the primary design objective (Yenkie et al., 2016). Finally, the computational efficiency and accuracy of the LSTM model could be exploited in the future for autonomous design and optimization of multifunctional communities via computer-controlled design-test-learn cycles (King et al., 2009).

Materials and methods

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Strain, strain background (Prevotella copri CB7) | PC | DSM 18205
Strain, strain background (Parabacteroides johnsonii M-165) | PJ | DSM 18315
Strain, strain background (Bacteroides vulgatus NCTC 11154) | BV | ATCC 8482
Strain, strain background (Bacteroides fragilis EN-2) | BF | DSM 2151
Strain, strain background (Bacteroides ovatus NCTC 11153) | BO | ATCC 8483
Strain, strain background (Bacteroides thetaiotaomicron VPI 5482) | BT | ATCC 29148
Strain, strain background (Bacteroides caccae VPI 3452 A) | BC | ATCC 43185
Strain, strain background (Bacteroides cellulosilyticus CRE21) | BY | DSMZ 14838
Strain, strain background (Bacteroides uniformis VPI 0061) | BU | DSM 6597
Strain, strain background (Desulfovibrio piger VPI C3-23) | DP | ATCC 29098
Strain, strain background (Bifidobacterium longum subs. infantis S12) | BL | DSM 20088
Strain, strain background (Bifidobacterium adolescentis E194a (Variant a)) | BA | ATCC 15703
Strain, strain background (Bifidobacterium pseudocatenulatum B1279) | BP | DSM 20438
Strain, strain background (Collinsella aerofaciens VPI 1003) | CA | DSM 3979
Strain, strain background (Eggerthella lenta 1899 B) | EL | DSM 2243
Strain, strain background (Faecalibacterium prausnitzii A2-165) | FP | DSM 17677
Strain, strain background (Clostridium hiranonis T0-931) | CH | DSM 13275
Strain, strain background (Anaerostipes caccae L1-92) | AC | DSM 14662
Strain, strain background (Blautia hydrogenotrophica S5a33) | BH | DSM 10507
Strain, strain background (Clostridium asparagiforme N6) | CG | DSM 15981
Strain, strain background (Eubacterium rectale VPI 0990) | ER | ATCC 33656
Strain, strain background (Roseburia intestinalis L1-82) | RI | DSM 14610
Strain, strain background (Coprococcus comes VPI CI-38) | CC | ATCC 27758
Strain, strain background (Dorea longicatena 111–35) | DL | DSMZ 13814
Strain, strain background (Dorea formicigenerans VPI C8-13) | DF | DSM 3992
Sequence-based reagent | Forward Primer Index: ATCACG | IDT | AATGATACGGCGACCACCGAGATCTACAC ATCACG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CGATGT | IDT | AATGATACGGCGACCACCGAGATCTACAC CGATGT ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TTAGGC | IDT | AATGATACGGCGACCACCGAGATCTACAC TTAGGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TGACCA | IDT | AATGATACGGCGACCACCGAGATCTACAC TGACCA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ACAGTG | IDT | AATGATACGGCGACCACCGAGATCTACAC ACAGTG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GCCAAT | IDT | AATGATACGGCGACCACCGAGATCTACAC GCCAAT ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CAGATC | IDT | AATGATACGGCGACCACCGAGATCTACAC CAGATC ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ACTTGA | IDT | AATGATACGGCGACCACCGAGATCTACAC ACTTGA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GATCAG | IDT | AATGATACGGCGACCACCGAGATCTACAC GATCAG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TAGCTT | IDT | AATGATACGGCGACCACCGAGATCTACAC TAGCTT ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GGCTAC | IDT | AATGATACGGCGACCACCGAGATCTACAC GGCTAC ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CTTGTA | IDT | AATGATACGGCGACCACCGAGATCTACAC CTTGTA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: AGTCAA | IDT | AATGATACGGCGACCACCGAGATCTACAC AGTCAA ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: AGTTCC | IDT | AATGATACGGCGACCACCGAGATCTACAC AGTTCC ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ATGTCA | IDT | AATGATACGGCGACCACCGAGATCTACAC ATGTCA ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CCGTCC | IDT | AATGATACGGCGACCACCGAGATCTACAC CCGTCC ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GTAGAG | IDT | AATGATACGGCGACCACCGAGATCTACAC GTAGAG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GTCCGC | IDT | AATGATACGGCGACCACCGAGATCTACAC GTCCGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GTGAAA | IDT | AATGATACGGCGACCACCGAGATCTACAC GTGAAA ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GTGGCC | IDT | AATGATACGGCGACCACCGAGATCTACAC GTGGCC ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GTTTCG | IDT | AATGATACGGCGACCACCGAGATCTACAC GTTTCG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CGTACG | IDT | AATGATACGGCGACCACCGAGATCTACAC CGTACG ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GAGTGG | IDT | AATGATACGGCGACCACCGAGATCTACAC GAGTGG ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GGTAGC | IDT | AATGATACGGCGACCACCGAGATCTACAC GGTAGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ACTGAT | IDT | AATGATACGGCGACCACCGAGATCTACAC ACTGAT ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ATGAGC | IDT | AATGATACGGCGACCACCGAGATCTACAC ATGAGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ATTCCT | IDT | AATGATACGGCGACCACCGAGATCTACAC ATTCCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CAAAAG | IDT | AATGATACGGCGACCACCGAGATCTACAC CAAAAG ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CAACTA | IDT | AATGATACGGCGACCACCGAGATCTACAC CAACTA ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CACCGG | IDT | AATGATACGGCGACCACCGAGATCTACAC CACCGG ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CACGAT | IDT | AATGATACGGCGACCACCGAGATCTACAC CACGAT ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CACTCA | IDT | AATGATACGGCGACCACCGAGATCTACAC CACTCA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CAGGCG | IDT | AATGATACGGCGACCACCGAGATCTACAC CAGGCG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CATGGC | IDT | AATGATACGGCGACCACCGAGATCTACAC CATGGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CATTTT | IDT | AATGATACGGCGACCACCGAGATCTACAC CATTTT ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CCAACA | IDT | AATGATACGGCGACCACCGAGATCTACAC CCAACA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CGGAAT | IDT | AATGATACGGCGACCACCGAGATCTACAC CGGAAT ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CTAGCT | IDT | AATGATACGGCGACCACCGAGATCTACAC CTAGCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CTATAC | IDT | AATGATACGGCGACCACCGAGATCTACAC CTATAC ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CTCAGA | IDT | AATGATACGGCGACCACCGAGATCTACAC CTCAGA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GACGAC | IDT | AATGATACGGCGACCACCGAGATCTACAC GACGAC ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TAATCG | IDT | AATGATACGGCGACCACCGAGATCTACAC TAATCG ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TACAGC | IDT | AATGATACGGCGACCACCGAGATCTACAC TACAGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TATAAT | IDT | AATGATACGGCGACCACCGAGATCTACAC TATAAT ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TCATTC | IDT | AATGATACGGCGACCACCGAGATCTACAC TCATTC ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TCCCGA | IDT | AATGATACGGCGACCACCGAGATCTACAC TCCCGA ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TCGAAG | IDT | AATGATACGGCGACCACCGAGATCTACAC TCGAAG ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TCGGCA | IDT | AATGATACGGCGACCACCGAGATCTACAC TCGGCA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Reverse Primer Index: ATCACGAG | IDT | CAAGCAGAAGACGGCATACGAGAT ATCACGAG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CGATGTTC | IDT | CAAGCAGAAGACGGCATACGAGAT CGATGTTC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TTAGGCGA | IDT | CAAGCAGAAGACGGCATACGAGAT TTAGGCGA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TGACCAAT | IDT | CAAGCAGAAGACGGCATACGAGAT TGACCAAT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: ACAGTGCT | IDT | CAAGCAGAAGACGGCATACGAGAT ACAGTGCT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GCCAATGT | IDT | CAAGCAGAAGACGGCATACGAGAT GCCAATGT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CAGATCGA | IDT | CAAGCAGAAGACGGCATACGAGAT CAGATCGA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: ACTTGAAA | IDT | CAAGCAGAAGACGGCATACGAGAT ACTTGAAA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GATCAGTG | IDT | CAAGCAGAAGACGGCATACGAGAT GATCAGTG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TCTACCTC | IDT | CAAGCAGAAGACGGCATACGAGAT TCTACCTC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CTTGTATG | IDT | CAAGCAGAAGACGGCATACGAGAT CTTGTATG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TAGCTTCC | IDT | CAAGCAGAAGACGGCATACGAGAT TAGCTTCC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GGCTACCA | IDT | CAAGCAGAAGACGGCATACGAGAT GGCTACCA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: ATGCACTT | IDT | CAAGCAGAAGACGGCATACGAGAT ATGCACTT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GACGGAAC | IDT | CAAGCAGAAGACGGCATACGAGAT GACGGAAC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: AGCCTTGG | IDT | CAAGCAGAAGACGGCATACGAGAT AGCCTTGG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CCGTAGAG | IDT | CAAGCAGAAGACGGCATACGAGAT CCGTAGAG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GTGAGACT | IDT | CAAGCAGAAGACGGCATACGAGAT GTGAGACT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: AATGCTCA | IDT | CAAGCAGAAGACGGCATACGAGAT AATGCTCA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GCATCGTA | IDT | CAAGCAGAAGACGGCATACGAGAT GCATCGTA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CGAACAGC | IDT | CAAGCAGAAGACGGCATACGAGAT CGAACAGC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TCGGAAGG | IDT | CAAGCAGAAGACGGCATACGAGAT TCGGAAGG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TTCTGTCG | IDT | CAAGCAGAAGACGGCATACGAGAT TTCTGTCG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GTACTCAC | IDT | CAAGCAGAAGACGGCATACGAGAT GTACTCAC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: AGTAATAC | IDT | CAAGCAGAAGACGGCATACGAGAT AGTAATAC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CAAGATAT | IDT | CAAGCAGAAGACGGCATACGAGAT CAAGATAT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TGTTTGGT | IDT | CAAGCAGAAGACGGCATACGAGAT TGTTTGGT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CTCCAACC | IDT | CAAGCAGAAGACGGCATACGAGAT CTCCAACC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: AAATTCTG | IDT | CAAGCAGAAGACGGCATACGAGAT AAATTCTG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CCCGCCAA | IDT | CAAGCAGAAGACGGCATACGAGAT CCCGCCAA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TACAAATA | IDT | CAAGCAGAAGACGGCATACGAGAT TACAAATA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GGGCTATA | IDT | CAAGCAGAAGACGGCATACGAGAT GGGCTATA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TTTCGGAC | IDT | CAAGCAGAAGACGGCATACGAGAT TTTCGGAC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TGCGCGTC | IDT | CAAGCAGAAGACGGCATACGAGAT TGCGCGTC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TCCCGCTG | IDT | CAAGCAGAAGACGGCATACGAGAT TCCCGCTG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GTTTCAGG | IDT | CAAGCAGAAGACGGCATACGAGAT GTTTCAGG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GGAGGGGG | IDT | CAAGCAGAAGACGGCATACGAGAT GGAGGGGG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GCTGTTAG | IDT | CAAGCAGAAGACGGCATACGAGAT GCTGTTAG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GAGTGTGA | IDT | CAAGCAGAAGACGGCATACGAGAT GAGTGTGA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CGTCCCCG | IDT | CAAGCAGAAGACGGCATACGAGAT CGTCCCCG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CCTCATCA | IDT | CAAGCAGAAGACGGCATACGAGAT CCTCATCA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CCACGACA | IDT | CAAGCAGAAGACGGCATACGAGAT CCACGACA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CATTGGCT | IDT | CAAGCAGAAGACGGCATACGAGAT CATTGGCT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: AGGGGCCC | IDT | CAAGCAGAAGACGGCATACGAGAT AGGGGCCC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: ACGACACT | IDT | CAAGCAGAAGACGGCATACGAGAT ACGACACT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: ACCGACGC | IDT | CAAGCAGAAGACGGCATACGAGAT ACCGACGC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TATAGTAT | IDT | CAAGCAGAAGACGGCATACGAGAT TATAGTAT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: AACTCAGT | IDT | CAAGCAGAAGACGGCATACGAGAT AACTCAGT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt

Experimental methods

Strain maintenance and culturing

All anaerobic culturing was carried out in an anaerobic chamber with an atmosphere of 2.5 ± 0.5% H2, 15 ± 1% CO2 and balance N2. All prepared media and materials were placed in the chamber at least overnight before use to equilibrate with the chamber atmosphere. The strains used in this work were obtained from the sources listed in our previous publication (Clark et al., 2021) and permanent stocks of each were stored in 25% glycerol at -80°C. Batches of single-use glycerol stocks were produced for each strain by first growing a culture from the permanent stock in anaerobic basal broth (ABB) media (Oxoid) to stationary phase, mixing the culture in an equal volume of 50% glycerol, and aliquoting 400 µL into Matrix Tubes (ThermoFisher) for storage at -80°C. Quality control for each batch of single-use glycerol stocks included (1) plating a sample of the aliquoted mixture onto LB media (Sigma-Aldrich) for incubation at 37°C in ambient air to detect aerobic contaminants and (2) Illumina sequencing of 16S rDNA isolated from pellets of the aliquoted mixture to verify the identity of the organism. For each experiment, precultures of each species were prepared by thawing a single-use glycerol stock and combining the inoculation volume and media as described in Clark et al., 2021 to a total volume of 5 mL (multiple tubes were inoculated if more preculture volume was needed). Cultures were incubated until stationary phase at 37°C using the preculture incubation times described in Clark et al., 2021. All experiments were performed in a chemically defined medium (DM38), as previously described (Clark et al., 2021). This medium supports the individual growth of all organisms except Faecalibacterium prausnitzii (Clark et al., 2021).

Community culturing experiments and sample collection

Synthetic communities were assembled using liquid handling-based automation as described previously (Clark et al., 2021). Briefly, each species’ preculture was diluted to an OD600 of 0.0066 in DM38. Community combinations were arrayed in 96 deep well (96DW) plates by pipetting equal volumes of each species’ diluted preculture into the appropriate wells using a Tecan Evo Liquid Handling Robot inside an anaerobic chamber. For experiments with multiple time points, duplicate 96DW plates were prepared for each time point. Each 96DW plate was covered with a semi-permeable membrane (Diversified Biotech) and incubated at 37°C. After the specified time had passed, 96DW plates were removed from the incubator and samples were mixed by pipette. Cell density was measured by pipetting 200 µL of each sample into one 96-well microplate (96W MP) and diluting 20 µL of each sample into 180 µL of PBS in another 96W MP and measuring the OD600 of both plates (Tecan F200 Plate Reader). We selected the value that was within the linear range of the instrument for each sample. A total of 200 µL of each sample was transferred to a new 96DW plate and pelleted by centrifugation at 2400 × g for 10 min. A supernatant volume of 180 µL was removed from each sample and transferred to a 96-well microplate for storage at -20°C and subsequent metabolite quantification by high performance liquid chromatography (HPLC). Cell pellets were stored at -80°C for subsequent genomic DNA extraction and 16S rDNA library preparation for Illumina sequencing. 20 µL of each supernatant was used to quantify pH using a phenol red assay (Silverstein, 2012). Phenol red solution was diluted to 0.05% weight per volume in 0.9% w/v NaCl. Bacterial supernatant (20 µL) was added to 180 µL of phenol red solution in a 96W MP, and absorbance was measured at 560 nm (Tecan Spark Plate Reader).
A standard curve was produced by fitting the Henderson-Hasselbalch equation to fresh media with a pH ranging between 3 and 11 measured using a standard electrochemical pH probe (Mettler-Toledo). We used Equation 1 to map the absorbance measurements to pH values.

(1) \mathrm{pH} = \mathrm{p}K_a + b\,\log_{10}\!\left(\frac{A - A_{\min}}{A_{\max} - A}\right)

The parameters b and pKa were determined using a linear regression between pH and the log term for the standards in the linear range of absorbance (pH between 5.2 and 11) with Amax representing the absorbance of the pH 11 standard, Amin denoting the absorbance of the pH 3 standard and A representing the absorbance of each condition.
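This regression can be sketched in Python as follows. This is a minimal illustration using NumPy, not the analysis script used in the study; the function names and the synthetic standard values in the usage example are hypothetical.

```python
import numpy as np

def fit_ph_curve(pH_std, A_std, A_min, A_max):
    """Fit b and pKa in Equation 1 by linear regression of pH on the
    log10 term, using standards in the linear range of absorbance."""
    x = np.log10((A_std - A_min) / (A_max - A_std))
    b, pKa = np.polyfit(x, pH_std, 1)  # slope = b, intercept = pKa
    return b, pKa

def absorbance_to_ph(A, b, pKa, A_min, A_max):
    """Map an absorbance measurement to pH via Equation 1."""
    return pKa + b * np.log10((A - A_min) / (A_max - A))
```

For exact synthetic standards generated from known parameters, `fit_ph_curve` recovers the slope and intercept, and `absorbance_to_ph` inverts the standard curve for new measurements.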

HPLC quantification of organic acids

Butyrate, succinate, lactate, and acetate concentrations in culture supernatants were quantified as described previously (Clark et al., 2021). Supernatant samples were thawed in a room temperature water bath before addition of 2 µL of H2SO4 to precipitate any components that might be incompatible with the running buffer. The samples were then centrifuged at 2400 × g for 10 min and then 150 µL of each sample was filtered through a 0.2 µm filter using a vacuum manifold before transferring 70 µL of each sample to an HPLC vial. HPLC analysis was performed using a Shimadzu HPLC system equipped with a SPD-20AV UV detector (210 nm). Compounds were separated on a 250 × 4.6 mm Rezex OA-Organic acid LC column (Phenomenex Torrance, CA) run with a flow rate of 0.2 mL min-1 and a column temperature of 50°C. The samples were held at 4°C prior to injection. Separation was isocratic with a mobile phase of HPLC grade water acidified with 0.015 N H2SO4 (415 µL L-1). At least two standard sets were run along with each sample set. Standards were 100, 20, and 4 mM concentrations of each of butyrate, succinate, lactate, and acetate. The injection volume for both sample and standard was 25 µL. The resultant data was analyzed using the Shimadzu LabSolutions software package.

Genomic DNA extraction and sequencing library preparation

Genomic DNA extraction and sequencing library preparation were performed as described previously (Clark et al., 2021). Genomic DNA was extracted from cell pellets using a modified version of the Qiagen DNeasy Blood and Tissue Kit protocol. First, pellets in 96DW plates were removed from -80°C and thawed in a room temperature water bath. Each pellet was resuspended in 180 µL of enzymatic lysis buffer (20 mM Tris-HCl (Invitrogen), 2 mM Sodium EDTA (Sigma-Aldrich), 1.2% Triton X-100 (Sigma-Aldrich), 20 mg/mL Lysozyme from chicken egg white (Sigma-Aldrich)). Plates were then covered with a foil seal and incubated at 37°C for 30 min with orbital shaking at 600 RPM. Then, 25 µL of 20 mg mL-1 Proteinase K (VWR) and 200 µL of Buffer AL (QIAGEN) were added to each sample before mixing with a pipette. Plates were then covered by a foil seal and incubated at 56°C for 30 min with orbital shaking at 600 RPM. Next, 200 µL of 100% ethanol (Koptec) was added to each sample before mixing and samples were transferred to a Nucleic Acid Binding (NAB) plate (Pall) on a vacuum manifold with a 96DW collection plate. Each well in the NAB plate was then washed once with 500 µL Buffer AW1 (QIAGEN) and once with 500 µL of Buffer AW2 (QIAGEN). A vacuum was applied to the Pall NAB plate for an additional 10 min to remove any excess ethanol. Samples were then eluted into a clean 96DW plate from each well using 110 µL of Buffer AE (QIAGEN) preheated to 56°C. Genomic DNA samples were stored at -20°C until further processing.

Genomic DNA concentrations were measured using a SYBR Green fluorescence assay and then normalized to a concentration of 1 ng µL-1 by diluting in molecular grade water using a Tecan Evo Liquid Handling Robot. First, genomic DNA samples were removed from -20°C and thawed in a room temperature water bath. Then, 1 µL of each sample was combined with 95 µL of SYBR Green (Invitrogen) diluted by a factor of 100 in TE Buffer (Integrated DNA Technologies) in a black 384-well microplate. This process was repeated with two replicates of each DNA standard with concentrations of 0, 0.5, 1, 2, 4, and 6 ng µL-1. Each sample was then measured for fluorescence with an excitation/emission of 485/535 nm using a Tecan Spark plate reader. Concentrations of each sample were calculated using the standard curve and a custom Python script was used to compute the dilution factors and write a worklist for the Tecan Evo Liquid Handling Robot to normalize each sample to 1 ng µL-1 in molecular grade water. Samples with DNA concentration less than 1 ng µL-1 were not diluted. Diluted genomic DNA samples were stored at -20°C until further processing.
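The calculations performed by the custom Python script can be sketched as follows. This is a simplified illustration, not the script itself: the linear standard-curve fit matches the description above, but the function names, final volume, and target concentration shown here are assumptions for the example.

```python
import numpy as np

def concentration_from_fluorescence(F_samples, F_std, conc_std):
    """Linear SYBR Green standard curve: fluorescence -> DNA concentration."""
    slope, intercept = np.polyfit(conc_std, F_std, 1)
    return (np.asarray(F_samples, dtype=float) - intercept) / slope

def dilution_volumes(conc, final_vol=100.0, target=1.0):
    """Sample and water volumes (uL, hypothetical final volume) to reach
    `target` ng/uL. Samples at or below the target are not diluted."""
    if conc <= target:
        return final_vol, 0.0  # transfer undiluted
    sample_vol = final_vol * target / conc
    return sample_vol, final_vol - sample_vol
```

The pairs of volumes returned by `dilution_volumes` would then be written into a robot worklist, one row per well.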

Amplicon libraries were generated from diluted genomic DNA samples by PCR amplification of the V3-V4 region of the 16S rRNA gene using custom dual-indexed primers for multiplexed next generation amplicon sequencing on Illumina platforms (Clark et al., 2021). Primers were arrayed in skirted 96-well PCR plates (VWR) using an acoustic liquid handling robot (Labcyte Echo 550) such that each well received a different combination of one forward and one reverse primer (0.1 µL of each). After the liquid evaporated, dry primers were stored at -20°C. Primers were resuspended in 15 µL PCR master mix (0.2 µL Phusion High Fidelity DNA Polymerase [Thermo Scientific], 0.4 µL 10 mM dNTP Solution [New England Biolabs], 4 µL 5× Phusion HF Buffer [Thermo Scientific], 4 µL 5 M Betaine [Sigma-Aldrich], 6.4 µL Water) and 5 µL of normalized genomic DNA to give a final concentration of 0.05 µM of each primer. Primer plates were sealed with Microplate B seals (Bio-Rad) and PCR was performed using a Bio-Rad C1000 Thermal Cycler with the following program: initial denaturation at 98°C (30 s); 25 cycles of denaturation at 98°C (10 s), annealing at 60°C (30 s), extension at 72°C (60 s); and final extension at 72°C (10 min). Of the PCR products from each well, 2 µL were pooled and purified using the DNA Clean & Concentrator (Zymo) and eluted in water. The resulting libraries were sequenced on an Illumina MiSeq using a MiSeq Reagent Kit v3 (600-cycle) to generate 2 × 300 paired end reads.

Bioinformatic analysis for quantification of species abundance

Sequencing data were used to quantify species relative abundance as described previously (Clark et al., 2021). Sequencing data were demultiplexed using Basespace Sequencing Hub’s FastQ Generation program. Custom python scripts were used for further data processing as described previously (Clark et al., 2021). Paired end reads were merged using PEAR (v0.9.10) (Zhang et al., 2014) after which reads without forward and reverse annealing regions were filtered out. A reference database of the V3-V5 16S rRNA gene sequences was created using consensus sequences from next-generation sequencing data or Sanger sequencing data of monospecies cultures. Sequences were mapped to the reference database using the mothur (v1.40.5) (Schloss et al., 2009) command classify.seqs (Wang method with a bootstrap cutoff value of 60). Relative abundance was calculated as the read count mapped to each species divided by the total number of reads for each condition. Absolute abundance of each species was calculated by multiplying the relative abundance by the OD600 measurement for each sample. Samples were excluded from further analysis if >1% of the reads were assigned to a species not expected to be in the community (indicating contamination). We expect the precision of our measurements to drop rapidly for species representing <1% of the community due to limited sequencing depth. We typically sequenced on the order of 10,000 molecules of PCR amplified DNA of the 16S rRNA gene per sample, so if a species is only represented by on the order of 10 of those molecules (0.1%), then a single sequencing read error would be a 10% error, whereas a species represented by >100 reads (1%) would only have 1% error per read.
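The abundance calculations and the contamination filter described above can be sketched as follows. This is a minimal illustration with hypothetical species labels; the actual pipeline operates on mothur classification output.

```python
def species_abundance(read_counts, od600, expected, cutoff=0.01):
    """read_counts: mapping of species label -> mapped reads for one sample.
    Returns (relative, absolute) abundance dicts, or None if reads assigned
    to unexpected species exceed the contamination cutoff (>1%)."""
    total = sum(read_counts.values())
    rel = {sp: n / total for sp, n in read_counts.items()}
    if sum(f for sp, f in rel.items() if sp not in expected) > cutoff:
        return None  # likely contamination: exclude sample from analysis
    # absolute abundance = relative abundance scaled by total biomass (OD600)
    absolute = {sp: f * od600 for sp, f in rel.items()}
    return rel, absolute
```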

Choice of sample sizes

Sample sizes were chosen based on limitations of experimental throughput as increased number of biological replicates would have reduced the number of possible different communities that could be observed. We chose a minimum of two biological replicates (for complex communities in our validation set) and some sample types have up to seven biological replicates (such as the full community, which was repeated in most experiments as a control for consistency between experimental days).

Computational methods

Long short-term memory for dynamic prediction on microbial communities

Long short-term memory (LSTM) networks belong to the class of recurrent neural networks (RNNs) and model time-series data. They were first introduced by Hochreiter and Schmidhuber (Hochreiter and Schmidhuber, 1997) to overcome the vanishing or exploding gradients problem (Hochreiter, 1998) that occurs due to long-term temporal dependencies. Since their inception, LSTMs have been further refined (Gers et al., 2000; Graves and Schmidhuber, 2005) and have found numerous applications across several domains, including but not limited to neuroscience (Storrs and Kriegeskorte, 2019), weather forecasting (Karevan and Suykens, 2020), predictive finance (Fischer and Krauss, 2018), Google Voice for speech recognition (Esch, 2014; Fischer and Krauss, 2018) and Google Allo for message suggestion (Wei et al., 2018).

Like any recurrent neural network, an LSTM network comprises multiple LSTM units, each representing the input-output map at a time instant. Figure 2 shows the schematic of the proposed LSTM network architecture for abundance prediction. For a microbial community comprising N species, each LSTM unit models the dynamics at time t using the following set of equations:

(2) \begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}

where h_t, c_t, and x_t are the hidden state, cell state, and input abundance at time t, respectively, and i_t, f_t, g_t, and o_t are the input, forget, cell, and output gates, respectively. σ is the sigmoid function, and ⊙ denotes the Hadamard (element-wise) product. The parameters {W_mn, b_mn} for m, n ∈ {f, g, h, i, o} are trainable and shared across all LSTM units. The output gate o_t is further used to generate the abundance for the next time instant as:

(3) y_t = x_{t+1} = W_{yo}\, o_t + b_{yo}.

As shown in Figure 2, y_t is fed to the LSTM unit at the next timestep (t+1), which in turn predicts the species abundance at time t+2. The process is repeated across multiple LSTM units in order to obtain x_{t_final}. The entire architecture is trained to minimize the mean-squared loss between the predicted abundance x_{t_final} and the true abundance x̂_{t_final}.
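A minimal NumPy sketch of Equations 2 and 3 illustrates how a single LSTM unit is rolled forward in time. This is an illustration only: the parameters are randomly initialized (untrained), the dimensions are hypothetical, and the actual models were trained in PyTorch.

```python
import numpy as np

def lstm_step(x, h, c, p):
    """One LSTM unit (Equation 2): gate, cell, and hidden-state updates."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    i = sig(p['Wii'] @ x + p['bii'] + p['Whi'] @ h + p['bhi'])
    f = sig(p['Wif'] @ x + p['bif'] + p['Whf'] @ h + p['bhf'])
    g = np.tanh(p['Wig'] @ x + p['big'] + p['Whg'] @ h + p['bhg'])
    o = sig(p['Wio'] @ x + p['bio'] + p['Who'] @ h + p['bho'])
    c = f * c + i * g            # Hadamard products
    h = o * np.tanh(c)
    return h, c, o

def rollout(x0, p, n_steps):
    """Equation 3: map the output gate to the next abundance, repeatedly."""
    hidden = p['bii'].size
    h, c = np.zeros(hidden), np.zeros(hidden)
    x = x0
    for _ in range(n_steps):
        h, c, o = lstm_step(x, h, c, p)
        x = p['Wyo'] @ o + p['byo']   # y_t = x_{t+1}
    return x

def init_params(n_species, hidden, seed=0):
    """Random (untrained) parameters with the shapes implied by Equation 2."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in 'ifgo':
        p[f'Wi{gate}'] = rng.normal(0, 0.1, (hidden, n_species))
        p[f'Wh{gate}'] = rng.normal(0, 0.1, (hidden, hidden))
        p[f'bi{gate}'] = np.zeros(hidden)
        p[f'bh{gate}'] = np.zeros(hidden)
    p['Wyo'] = rng.normal(0, 0.1, (n_species, hidden))
    p['byo'] = np.zeros(n_species)
    return p
```

In practice the parameters are shared across all timesteps and fit by backpropagation through the unrolled network, rather than sampled as above.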

Using teacher forcing for intermittent time-series forecasting (Figures 1, 2 and 5)

The end goal of the proposed LSTM-network-based abundance predictor is to accurately capture the steady-state (final) abundance given the initial abundance. In typical LSTM networks, the output of the recurrent unit at the previous timestep, y_{t-1}, is used as the input x_t to the recurrent unit at the current timestep. While such a recurrent model can predict the final abundance, it handles one-step-ahead prediction poorly. The problem becomes even more critical when anticipating more than a single timestep into the future. Teacher forcing (Benny Toomarian and Barhen, 1992) is a training procedure for recurrent networks, such as LSTMs, in which the ‘true’ abundances at intermittent timesteps are used to guide (like a teacher) the model to accurately anticipate the one-step-ahead abundance.

Teacher forcing is an efficient method of training RNN models in which the ground truth from a prior timestep is used as input. This is achieved by occasionally replacing the predicted abundance y_{t-1} from the previous timestep with the true abundance x̂_t as the input to the LSTM unit at the current timestep during the training process. Teacher forcing not only stabilizes the training process, it also forces the output abundances at all timesteps to closely match the corresponding true abundances. This is precisely why we use the ground-truth abundances at intermittent timesteps, rather than only the initial and final measurements, to robustify the prediction of the steady-state abundance. Once trained, inference in such models is performed by ignoring the ground-truth abundances and using the predicted abundance from the previous instant to roll the model forward in time. Teacher forcing was used to train the LSTM in all cases where intermediate time points were measured (Figures 1, 2 and 5), as opposed to cases that only included initial and final time points (Figures 3 and 4).
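The alternation between predicted and measured inputs during training can be sketched as follows. This is a schematic illustration: `step_fn` stands in for one LSTM unit plus the output map of Equation 3, and the forcing probability is a hypothetical hyperparameter, not a value from this study.

```python
import numpy as np

def forward_with_teacher_forcing(x_true, step_fn, teacher_ratio, rng):
    """Roll a one-step-ahead model across a measured trajectory x_true
    (array of shape T x N). At each step the input is the measured
    abundance with probability `teacher_ratio`, otherwise the model's
    own previous prediction. At inference, teacher_ratio = 0."""
    preds = []
    x_in = x_true[0]                  # initial condition
    for t in range(1, len(x_true)):
        y = step_fn(x_in)             # model's one-step-ahead prediction
        preds.append(y)
        # teacher forcing: occasionally substitute the measured abundance
        x_in = x_true[t] if rng.random() < teacher_ratio else y
    return np.array(preds)
```

With `teacher_ratio=1` every input is a measurement (pure teaching); with `teacher_ratio=0` the model consumes only its own predictions, as at inference time.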

Metabolite profiling

Microbial communities are a rich source of diverse metabolites that are commonly used as nutritional supplements, as natural compounds to treat infectious diseases, and in sustainable agriculture. The concentration and chemical diversity of metabolites produced by a microbial community is a direct consequence of the diversity of interactions between organisms in the community. In essence, the dynamical evolution of relative species abundance and intra-community interactions governs the nature and amount of metabolites produced. The functional map between species abundance and metabolite concentrations is highly complex and nonlinear, and is often approximated using simple regressors involving unary and pairwise interaction terms. In this paper, we model the species-metabolite map through an appropriate modification of the LSTM network.

The aforementioned LSTM network for predicting species abundance is modified to include four additional components corresponding to the concentration of each metabolite at each time instant. In particular, the species abundance data (of size Nspecies) is concatenated with the metabolite concentration data (of size Nmetabs) to form an (Nspecies+Nmetabs)-dimensional feature vector, which is normalized so that each component has zero mean and unit variance. This feature scaling is important to prevent over-reliance on features with a broad range of values. Concatenating the species abundance and metabolite concentration data ensures that the future trajectory of metabolite concentrations evolves as a function of both the species abundances and the metabolite concentrations at previous time instants. As before, the (Nspecies+Nmetabs)-dimensional output of each LSTM unit is fed into the input block of the subsequent LSTM unit in order to advance the model forward in time. The model predictions at each time point are then transformed back to the original scale in order to obtain the Pearson R2 scores on the unnormalized data. Compared with existing approaches that employ ordinary differential equations (ODEs) and multiple linear regression models for predicting metabolites, the proposed architecture enables more accurate and rapid estimation of all four metabolites. All LSTM models were implemented in Python using PyTorch on an Intel i7-7700HQ CPU @2.80 GHz processor with 16 GB RAM and an NVIDIA GeForce GTX 1060 (6GB GDDR5) GPU. The exact details of the neural network architecture, including the number of layers, learning rate, choice of optimizer, and nonlinear activations, are described in Supplementary file 1.
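The feature assembly described above can be sketched in a few lines (array sizes and names are illustrative, not the authors' code):

```python
import numpy as np

np.random.seed(0)
n_species, n_metabs, T = 25, 4, 6
abundance = np.random.rand(T, n_species)    # species abundances over time
metabolites = np.random.rand(T, n_metabs)   # metabolite concentrations

# (Nspecies + Nmetabs)-dimensional feature vector at every timestep
features = np.concatenate([abundance, metabolites], axis=1)   # (T, 29)

# z-score each feature so that no large-magnitude feature dominates
features = (features - features.mean(axis=0)) / features.std(axis=0)
```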

Data preprocessing for LSTM networks

Data normalization is one of the most widely adopted practices for efficient training of neural networks, and is known to speed up training, leading to faster convergence. When working with multi-modal data or data with features represented at multiple scales, it is recommended to normalize the features (also known as feature standardization) to the same scale in order to avoid over-reliance on features with large magnitudes. A common choice for feature standardization is to give each feature in the data zero mean and unit variance. Let x_i(n)(k) and c_j(n)(k) represent the abundance of the ith species and the concentration of the jth metabolite at the kth time instant for the nth sample. The mean and standard deviation of the quantities {x_i(n)(k)} and {c_j(n)(k)} are then computed over the training dataset as:

\mu_{x_i}(k) = \frac{1}{N_{\mathrm{samples}}} \sum_{n=1}^{N_{\mathrm{samples}}} x_i^{(n)}(k), \qquad \mu_{c_j}(k) = \frac{1}{N_{\mathrm{samples}}} \sum_{n=1}^{N_{\mathrm{samples}}} c_j^{(n)}(k),

\sigma_{x_i}(k) = \sqrt{\frac{1}{N_{\mathrm{samples}}} \sum_{n=1}^{N_{\mathrm{samples}}} \left( x_i^{(n)}(k) - \mu_{x_i}(k) \right)^2}, \qquad \sigma_{c_j}(k) = \sqrt{\frac{1}{N_{\mathrm{samples}}} \sum_{n=1}^{N_{\mathrm{samples}}} \left( c_j^{(n)}(k) - \mu_{c_j}(k) \right)^2}.

The quantities can then be standardized as:

\tilde{x}_i^{(n)}(k) := \frac{x_i^{(n)}(k) - \mu_{x_i}(k)}{\sigma_{x_i}(k)}, \qquad \tilde{c}_j^{(n)}(k) := \frac{c_j^{(n)}(k) - \mu_{c_j}(k)}{\sigma_{c_j}(k)}.

The process is repeated for all species and metabolites at each time point, and the scaled inputs {x̃_i(n)(k)} and {c̃_j(n)(k)} are then fed to the LSTM neural networks for prediction. During inference on the test data, the normalized output of each LSTM unit is inverse-transformed back to its original scale using the precomputed {μ_xi(k), σ_xi(k)} and {μ_cj(k), σ_cj(k)}. Readers are encouraged to refer to Goodfellow et al., 2016; Zheng and Casari, 2018 for additional details on feature standardization. It is also not uncommon to normalize the data with respect to the mean and standard deviation of the entire dataset (not just the training dataset). However, in practical scenarios the test data are not known a priori, so it is undesirable to employ statistics from the held-out test data for feature standardization. In our implementation of data preprocessing for LSTM networks, we employed feature standardization using (a) training data only, and (b) both training and test data. The predictive performance of our LSTM models was nearly identical for both approaches. Feature standardization can be toggled by the normalize_all variable in our open-source implementation.
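A minimal numpy sketch of this per-timepoint standardization, using training-set statistics only (synthetic shapes and values, chosen by us for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(100, 6, 25))  # (samples, timepoints, species)
test = rng.normal(5.0, 2.0, size=(20, 6, 25))

# mu and sigma per species and per timepoint, from the TRAINING data only
mu = train.mean(axis=0)      # (timepoints, species)
sigma = train.std(axis=0)

train_std = (train - mu) / sigma
test_std = (test - mu) / sigma   # test uses precomputed training statistics

# the inverse transform recovers the original scale at inference time
recovered = test_std * sigma + mu
```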

Hyperparameter tuning for LSTM networks

Similar to other learning algorithms, training an LSTM network entails choosing a set of hyperparameters for optimal performance. We used an exhaustive grid search for hyperparameter optimization, while the choice of learning algorithm (optimizer) was restricted to Adam (Kingma and Ba, 2014) due to its superior empirical performance. For each experiment, roughly 10% of the training dataset was reserved as a cross-validation set, and the performance of the trained models on this set was used to select the best hyperparameter settings. The hyperparameters include: (a) learning rate, (b) number of hidden layers per LSTM unit, (c) number of units per layer within an LSTM unit, (d) mini-batch size, and (e) input data normalization. The input features are normalized to have zero mean and unit variance. The candidate learning rates were 0.005, 0.001, and 0.0001, each with a decay of 0.25 after every 25 epochs; the gradual decay in learning rate helps prevent overfitting. An L2 regularization term with a very small weight-decay coefficient (10^-5) is added to the loss function to further guard against overfitting. Candidate mini-batch sizes were 1, 10, 20, and 50. Sizes of 10 and 20 yielded the lowest training loss, so we used mini-batch sizes of 10 or 20 in all evaluations. The number of hidden layers per LSTM cell was varied between 1 and 2. A two-layered LSTM did not yield any noticeable improvement over a single-layered LSTM cell, so we restricted our focus to a single-layered LSTM cell for simplicity and faster training/inference. Finally, we tried 512, 1024, 2048, and 4096 hidden units per LSTM cell and, depending on the complexity of the problem, used 2048 hidden units (prediction of species only) or 4096 hidden units (simultaneous prediction of species and metabolites). The exact number of training epochs, learning rates, and decay rates for the different experiments can be found in Supplementary file 1.
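The exhaustive grid search over these choices can be sketched as follows; validation_loss is a placeholder stand-in (we invented its form for illustration) for training the LSTM on a configuration and scoring it on the cross-validation split:

```python
import itertools

grid = {
    "lr": [0.005, 0.001, 0.0001],
    "batch_size": [1, 10, 20, 50],
    "hidden_layers": [1, 2],
    "hidden_units": [512, 1024, 2048, 4096],
}

def validation_loss(cfg):
    # placeholder: in practice, train the LSTM with this configuration and
    # evaluate it on the ~10% held-out cross-validation split
    return abs(cfg["lr"] - 0.001) + abs(cfg["hidden_units"] - 2048) / 4096

configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
best = min(configs, key=validation_loss)
```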

Specific applications of computational methods

Comparison of gLV and LSTM in silico (Figure 1)

To compare the LSTM and gLV models, we used a ground truth model of a 25-species community of the form:

(4) \frac{dx_i(t)}{dt} = \left( r_i + \sum_{j=1}^{N_{\mathrm{species}}} a_{ij} x_j(t) + \sum_{\substack{j=1 \\ j \neq i}}^{N_{\mathrm{species}}} \sum_{\substack{k=1 \\ k \neq i,j}}^{N_{\mathrm{species}}} b_{ijk}\, x_j(t)\, x_k(t) \right) x_i(t),

where r_i and a_ij represent individual species' exponential growth rates and pairwise interaction coefficients, respectively. The parameters b_ijk represent the effect of third-order interactions. The parameters r_i and a_ij were derived from a gLV model in a previous study (Clark et al., 2021). We consider three types of simulation studies, each corresponding to a different contribution of the third-order interactions (second-order only: b_ijk = 0; mild third-order: b_ijk sampled uniformly from a small range; moderate third-order: b_ijk sampled uniformly from a larger range). In each scenario, the ground truth data was generated for 624 training communities (25 monospecies, 300 two-member, 100 three-member, 100 five-member, and 99 six-member communities) by simulating the species abundance trajectories over the course of 48 hr and ‘sampling’ every 8 hr. The values from these ‘sampled’ time points were used to train both an LSTM model (methods described above in ‘Computational Methods’ with specific details in Supplementary file 1) and a standard gLV model (trained using the FMINCON function in MATLAB as described previously [Clark et al., 2021]). These two models were then used to predict a set of 3299 hold-out communities with ≥10 species simulated using the same ground truth model. For an additional analysis, the training data for the LSTM was augmented by including ground truth data for an additional 100 communities with 11 or 19 species.
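A toy numpy simulation of a ground-truth model of this form is sketched below. Forward Euler stands in for the actual ODE solver, the system is shrunk to 5 species, and all parameter values are illustrative (not those of Clark et al., 2021):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5                                          # toy system; the paper uses 25
r = rng.uniform(0.1, 0.5, n)                   # growth rates r_i
A = -1.0 * np.eye(n) + 0.05 * rng.normal(size=(n, n))  # pairwise a_ij
B = 0.01 * rng.normal(size=(n, n, n))          # third-order b_ijk
for i in range(n):
    B[i, i, :] = B[i, :, i] = 0.0              # enforce j != i and k != i
B[:, np.arange(n), np.arange(n)] = 0.0         # enforce k != j

def dxdt(x):
    per_capita = r + A @ x + np.einsum("ijk,j,k->i", B, x, x)
    return per_capita * x

def simulate(x0, t_end=48.0, dt=0.01):
    x, traj = x0.copy(), [x0.copy()]
    for _ in range(int(t_end / dt)):
        x = np.clip(x + dt * dxdt(x), 0.0, None)  # abundances stay non-negative
        traj.append(x.copy())
    return np.array(traj)

traj = simulate(np.full(n, 0.01))
```

Sampling every 8 hr from such trajectories would yield the training observations described above.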

LSTM training for experimental 12-species community (Figure 2)

The data used in this analysis consisted of 175 microbial community subsets of a 12-species community sampled every 12 hr for 60 hr (Venturelli et al., 2018). Of these communities, 102 were chosen randomly to constitute the training data and the remaining 73 constituted the hold-out set. The LSTM was trained as described above in ‘Computational Methods’ with specific details in Supplementary file 1.

Using LSTM Model to design multifunctional communities (Figure 3)

We used the LSTM model trained on previous data (Figure 3a) to design two sets of communities: a ‘distributed’ community set and a ‘corner’ community set. For the ‘distributed’ community set, we first took the predicted metabolite concentrations for all communities with ≥10 species and used k-means clustering with k=100 (Python 3, scikit-learn v0.23.1, sklearn.cluster.KMeans function) to identify 100 cluster centroids that were distributed across all of the predictions. We then found the closest community to each centroid in terms of Euclidean distance in the four-dimensional metabolite concentration space. These 100 communities constituted the ‘distributed’ community set.
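This selection step can be sketched as follows (synthetic predictions, and k reduced from 100 to 5 for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
predicted = rng.random((500, 4))   # (communities, 4 predicted metabolite conc.)

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(predicted)

chosen = []
for centroid in km.cluster_centers_:
    dists = np.linalg.norm(predicted - centroid, axis=1)   # Euclidean distance
    chosen.append(int(np.argmin(dists)))                   # nearest community
```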

For the ‘corner’ community set, we first defined four ‘corners’ in the lactate and butyrate concentration space by binning all communities with ≥10 species as shown in Figure 3b:

  1. 5% lowest lactate concentration communities, then 5% lowest butyrate concentration of those

  2. 5% lowest lactate concentration communities, then 5% highest butyrate concentration of those

  3. 5% lowest butyrate concentration communities, then 5% lowest lactate concentration of those

  4. 5% lowest butyrate concentration communities, then 5% highest lactate concentration of those.

Within each of those four ‘corners’, we identified four ‘sub-corners’ in the acetate and succinate concentration space by binning communities as shown in Figure 3b:

  1. 5% lowest acetate concentration communities, then 5% lowest succinate concentration of those

  2. 5% lowest acetate concentration communities, then 5% highest succinate concentration of those

  3. 5% lowest succinate concentration communities, then 5% lowest acetate concentration of those

  4. 5% lowest succinate concentration communities, then 5% highest acetate concentration of those

This process resulted in 16 ‘sub-corners’ total. For each ‘sub-corner’, we chose a random community and identified four more communities that were maximally different from that community in terms of which species were present (Hamming distance). This overall process resulted in 80 communities constituting the ‘corner’ community set.
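The Hamming-distance selection can be sketched as follows (synthetic presence/absence data; the seed community here is simply the first row):

```python
import numpy as np

rng = np.random.default_rng(3)
presence = rng.integers(0, 2, size=(200, 25))   # (communities, species) 0/1

seed_idx = 0
hamming = (presence != presence[seed_idx]).sum(axis=1)   # Hamming distances

# four communities maximally different from the seed community
farthest = np.argsort(hamming)[-4:]
```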

Composite model: gLV model for predicting species abundance (Figure 3)

To benchmark the performance of the LSTM model for predicting metabolite production, we used a previously described Composite Model consisting of a generalized Lotka-Volterra (gLV) model for predicting species abundance dynamics and a regression model with interaction terms to predict metabolite concentration at a given time from the species abundances at that time (Clark et al., 2021). Because our LSTM model was trained on the same dataset as Composite Model M3 from Clark et al., 2021, we used those gLV model parameters.

Composite model: regression models for predicting metabolite concentrations (Figure 3)

Our composite model implementation is similar to the model described in Clark et al., 2021 for predicting metabolite concentration from community composition at a particular time. We used the exact gLV model parameter distributions obtained by Clark et al., 2021, which were determined using an approach based on Shin et al., 2019. The regression model mapping endpoint species abundance to metabolite concentrations from Clark et al., 2021 was focused specifically on the prediction of butyrate. Therefore, we adapted the approach to the prediction of multiple metabolites. First, we modified the model form to include first-order and interaction terms for all 25 species, rather than just the butyrate producers. Then, we separately trained four regression models, one for each metabolite (butyrate, lactate, acetate, succinate), using the measured species abundances and measured metabolite concentrations from the same dataset used to train the LSTM model. We trained these models as described previously (Clark et al., 2021) using Python scikit-learn (Pedregosa, 2011) and applied L1 regularization to minimize the number of nonzero parameters. Regularization coefficients were chosen using 10-fold cross-validation. We selected the regularization coefficient value with the lowest median mean-squared error across the training splits.
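A minimal sketch of one such L1-regularized regression, with the penalty chosen by 10-fold cross-validation, is given below (synthetic data, 8 species instead of 25, and a made-up target; not the authors' exact pipeline):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
abundance = rng.random((120, 8))                     # endpoint abundances (toy)
butyrate = abundance[:, 0] + 2 * abundance[:, 1] * abundance[:, 2]

# first-order and pairwise interaction terms for every species
X = PolynomialFeatures(degree=2, interaction_only=True,
                       include_bias=False).fit_transform(abundance)

# L1-regularized regression; penalty chosen by 10-fold cross-validation
model = LassoCV(cv=10, random_state=0).fit(X, butyrate)
n_nonzero = int(np.sum(model.coef_ != 0))            # sparse solution
```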

For predicting end-point metabolite profiles from initial species abundance using the LSTM network, a feed-forward network (FFN) was used at the output of the last LSTM unit to convert end-point species abundance to end-point metabolite concentrations. On the other hand, the composite model in Clark et al., 2021 uses multiple linear regressors at its output to predict each metabolite concentration from species abundance. It is therefore unclear whether the superior performance of the LSTM network is due simply to the addition of a more powerful FFN at its output in place of the multiple linear regressors of the composite model. To test this, we replaced the multiple-linear-regressor component of the composite model from Clark et al., 2021 with a Random Forest regressor (Segal, 2004) or an FFN. However, neither of these additions improved the metabolite prediction accuracy beyond that of the LSTM with an FFN (Figure 3—figure supplement 3a).

Composite model: simulations for prediction (Figure 3)

Custom MATLAB scripts were used to predict community assembly using the gLV model as described previously (Clark et al., 2021). For each community, the growth dynamics were simulated using each parameter set from the posterior distribution of the gLV model parameters. The resulting community compositions for each simulation at 48 hr were used as an input to the regression models (multiple linear regression/Random Forest/FFN) implemented in Python to predict the concentration of each metabolite in each community for each gLV parameter set. Because of the large number of communities and the large number of parameter sets (i.e., hundreds of simulations per community), we used parallel computing (MATLAB parfor) to complete the simulations in a reasonable timeframe (∼1 hr for the communities in Figure 3—figure supplement 3a).

Understanding relationships between variables using LIME (Figure 3)

Black-box methods, such as the LSTM networks employed in this manuscript, offer little insight into the underlying mechanics that make them so powerful. Consequently, any potential pitfalls of building such models remain unexplored. For networks of significant biological importance, basing conclusions on faulty assumptions can be catastrophic. We overcome this limitation by resorting to Local Interpretable Model-Agnostic Explanations (LIME) (Ribeiro et al., 2016a).

LIME has three key components: (a) Local, that is, each explanation reflects the behavior of the model around the sampled instance; (b) Interpretable, that is, the explanations offered by LIME are interpretable by humans; (c) Model-Agnostic, that is, LIME does not require peeking inside the model. It generates explanations by analyzing the model’s behavior for inputs perturbed around their neighborhood. In this manuscript, we employ LIME to explain, both qualitatively and quantitatively, how the abundances of various species affect the concentrations of all four metabolites, and whether the presence or absence of a given species has any significant effect on the resulting metabolite profile.

We carried out the LIME analysis to generate interpretable prediction explanations for model M2 for each community instance used to train the model. We used lime v0.2.0.1 for Python 3 (https://github.com/marcotcr/lime; Ribeiro, 2021) to train an explainer on the predictions of the training instances for each output variable (25 species, 4 metabolites) and then generated explanation tables for every input variable (species presence/absence) for every training instance. We then determined the median value for which the presence of a given species explained the prediction for each output variable to generate the networks in Figure 3d, e.

In a separate analysis, we investigated the sensitivity of LIME explanations to the training data used to fit the LSTM model. Because the purpose of this analysis was to understand the variability in LIME explanations and not to understand the dependence of LIME explanations on different communities, we only considered LIME explanations of the full community. This is in contrast to the LIME explanations shown in Figure 3d, e, which present the median LIME explanation taken over all of the communities in the training data. Training data was varied using 20-fold cross-validation, and LIME sensitivity of both metabolites (Figure 3—figure supplement 5) and species (Figure 3—figure supplement 6) was computed after fitting the LSTM model to each partition of the training data.

Understanding relationships between variables using prediction sensitivity (Figure 4)

For each metabolite (Acetate, Butyrate, Lactate, Succinate), fractions of 0.5, 0.6, 0.7, 0.8, 0.9, and 1 of the total dataset were randomly sampled. Each sub-sampled dataset was subject to 20-fold cross validation to determine the sensitivity of held-out prediction performance to the amount of data available for training. This process was repeated 30 times, and the average prediction over the 30 trials was used to compute the final held-out prediction performance (R2).

The sensitivity of the model to the presence of individual species and pairs of species was determined by evaluating prediction performance (R2) for subsets of the data containing each species and each possible pair of species. To evaluate how prediction performance of each metabolite was affected by the presence of species pairs, we computed the average percent difference between prediction performance taken over subsets containing a single species and all pairs of species using the following equation:

(5) \mathrm{Pairwise\ sensitivity} = \frac{100}{N_{\mathrm{species}}^2} \sum_{i=1}^{N_{\mathrm{species}}} \sum_{\substack{j=1 \\ j \neq i}}^{N_{\mathrm{species}}} \frac{R_{ij}^2 - R_i^2}{R_i^2},

where Ri2 is the prediction performance taken over the subset of samples containing species i, and Rij2 is the prediction performance taken over the subset of samples containing species i and j.
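Equation 5 translates directly into code; a small numpy transcription (with synthetic R² values in place of the real held-out scores) is:

```python
import numpy as np

def pairwise_sensitivity(R_single, R_pair):
    """R_single[i] = R²_i; R_pair[i, j] = R²_ij (Equation 5)."""
    n = len(R_single)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if j != i:
                # percent change in performance when species j joins species i
                total += (R_pair[i, j] - R_single[i]) / R_single[i]
    return 100.0 / n**2 * total

rng = np.random.default_rng(5)
R_single = rng.uniform(0.5, 0.9, 6)
R_pair = rng.uniform(0.5, 0.9, (6, 6))
sens = pairwise_sensitivity(R_single, R_pair)
```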

Clustering metabolite trajectories (Figure 5)

To generate the clusters from the dynamic community observations (Figure 5), we used a graph-theoretic divisive clustering algorithm (Jain et al., 1999) based on the minimal spanning tree (Zahn, 1971). We first generated an undirected graph wherein each node was a community observed in our experiment and each edge weight was the Euclidean distance between two communities based on all metabolite measurements (4 metabolites ×3 time points = 12-dimensional space for Euclidean distance calculation). We then determined the minimal spanning tree for this graph using the minimum_spanning_tree function in networkx (v2.1) for Python 3. We then used this minimal spanning tree to generate clusters by iteratively removing the edge with the largest weight until 6 clusters were formed. In each iteration, if any edge removal resulted in a cluster with <5 communities (i.e. minimum cluster size), that edge was returned and the next largest edge was removed. The number of clusters and minimum cluster size were chosen based on an elbow method (Pal and Biswas, 1997), wherein scatter plots were made of the mean intracluster distance versus the number of clusters for various minimum cluster sizes and a combination of minimum cluster size and number of clusters that fell on the elbow of the plot was chosen.
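The divisive MST procedure can be sketched as follows. We use scipy's csgraph routines as a stand-in for the networkx implementation described above (the procedure is the same), with synthetic 12-dimensional metabolite profiles and a single cut instead of the iterative, minimum-cluster-size-aware loop:

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(6)
# two well-separated groups in the 12-dimensional metabolite-trajectory space
profiles = np.vstack([rng.normal(0.0, 0.1, (10, 12)),
                      rng.normal(5.0, 0.1, (10, 12))])

dist = squareform(pdist(profiles))            # pairwise Euclidean distances
mst = minimum_spanning_tree(dist).toarray()   # 19 edges for 20 nodes

# divisive step: removing the largest MST edge splits off a cluster
mst[np.unravel_index(np.argmax(mst), mst.shape)] = 0.0
n_clusters, labels = connected_components(mst, directed=False)
```

Repeating the edge-removal step (and restoring any edge whose removal would create a too-small cluster) yields the six clusters used in Figure 5.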

Decision tree classification of metabolite trajectories (Figure 5)

The decision tree shown in Figure 5—figure supplement 1d and used to produce the annotations in Figure 5a was generated using the DecisionTreeClassifier with the default parameter settings in scikit-learn (v0.23.1) for Python 3 (visualization generated using plot_tree function from the same).

Understanding relationships between variables using sensitivity gradients (Figure 5)

Interpretability of neural-network (NN) models continues to be a challenging problem in machine learning. While LIME is a useful tool for explaining what machine learning models are doing, it is model-agnostic and uses simple linear models to approximate local behavior. Being model-agnostic requires fitting surrogate linear models to local perturbations of the training data before LIME can offer any interpretation. Moreover, the modifications that must be made to the data to obtain useful explanations are typically use-case specific. Consequently, model-aware interpretability methods that take into account the weights of an already trained NN are more suitable.

For tasks such as classification of images and videos, there is a natural way to interpret NN models using class activation maps (CAMs) (Selvaraju et al., 2017). CAMs assign appropriate weights to different convolutional filters and highlight the parts of an image that most activate a given output class. However, CAMs do not extend to other NN architectures, such as LSTMs. Fortunately, the answer to interpretability lies in the model training itself. Let Y be the output variable of interest whose sensitivity to an input x needs to be estimated. The effect of x on Y can be approximated through the partial derivative ∂Y/∂x. For instance, Y may denote the butyrate concentration in an experiment, while x may represent the abundance of a given species. The sign of the partial derivative indicates a positive (or negative) correlation between the two variables, while the magnitude represents its strength. To evaluate the partial derivatives, we freeze the weights of the already trained LSTM model and declare the inputs to be variables. A single backpropagation pass then evaluates the partial derivatives of an output variable of interest with respect to all input variables. This is in contrast to the LIME-based interpretability method, which requires training an additional model on top of the already trained deep learning model. Most deep learning libraries implement a computational graph for performing efficient forward and backward passes during training; this same graph can be used to evaluate sensitivity gradients.
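The idea fits in a few lines of PyTorch. Below, a trivial linear layer with known weights stands in for the trained LSTM, so that the computed ∂Y/∂x can be checked against the analytic gradient; in the actual workflow, the frozen model would be the trained LSTM and x the species abundances:

```python
import torch

# stand-in for a trained model, with known weights so ∂Y/∂x is checkable
model = torch.nn.Linear(3, 1, bias=False)
with torch.no_grad():
    model.weight.copy_(torch.tensor([[2.0, -1.0, 0.5]]))
for p in model.parameters():
    p.requires_grad_(False)            # freeze the trained weights

x = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)  # inputs as variables
y = model(x).sum()                     # e.g. predicted butyrate concentration
y.backward()                           # one backward pass

sensitivity = x.grad                   # ∂Y/∂x_i for all inputs at once
```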

There are indeed other methods for explaining neural networks, most notably the Shapley explainability method (Lundberg and Lee, 2017). This method is substantially more computationally burdensome than LIME or a sensitivity-gradient-based method (Jia, 2019a; Jia, 2019b). LIME and sensitivity gradients are based on first-order perturbations around the already learned model, and can be used to depict local model behavior with little to no computational burden. By contrast, explainability methods like Shapley are computationally expensive. An exact computation of Shapley values for a K-dimensional input requires evaluating all 2^K possible coalitions of the feature values, and the “absence” of a feature has to be simulated by drawing random instances, which increases the variance of the Shapley value estimates.

Comparison of the discretized gLV model to the LSTM (Figure 5—figure supplement 3)

To train the gLV model using the same algorithm used to train the LSTM and to enable metabolite prediction, the gLV model was discretized and augmented with a feed-forward neural network. The approximate gLV model is

x_i(t+1) = x_i(t) \left( r_i + \sum_{j=1}^{N_{\mathrm{species}}} a_{ij} x_j(t) \right), \qquad c_i(t+1) = \mathrm{FFN}\left( x_i(t+1) \right),

where x_i is the abundance of species i, r_i is the growth rate of species i, a_ij represents the impact of species j on species i, and c_i is the concentration of metabolite i. Metabolite concentrations at time step t are predicted using a feed-forward neural network (FFN). The structure of the discretized gLV model requires that all species abundances are strictly non-negative. When training the gLV model, the data were pre-processed so that each feature (species abundance and metabolite concentration) ranges between zero and one, based on the maximum value of each feature in the training data computed at each time step. This is in contrast to the scaling used for the LSTM, which results in negative values for transformed species abundances. The stochastic gradient descent algorithm with the Adam (Kingma and Ba, 2014) optimizer was used to train both the LSTM and the discretized gLV model. The default settings of the PyTorch function ReduceLROnPlateau were used to adjust the learning rate during training. Species growth rates, r_i, and interaction coefficients, a_ij, were initialized to zero prior to fitting.
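One step of this discretized model can be sketched in PyTorch as below (the FFN width and batch size are illustrative; initialization follows the text, with r and A at zero):

```python
import torch
import torch.nn as nn

n_species, n_metabs = 25, 4

class DiscretizedGLV(nn.Module):
    def __init__(self):
        super().__init__()
        self.r = nn.Parameter(torch.zeros(n_species))             # growth rates
        self.A = nn.Parameter(torch.zeros(n_species, n_species))  # interactions
        self.ffn = nn.Sequential(nn.Linear(n_species, 32), nn.ReLU(),
                                 nn.Linear(32, n_metabs))

    def step(self, x):
        # x_i(t+1) = x_i(t) * (r_i + sum_j a_ij x_j(t))
        x_next = x * (self.r + x @ self.A.T)
        c_next = self.ffn(x_next)       # metabolites from next abundances
        return x_next, c_next

model = DiscretizedGLV()
x0 = torch.rand(8, n_species)           # features scaled to [0, 1] per the text
x1, c1 = model.step(x0)
```

Rolling step forward over the measured time points and minimizing the mean-squared loss with Adam trains r, A, and the FFN jointly, mirroring the LSTM training loop.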

Data availability

PyTorch implementation of the proposed LSTM model and the accompanying measurements of community composition and metabolite concentrations are available from GitLab (https://gitlab.eecs.umich.edu/mayank.baranwal/Microbiome; copy archived at swh:1:rev:f2eed8f013e42fbcc8c8f98816a8146559d6f1b3). The raw Illumina sequencing data are available from Zenodo (https://zenodo.org/record/5529327, https://doi.org/10.5281/zenodo.5529327).

The following data sets were generated
    1. Clark R
    (2021) Zenodo
    NGS Data Accompanying "Deep Learning Enables Design of Multifunctional Synthetic Human Gut Microbiome Dynamics".
    https://doi.org/10.5281/zenodo.5529327

References

  1. Conference
    1. Byeon W
    2. Breuel TM
    3. Raue F
    4. Liwicki M
    (2015) Scene labeling with LSTM recurrent neural networks
    2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3547–3555.
    https://doi.org/10.1109/CVPR.2015.7298977
  2. Book
    1. Goodfellow I
    2. Bengio Y
    3. Courville A
    4. Bengio Y
    (2016)
    Deep Learning
    MIT Press, Cambridge.
    1. Hochreiter S
    (1998)
    Recurrent neural net learning and vanishing gradient
    International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6:107–116.
  3. Conference
    1. Jia R
    (2019a)
    Towards efficient data valuation based on the shapley value
    In The 22nd International Conference on Artificial Intelligence and Statistics. pp. 1167–1176.
  4. Conference
    1. LeCun Y
    2. Touresky D
    3. Hinton G
    4. Sejnowski T
    (1988)
    A theoretical framework for back-propagation
    In Proceedings of the 1988 connectionist models summer school. pp. 21–28.
  5. Conference
    1. Lundberg SM
    2. Lee SI
    (2017)
    A unified approach to interpreting model predictions
    In Proceedings of the 31st international conference on neural information processing systems. pp. 4768–4777.
    1. Pedregosa F
    (2011)
    Scikit-learn: Machine learning in python
    The Journal of Machine Learning Research 12:2825–2830.
  6. Conference
    1. Ribeiro MT
    2. Singh S
    3. Guestrin C
    (2016a) "Why should I trust you?" Explaining the predictions of any classifier
    In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. pp. 1135–1144.
    https://doi.org/10.1145/2939672.2939778
  7. Report
    1. Segal MR
    (2004)
    Machine learning benchmarks and random forest regression
    eScholarship.
  8. Conference
    1. Wei C
    2. Yu Z
    3. Fong S
    (2018) How to build a chatbot: chatbot framework and its capabilities
    In Proceedings of the 2018 10th International Conference on Machine Learning and Computing. pp. 369–373.
    https://doi.org/10.1145/3195106.3195169
  9. Book
    1. Zheng A
    2. Casari A
    (2018)
    Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists
    O’Reilly Media, Inc.

Decision letter

  1. Sara Mitri
    Reviewing Editor; University of Lausanne, Switzerland
  2. Aleksandra M Walczak
    Senior Editor; CNRS LPENS, France
  3. Lingchong You
    Reviewer; Duke University, United States

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Deep Learning Enables Design of Multifunctional Synthetic Human Gut Microbiome Dynamics" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Aleksandra Walczak as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Lingchong You (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) All three reviewers agreed that not enough detail of the methods was provided. Please include this in the methods section, as well as an in-depth discussion and justification as to the choices made, including why LIME and CAM have been chosen as opposed to other methods, what are the inputs and outputs to the system, how the training and testing sets were chosen and how parameters for the two models were chosen.

2) The reviewers also agreed that the comparison of the performance of gLV to LSTM was not quite fair, due to many differences between the two implementations. To more fairly compare gLV to the LSTM, please augment the PyTorch code to have LSTM+FF and gLV+FF.

3) Carefully consider the language used throughout the paper, for example in the use of the term "deep learning".

Reviewer #1 (Recommendations for the authors):

Results, first paragraph. 'The experimental data was split into non-overlapping training and hold-out test sets, and an appropriate LSTM network was trained to predict species abundances at various time points given the information of initial species abundance. The details on the train/test split and the number of model hyperparameters are provided in Table S2'.

It is worth mentioning these details in the main text, at least regarding data division and architecture of the LSTM based model.

What was the motivation behind the particular division of the dataset: 624 training communities and 3299 testing communities? Why were 6 or fewer species used in the training dataset and ≥10 in the testing dataset?

The authors state that since the gLV model was used to generate the simulated data, they were not surprised to see that it predicts very well on this dataset. Would it be possible to use another benchmark to compare the two models?

Regarding the computational time of the LSTM model, I imagine it takes a lot of time to tune the hyper-parameters of the LSTM model. I think it should be mentioned that the 2 minutes required to train the LSTM model does not include this computational overhead. It is not clear how hyper-parameter tuning (number of layers, nodes, learning rate, activation function, etc.) was performed. Could you please add further details about this in the method section?

It is not clear why an FFN was used at the output of the LSTM. I could not find any information in the methods section about this. Can you explain why this particular setting was chosen?

The following line is not clear: 'all the network weights are trained simultaneously'.

What was the motivation behind choosing CAM and LIME as the explanation method among all the existing explanation methods?

LIME is a heuristic based method. How did you verify the robustness of the explanations? Could you please comment on if you changed some parameters (such as number of variables) and how it affected the explanations? How can we be sure that the explanation provided by the linear LIME models is actually what the non-linear LSTM model is doing? Are we agreeing with the explanations just because it is consistent with the current knowledge?

Reviewer #2 (Recommendations for the authors):

My general impression is the work is very dense and key insights are somewhat masked due to the lack of clarity in places. At the highest level, I think I understood the major points the authors tried to deliver (see above). However, I feel the authors can improve the presentation of the basic methodology and the intuition on a few points, which I find myself just having to trust what the authors have to say. Providing such intuition could minimize the impression of some of the performance comparisons being ad hoc.

1. It is actually not entirely clear what exactly feeds into the LSTM (or the gLV) and what comes out during the training process. Figure 1 is supposed to explain that but I could only gather that they're predicting time series. More specifically, what are the input variables? How many time series are being predicted? The authors might want to revise/expand the schematic to better explain the basic methodology.

Also, an overlay of a few representative true time courses vs. predicted time courses (e.g. good predictions vs. failed predictions) would be illuminating.

2. Figure 2 shows the prediction of gLV-model-simulated data using the LSTM or the gLV model. Only a brief description was provided, from which, I had a hard time understanding how the data were generated or predicted. For instance, the authors stated that a 25-member gLV model was used to make predictions. Those with ≤ 6 species were used to train the LSTM or the gLV model. Does it mean that, in these communities, only ≤ 6 species persisted after some time? And, the trained LSTM or gLV models only predict ≤ 6 species? If so, the following sentence becomes confusing. This confusion is related to the confusion above.

Also, how are higher-order interactions modeled in the gLV simulator?

3. It feels like the results in Figure 2 should go first (before Figure 1). The rationale is simple – for the simulated data, you have complete confidence in the ground truth, which can allow you to test the performance of the method in the idealized scenario. Also, the analysis on this data can probably allow you to provide an intuition when and why LSTM outperforms the gLV model (especially when the data were generated by another gLV model).

4. Also in Figure 2, since the data generation or training were not super time consuming, I wonder if the LSTM or gLV predictions could be drastically improved if larger training data sets are used. I'm actually quite surprised by the performance of LSTM on the simulated data, considering the somewhat limited size of the training set (624). A recent work had to use a much larger training set to achieve a high performance of LSTM (Wang et al., Nature Comm 2019) – this related work should be cited.

5. The overall performance of LSTM is impressive. It does fail occasionally even on simulated data (which is true for all ML models). To this end, the authors should discuss/show how to evaluate the confidence in individual predictions – checking each one (even if the ground truth is known) would defeat the purpose of using the neural network. This point is relevant when the trained neural network is being used to make predictions and probe biological insights.

6. I like how they use a side product of training procedure (backpropagation) to provide a quantitative metric of system sensitivity. I feel they could explain this point more.

7. For me, part of the reason the paper feels dense is that there are actually two stories (though closely related). One is the prediction of community dynamics. The other is the prediction of metabolite profiles. They represent somewhat different challenges in terms of evaluating LSTM performance.

The paper could have been split into two. Combining the two might have caused some of the confusion raised above.

Reviewer #3 (Recommendations for the authors):

Major comment: To make the comparison fair I would suggest using your torch code base to also obtain point estimates for the gLV parameters as well. There is no reason to use a sampling-based result from the prior study which uses a different model for metabolite prediction as well. Furthermore, the composite code is not accessible without a Matlab license (and thus not really reproducible). It is also trained in a two-step fashion (if this is not the case then there was not enough detail in the text to understand how the composite model was trained), and thus will be less accurate because of this. It should not take 5 hours for the model to learn the gLV parameters (your own reference [56] states that you have this down to minutes). Your torch code can be easily modified for the dynamics \log x_{t+1} = \log (x_t) + (diagonal(r) + A x_t) \Delta_t and then this also means that you could have the same FF architecture for metabolites. This would allow for a much fairer comparison, both models could be trained in an identical fashion with the same metabolite prediction architecture. Although regularization and hyperparameter tuning would be slightly different of course.
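The log-transformed discrete-time gLV update the reviewer proposes can be sketched in a few lines. The parameter values below are illustrative placeholders, not estimates from the study:

```python
import numpy as np

def glv_log_step(x_t, r, A, dt):
    """One step of the log-transformed gLV update the reviewer describes:
    log x_{t+1} = log x_t + (r + A x_t) * dt, then exponentiate back."""
    log_x_next = np.log(x_t) + (r + A @ x_t) * dt
    return np.exp(log_x_next)

# Illustrative 2-species example (growth rates and interaction
# coefficients are made-up values for demonstration only).
r = np.array([0.5, 0.3])                 # intrinsic growth rates
A = np.array([[-1.0, -0.2],
              [-0.3, -0.8]])             # interaction matrix (self-limiting)
x = np.array([0.1, 0.1])                 # initial abundances
for _ in range(100):                     # integrate to t = 10 with dt = 0.1
    x = glv_log_step(x, r, A, dt=0.1)
```

A practical advantage of the log-transformed form is that abundances stay strictly positive by construction, which matters when fitting by gradient descent in the same PyTorch pipeline as the LSTM.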

Title and throughout: A single LSTM unit with a single hidden state (which can be of large dimension) is not really a deep network. With this interpretation a linear dynamical systems model with a single hidden state is a deep model as well.

21: In the abstract it is stated that current ODE models fail to capture complex behavior. This is not true. You fit an Ordinary difference equation to your data when solving for GLV, RNN, or LSTM (these are all ordinary because time is the only differential or difference variable). I think you meant to characterize the difference between parametric models like linear or gLV, etc, that are more "rigid" vs dynamical systems models that convolve functions up to arbitrary complexity (RNN, etc). Both are certainly classes of "Ordinary" difference or differential equations.

147: A gLV model is linear in the parameters so it is trivial to solve for the coefficients of the model in seconds. The text should be changed here. Also why does it take 5 hours to learn the gLV model when you state in [56] that you have this down to minutes?

Figure 1(a): is this the model or the teacher-forcing training procedure depicted? Also, there is a subscript i, suggesting that there is no microbe-microbe interaction information captured by the model. Is the data in Figure 1 all forecasted from t=0, or is this 1-sample look-ahead prediction performance? It was not clear from the text.

Figure 2(a) also has an i subscript; was this intentional?

Methods:

– The LSTM model is not described in enough detail (what are the dimensions of the objects, how many hidden states [and why], how did you tune the hyperparameters of the model and the gradient solver [what gradient method did you use and why]).

– How was the composite model trained, was it indeed in a two-step fashion whereby the gLV model was trained and then a separate inference procedure was run for the metabolite model. More detail is needed here to understand what exactly was done and how the models were trained.

Plots in general:

– Plots only really show how the most abundant taxa are performing, would be nice to see the performance on the lower abundance taxa (maybe take a log of the OD?).

References:

You seem to be missing some important references from the Gerber Lab, "MDSINE Microbial Dynamical Systems INference Engine for microbiome time-series analyses" https://doi.org/10.1186/s13059-016-0980-6 is one such example but there are likely more given that lab primarily focuses on microbiome dynamics.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Deep Learning Enables Design of Multifunctional Synthetic Human Gut Microbiome Dynamics" for further consideration by eLife. Your revised article has been evaluated by Aleksandra Walczak (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Reviewer #3 still finds that there is a need for a direct "apples-to-apples" comparison of gLV and LSTM. In your response to the reviewer's initial request, you are arguing that such a direct comparison would result in a very low performance of gLV. If this is the case, it would simply strengthen your argument and support the use of LSTM even more. It also did not seem like much additional work to carry out this analysis. If this is done and added as a supplement with some explanation of why it's not in the main text, I believe this would be satisfactory to all reviewers. Other than that, the reviewers were very appreciative of the modifications and the current quality of the paper.

Reviewer #1 (Recommendations for the authors):

Thank you. This is exciting work, and the authors have addressed all of my concerns thoroughly.

Reviewer #2 (Recommendations for the authors):

The authors have fully addressed my comments on the original submission. I support its publication in the journal.

Reviewer #3 (Recommendations for the authors):

One outstanding issue with the paper remains. There is no apples-to-apples comparison of the LSTM+NN to a gLV+NN. I now comment on the authors' responses to an original comment.

[Authors] "We did not estimate new gLV parameters for this work, but rather used the gLV parameters that were already determined in [6]."

Having a composite model trained on different data and comparing it to a jointly trained model on entirely different data is not a sufficient comparison.

[Authors] "While we agree that the reviewer's proposed method for determining the gLV parameters would be a more direct comparison, there are several reasons why we chose to use the gLV parameters estimated in our previous study: (1) Higher resolution time-series measurements are needed to determine the exponential growth rate in the gLV model. Thus, estimating gLV parameters without the time-series measurements of monocultures would result in a poorly predictive model due to high uncertainty in the exponential growth rate parameters of the gLV model. Therefore, the proposed method of inferring parameters of the gLV model would yield a poorly predictive model."

If this is the case then the authors should show that the gLV model is not predictive with this sparse data. This begs the question though. Why did one need the LSTM model if the data doesn't really capture complex dynamics? The abstract stated that the LSTM was necessary for this very reason [Abstract]"Current models based on ecological theory fail to capture complex community behaviors due to higher-order interactions, do not scale well with increasing complexity and in considering multiple functions. We develop and apply a long short-term memory (LSTM) framework to advance our understanding of community assembly and health-relevant metabolite production using a synthetic human gut community." The authors are simultaneously arguing that the dynamics are too complex for simple gLV dynamics while also stating that the new time series are too simple to reliably train a gLV model. Is this a contradiction?

"(2) The methods used to estimate the gLV model parameters in our previous work were informed by higher resolution time-series measurements of monocultures (individual species). Thus, the gLV model had additional information beyond that of the LSTM model, which was only informed by the initial point and endpoint (with the exception of Figure 5)."

This statement is confusing. The LSTM model was trained using teacher forcing with intermediate time points per the methods section?

"(3) The optimization algorithm MATLAB FMINCON is a widely used approach for estimating parameters of the gLV model (see [6], [8] and [10], for example) (4) The proposed method of log transforming the gLV model only works with higher resolution time-series data. We have sparse time sampling in our datasets."

If indeed the data is so sparse that a traditional gLV model would not learn then it would be easy for the authors to show this. In order to be published Figure 4 should compare gLV+NN and LSTM+NN trained on the same data in the same way and compare the model's performance directly.

https://doi.org/10.7554/eLife.73870.sa1

Author response

Essential revisions:

1) All three reviewers agreed that not enough detail of the methods was provided. Please include this in the methods section, as well as an in-depth discussion and justification as to the choices made, including why LIME and CAM have been chosen as opposed to other methods, what are the inputs and outputs to the system, how the training and testing sets were chosen and how parameters for the two models were chosen.

We have addressed this comment throughout our response. Here is a summary of our edits to the manuscript to address this issue:

A. We have reorganized our methods section to make it easier to find relevant details. We have created three sections: “Experimental Methods”, “Computational Methods”, and “Specific Applications of Computational Methods”. This final section has new subsections describing all analyses presented in the paper with references to the specific Figure(s) that use the described methods.

B. We have added details about the ground truth models and train/test split used for our in silico comparison of the gLV and LSTM in predicting species abundance in the section labeled “Comparison of gLV and LSTM in silico (Figure 1)”.

C. We have clarified the Methods section describing the composite model used for comparison with the LSTM for predicting species abundance and metabolite production. Methods Section “Composite Model: Regression Models for Predicting Metabolite Concentrations (Figure 3)”.

D. We have reordered Figures 1 and 2 for a more logical flow and have edited these sections to make it easier for the reader to understand relevant details.

E. We have added additional justification for the use of LIME and CAM in the relevant places in the manuscript. This includes a new sensitivity analysis of the LIME results to different test/train splits of our data (Figure 3—figure supplement 5, Figure 3—figure supplement 6).

2) The reviewers also agreed that the comparison of the performance of gLV to LSTM was not quite fair, due to many differences between the two implementations. To more fairly compare gLV to the LSTM, please augment the PyTorch code to have LSTM+FF and gLV+FF.

We have updated the comparisons in Figure 3—figure supplement 3a to include the prediction accuracy for gLV+FF and gLV+Random Forest Regressor. While some improvements in the prediction of Succinate, Lactate, and Acetate were observed relative to the original composite model, the LSTM outperformed all of these models for all four metabolites. We have added text to describe this result in the main text:

“Additionally, replacing the regression portion of the composite model with either a Random Forest Regressor or a Feed Forward Network did not improve the metabolite prediction accuracy beyond that of the LSTM (Figure 3—figure supplement 3a).”

We have also added descriptions of these new methods to the manuscript in the section

“Composite Model: Regression Models for Predicting Metabolite Concentration (Figure 3)”

3) Carefully consider the language used throughout the paper, for example in the use of the term "deep learning".

The use of the terminology “deep learning” in the title and throughout the paper is consistent with accepted terminology used by the machine learning community for LSTMs. Note that LSTMs are featured prominently as a mainstay category of deep learning networks in the definitive book on deep learning by Goodfellow et al. [1, Ch. 10]. Please refer to our response 8 to Reviewer #3 for more details.

We have suitably revised the manuscript to improve readability. In particular, text corresponding to Figures 1 and 2 in the original manuscript have been reorganized for better readability. We have also added the following paragraph in the Discussion section to appropriately emphasize the scope of our work and some of its limitations.

“The current implementation of the LSTM model lacks uncertainty quantification for individual predictions, which could be used to guide experimental design [38]. Recent progress in using Bayesian recurrent neural networks has led to the emergence of Bayesian LSTMs [39, 40], which provide uncertainty quantification for each prediction in the form of posterior variance or posterior confidence interval. However, currently, the implementation and training of such Bayesian neural networks can be significantly more difficult than training the LSTM model developed here. In addition, we benchmarked the performance of the LSTM against the standard gLV model which has been demonstrated to provide accurate predictions of community assembly in communities up to 25 species [6, 8]. The gLV model has been modified mathematically to capture more complex system behaviors [41]. However, while comparing such modified gLV models to the LSTM could further elucidate their differences, implementation of these gLV models to represent the behaviors of microbiomes with a large number of interacting species poses major computational challenges.”

Reviewer #1 (Recommendations for the authors):

Results, first paragraph. 'The experimental102 data was split into non-overlapping training and hold-out test sets, and an appropriate LSTM network was trained to predict species abundances at various time points given the information of initial species abundance. The details on the train/test split and the number of model hyperparameters are provided in Table S2'.

It is worth mentioning these details in the main text, at least regarding data division and architecture of the LSTM based model.

Thank you for the suggestion. We have included the following details in the main text in the revised version in Section titled “LSTM accurately predicts experimental microbial community assembly”.

“Of the 175 microbial communities, 102 microbial communities were selected randomly to constitute the training set, while the remaining 73 microbial communities constituted the hold-out test set.”

“Each LSTM unit consists of a single hidden layer comprising of 2048 hidden units with ReLU activation. The details on hyperparameter tuning, learning rates, and choice of optimizer are provided in the Methods section.”

What was the motivation behind the particular division of the dataset: 624 training communities and 3299 testing communities? Why 6 or fewer species were used in the training and > = 10 were used in the testing dataset?

This is an excellent question. This was done to demonstrate precisely the fact that our model generalizes to higher-order interactions. We train the model on simpler communities (≤ 6 species), and yet show that the model is able to capture microbial community dynamics in high richness communities (≥ 10 species). In natural communities, the number of species ranges between tens to thousands. In addition, the availability of sufficiently time-resolved measurements of community dynamics can be potentially very sparse.

The breakup of 624 communities is as follows: all 25 one-member communities, all 300 two-member communities, 100 (out of 2300 unique) three-member communities, 100 (out of 53130 unique) five-member communities and 99 (out of 177100) six-member communities. As evident, the total number of possible communities, even when one restricts to analyzing only simpler communities, is large since the total number grows exponentially with the number of species in the community. A similar training/test partitioning was used in our previous papers [5, 6, 7]. It is very common in other fields of science to ask the question of whether the behavior of lower-order systems (i.e. parts) can predict higher-order systems (whole system). Therefore, we apply a similar approach to microbial communities (e.g. similar questions have been asked in chemistry and physics).
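As a quick check on these subset counts, the number of distinct k-member communities drawn from the 25-species pool can be computed directly with Python's standard library:

```python
from math import comb

# Number of distinct k-member communities from a 25-species pool.
n_species = 25
counts = {k: comb(n_species, k) for k in (1, 2, 3, 5, 6)}

# Training set: all 1- and 2-member communities, plus sampled subsets of
# the larger design spaces: 25 + 300 + 100 + 100 + 99 = 624 communities.
training_total = counts[1] + counts[2] + 100 + 100 + 99
```

This makes the combinatorial explosion concrete: already at six members there are 177,100 possible communities, so exhaustive measurement is infeasible and sampling is unavoidable.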

We have added a detailed description on the rationale behind the choice of training/test sets in the section titled “LSTM outperforms the generalized Lotka Volterra ecological model” in the revised manuscript. We do not quote the edits here for the sake of brevity.

The authors state that since the glv model was used to generate the simulated data, they were not surprised to see that it predicts very well on this dataset. Would it be possible to use another benchmark to compare the two models?

We thank the reviewer for this question. If there was a widely accepted mechanistic model (beyond gLV) that described microbial community dynamics, we would have considered using this benchmark model instead of gLV for the comparison of the gLV and LSTM model in Figure 1. For example, dynamic flux balance analysis of genome-scale metabolic models can be used to generate time-resolved changes in species abundance. However, there are notable limitations of these models: (1) it can be challenging to use these models to study high species richness communities due to computational limitations, (2) these models have strict assumptions (e.g. metabolic fluxes are solved to optimize the growth rate of individual species), (3) genome sequences have missing or incorrect functional annotations especially for non-model bacterial species such as human gut isolates, (4) there is large uncertainty about the transport reactions of metabolites mediating interspecies interactions and (5) these models only capture inter-species interactions that arise from production/degradation of metabolites in metabolism and thus neglect other types of microbe-microbe interactions (e.g. environmental pH mediated interactions). Alternatively, Consumer-Resource models capture species growth dynamics as a function of resources uptake/utilization. However, these models generally do not capture other mechanisms beyond resource competition (e.g. metabolites that are produced by constituent members of the community and used as substrates by other members or production of toxins that inhibit species growth). Mechanisms of microbial interactions beyond resource competition are key determinants of community dynamics. In addition, these models are very difficult to apply in practice due to the myriad of unknown mechanisms of interaction driving microbe-microbe interactions. 
Therefore, while the gLV model does not capture the metabolites mediating inter-species interactions, it is a simplified ecological model that represents both positive and negative interactions shaping community dynamics. Despite its limitations, the gLV model is widely used by the microbiome research community and can accurately predict community assembly for synthetic microbiomes containing up to 25 species and elucidate significant inter-species interactions that were independently validated experimentally [6, 8, 7]. Finally, models that have been developed to augment the gLV with additional terms can be very challenging to implement in practice with experimental data.

Regarding the computational time of the LSTM model, I imagine it takes a lot of time to tune the hyper-parameters of the LSTM model. I think it should be mentioned that the 2 minutes required to train the LSTM model does not include this computational overhead. It is not clear how hyper-parameter tuning (number of layers, nodes, learning rate, activation function, etc.) was performed. Could you please add further details about this in the method section?

Thank you for your comments. The hyperparameters were tuned using a simple grid search. The hyperparameter choices include: (a) learning rate and its decay scheme for the Adam optimizer, (b) number of hidden layers per LSTM unit, (c) number of units per layer within an LSTM unit, (d) mini-batch size, and (e) input data normalization. For feature normalization, we used the standard mean-zero, unit-variance normalization. The choices of learning rates included 0.005, 0.001 and 0.0001, each with a decay of 0.25 after every 25 epochs. Choices for mini-batch size included 1, 10, 20 and 50. It was observed that sizes 10 and 20 resulted in improved training loss, and hence we used mini-batch sizes of 10 or 20 in all our evaluations. The number of hidden layers per LSTM cell was iterated from 1 to 2. A two-layered LSTM did not result in any noticeable improvement over a single-layered LSTM cell, and thus we restricted our focus to a single-layered LSTM cell for the sake of simplicity and faster training/inference. Finally, we tried 512, 1024, 2048 and 4096 hidden units per LSTM cell, and depending upon the complexity of the problem, we worked with either 2048 hidden units (no prediction of metabolites) or 4096 hidden units (simultaneous prediction of metabolites).

It must also be emphasized that the model hyperparameters were optimized for performance on a held-out cross-validation set (formed using a randomly chosen 10% of the samples from the training data). Since the time taken for each run of the training process is only a couple of minutes, the entire process of tuning hyperparameters does not result in any significant computational effort.

We have added the relevant details in the revised manuscript in a new subsection titled “Hyperparameter tuning for LSTM Networks”.
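The grid search described above can be sketched as follows. Here `train_and_validate` is a hypothetical stand-in for a full LSTM training run (replaced by a toy surrogate loss so the snippet is self-contained); only the selection loop over the stated hyperparameter grid is meant to be illustrative:

```python
from itertools import product

def train_and_validate(lr, batch_size, n_layers, hidden_units):
    """Hypothetical stand-in: in the actual workflow this trains the LSTM
    on 90% of the training data and returns the loss on the held-out 10%
    cross-validation set. The toy surrogate below just ranks configs."""
    return (abs(lr - 0.001)
            + abs(batch_size - 10) / 100
            + n_layers
            + abs(hidden_units - 2048) / 10000)

# The grid reported in the response.
grid = {
    "lr": [0.005, 0.001, 0.0001],
    "batch_size": [1, 10, 20, 50],
    "n_layers": [1, 2],
    "hidden_units": [512, 1024, 2048, 4096],
}

best_loss, best_cfg = float("inf"), None
for lr, bs, nl, hu in product(*grid.values()):
    loss = train_and_validate(lr, bs, nl, hu)
    if loss < best_loss:
        best_loss = loss
        best_cfg = {"lr": lr, "batch_size": bs,
                    "n_layers": nl, "hidden_units": hu}
```

Because each training run takes only minutes, an exhaustive loop over all 3 × 4 × 2 × 4 = 96 configurations remains cheap; with slower training runs, random or Bayesian search would be the usual alternatives.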

We would also like to emphasize that training gLV models, too, requires hyperparameter tuning. Fitting an ODE-based parametric model entails solving nonlinear optimization problems, the solution of which is highly dependent on the choice of initial guess parameters, choice of optimization algorithms – (a) active-set methods, (b) interior-point methods, or (c) trust region-reflective methods; and a prior knowledge of suitable parameter range. Additionally, fitting of ODE-based models also frequently involves selection of regularization parameters to reduce overfitting of the model to the noise in the experimental data [8] [6].

It is not clear why a FFN was used at the output of the LSTM? I could not find any information in the methods section about this. Can you explain why this particular setting was chosen?

We thank the reviewer for this request for clarification. We did not use an FFN at the output of the LSTM for predicting species abundances. However, we did use an FFN as the final layer of the LSTM to predict metabolite concentrations. In considering your comment, we noticed two inaccuracies in the schematic representation of our models. In Figure 3a, we neglected to include the FFN in the schematic, whereas an FFN was incorrectly included in Figure 5h in our original submission. We have now corrected these figures in the revised paper.

Recall that the species abundance in Figure 3 is a 25-dimensional vector comprising the absolute abundances of each of the species. On the other hand, the metabolite profile is a four-dimensional vector corresponding to the concentration of each of the four metabolites. The FFN layer converts the 25-dimensional vector at the output of the last LSTM unit to a four-dimensional vector, and is trained to predict the metabolite profile. Thus, an FFN is used to map the species abundance output to the corresponding metabolite profile only when species absolute abundance measurements are available at sampled time-points. We have clarified this statement in the revised version in the subsection entitled “LSTM enables end-point design of multifunctional synthetic human gut microbiomes”: “This model uses a feed-forward network (FFN) at the output of the final LSTM unit that maps the end-point species abundance (a 25-dimensional vector) to the concentrations of the four metabolites (Figure 3a).”
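As an illustration of this dimensionality mapping, a single linear layer with a ReLU is sketched below in NumPy. This is a simplified, hypothetical stand-in for the trained FFN head: the weights here are random placeholders, whereas in the actual model they are learned end-to-end together with the LSTM:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in FFN head: maps the 25-dimensional end-point species abundance
# vector from the last LSTM unit to 4 metabolite concentrations.
W = rng.normal(size=(4, 25))    # 25 -> 4 linear map (placeholder weights)
b = np.zeros(4)

def ffn_head(species_abundance):
    h = W @ species_abundance + b
    return np.maximum(h, 0.0)   # ReLU keeps predicted concentrations non-negative

abundance = rng.uniform(0, 1, size=25)   # mock end-point abundances
metabolites = ffn_head(abundance)
```

The point of the sketch is only the shapes: a (4, 25) weight matrix is exactly the conversion from the 25-species abundance vector to the four-metabolite profile described above.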

The following line is not clear: 'all the network weights are trained simultaneously'.

We apologize for any confusion. The statement is added to emphasize that the model was trained in an end-to-end manner. We have reworded the phrase in the revised version in Section titled “LSTM enables end-point design of multifunctional synthetic human gut microbiomes”.

“The entire neural network model comprising LSTM units and a feed-forward network is learned in an end-to-end manner during the training process (i.e., all the network weights are trained simultaneously).”

What was the motivation behind choosing CAM and LIME as the explanation method among all the existing explanation methods?

Please refer to our detailed response to your public review summary. Additional details on the choice of explainability methods can be found in the Section titled “Understanding Relationships Between Variables Using LIME” in the revised manuscript.

LIME is a heuristic based method. How did you verify the robustness of the explanations? Could you please comment on if you changed some parameters (such as number of variables) and how it affected the explanations? How can we be sure that the explanation provided by the linear LIME models is actually what the non-linear LSTM model is doing? Are we agreeing with the explanations just because it is consistent with the current knowledge?

We thank the reviewer for the excellent question. Our work encompasses a three-fold verification of LIME explanations: (a) as suggested by the reviewer, the explanations are consistent with current knowledge based on our previous study [6]; (b) the explanations are consistent with those of the sensitivity-gradient method (Figure 5); (c) to demonstrate that LIME explanations of the full community (25-member) were robust, we performed new model analyses by partitioning the training data into 20 folds and training an LSTM model on each partition. In almost every case, the inter-quartile range of the strongest LIME explanations remained either strictly positive or negative, indicating that while the magnitude of LIME explanations varied, the sign (representing the type of interaction) was consistent. This implies that the LIME explanations are robust to variations in the amount of training data (see new Figure 3—figure supplement 5 and Figure 3—figure supplement 6). We have added the following text:

“We explored the consistency of LIME explanations for the full 25-member community in response to random partitions of the training data to provide insights into the sensitivity of the LIME explanations given the training data (Figure 3—figure supplement 5, Figure 3—figure supplement 6). These results demonstrated that the direction of the strongest LIME explanations of the full community were consistent in sign despite variations in magnitude.”
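The sign-consistency check described above (inter-quartile range strictly positive or strictly negative across partitions) can be sketched as follows. The weight matrix here is mock data standing in for the actual LIME explanation weights:

```python
import numpy as np

def sign_consistent(weights):
    """weights: (n_partitions, n_features) array of explanation weights,
    one row per model trained on a different data partition. A feature's
    explanation is called sign-consistent when its inter-quartile range
    [Q1, Q3] lies strictly on one side of zero."""
    q1 = np.percentile(weights, 25, axis=0)
    q3 = np.percentile(weights, 75, axis=0)
    return (q1 > 0) | (q3 < 0)

# Mock weights for 3 features across 20 partitions (made-up values):
# feature 0 consistently positive, feature 1 consistently negative,
# feature 2 straddles zero.
rng = np.random.default_rng(1)
w = np.column_stack([
    rng.normal(0.5, 0.1, 20),
    rng.normal(-0.4, 0.1, 20),
    rng.normal(0.0, 0.3, 20),
])
flags = sign_consistent(w)
```

Only sign-consistent features would then be interpreted as robust interactions; features whose IQR spans zero are treated as sensitive to the training-data partition.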

Reviewer #2 (Recommendations for the authors):

My general impression is the work is very dense and key insights are somewhat masked due to the lack of clarity in places. At the highest level, I think I understood the major points the authors tried to deliver (see above). However, I feel the authors can improve the presentation of the basic methodology and the intuition on a few points, which I find myself just having to trust what the authors have to say. Providing such intuition could minimize the impression of some of the performance comparisons being ad hoc.

We apologize for any confusion. We have made the following modifications to the text to clarify the presentation. (1) We have inverted the order of the first two sections as suggested in your comment below. (2) We have made substantial edits to the first two sections (highlighted in the revised manuscript) to streamline the information presented and clarify the details of our methods. (3) We have added two new section headings to better guide the reader to the key takeaways: “Using local interpretable model-agnostic explanations to decipher interactions” for the LIME analysis, and “Using LSTM with higher time-resolution to interpret contextual interactions” for the LSTM trained on higher time-resolution data.

1. It is actually not entirely clear what exactly feeds into the LSTM (or the gLV) and what comes out during the training process. Figure 1 is supposed to explain that but I could only gather that they're predicting time series. More specifically, what are the input variables? How many time series are being predicted? The authors might want to revise/expand the schematic to better explain the basic methodology.

We apologize for any confusion. Figure 2 (Figure 1 in the original submission) represents a scatter plot of all test communities (consisting of a 12-member synthetic human gut community and subsets of this community) across all species at different time instants. The input to our LSTM network is the initial species absolute abundance for each community. The LSTM predicts the absolute abundances of different species in the community at various time points. For ease of illustration and brevity, the predicted abundances of species across 73 test communities (not included in the training data) are plotted against their true abundances.

We agree with the reviewer that it is illuminating to show representative communities that varied in their prediction accuracy by the LSTM network using the time-series data shown in Figure 2. As such, we have added a new supplementary figure (Figure 2—figure supplement 1) that shows the predictions of the LSTM network for three (out of 73) representative communities: (a) well-predicted, (b) prediction score in the median range, and (c) poorly predicted. The prediction R2-score for each community is the coefficient of determination between measured (true) and predicted abundances of all species in that community at all time instants. A histogram of R2-scores across the 73 test communities is shown in Figure 2—figure supplement 1a. These data show that most communities are predicted accurately, with a median R2-score of ≈ 0.76.
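For reference, the per-community R2-score described above (coefficient of determination pooled over all species and time points) could be computed as follows; the toy array is purely illustrative:

```python
import numpy as np

def community_r2(y_true, y_pred):
    """Coefficient of determination pooled over all species and time points.

    y_true, y_pred: arrays of shape (n_timepoints, n_species) for one community.
    """
    y_true, y_pred = np.ravel(y_true), np.ravel(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)   # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# A perfect prediction gives R2 = 1; predicting the global mean gives R2 = 0
y = np.array([[0.1, 0.4], [0.2, 0.3], [0.3, 0.2]])
print(community_r2(y, y))                          # 1.0
print(community_r2(y, np.full_like(y, y.mean())))  # 0.0
```

Note that pooling across species weights abundant species more heavily, which is one reason a community can score poorly even when low-abundance trajectories are tracked reasonably well.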

Additionally, we have included plots showing the temporal evolution of each species in the three representative communities: (i) well-predicted (R2 = 0.91, Figure 2—figure supplement 1b), (ii) median score (R2 = 0.76, Figure 2—figure supplement 1c), and (iii) poorly predicted (R2 = 0.14, Figure 2—figure supplement 1d). Only 15% of the 73 test communities contained ≥ 5 species (the majority were 2–4 member communities). We found that higher richness communities were accurately predicted by our LSTM network.

Also, an overlay of a few representative true time courses vs. predicted time courses (e.g. good predictions vs. failed predictions) would be illuminating.

2. Figure 2 shows the prediction of gLV-model-simulated data using the LSTM or the gLV model. Only a brief description was provided, from which, I had a hard time understanding how the data were generated or predicted. For instance, the authors stated that a 25-member gLV model was used to make predictions. Those with ≤ 6 species were used to train the LSTM or the gLV model. Does it mean that, in these communities, only ≤ 6 species persisted after some time? And, the trained LSTM or gLV models only predict ≤ 6 species? If so, the following sentence becomes confusing. This confusion is related to the confusion above.

Also, how are higher-order interactions modeled in the gLV simulator?

We again apologize for the confusion due to our lack of detailed descriptions of the in silico methods. We have substantially modified the section “LSTM outperforms the generalized Lotka-Volterra ecological model” and added additional information to the Methods section describing these in silico experiments (“Comparison of gLV and LSTM in silico”). To clarify the specified point referring to “communities with ≤ 6 species”: this means that at most 6 species are present with non-zero abundance at the initial time point (time 0). Thus, all other species should have zero abundance at later time points (i.e. species whose abundance at t=0 is zero do not spontaneously appear at later time points).

3. It feels like the results in Figure 2 should go first (before Figure 1). The rationale is simple – for the simulated data, you have complete confidence in the ground truth, which can allow you to test the performance of the method in the idealized scenario. Also, the analysis on this data can probably allow you to provide an intuition when and why LSTM outperforms the gLV model (especially when the data were generated by another gLV model).

Thank you very much for your suggestion. We, too, had initially considered adding Figure 2 (and accompanying text) before Figure 1. Upon further consideration, we agree that this is the more logical order and have rearranged these two sections in the text.

4. Also in Figure 2, since the data generation or training were not super time consuming, I wonder if the LSTM or gLV predictions could be drastically improved if larger training data sets are used. I'm actually quite surprised by the performance of LSTM on the simulated data, considering the somewhat limited size of the training set (624). A recent work had to use a much larger training set to achieve a high performance of LSTM (Wang et al., Nature Comm 2019) – this related work should be cited.

We agree with the reviewer that the performance of any data-driven algorithm improves with both the quantity and quality of available data. We purposely worked with limited training data primarily because, in a practical (experimental) setting, data collection is expensive, and thus any model capable of working with limited data will prove most useful. Another compelling reason for working with limited data is to demonstrate that LSTMs generalize better, i.e., LSTMs can translate learning on lower richness communities to predict the behavior of higher richness communities. Finally, the quality of the training data also plays a significant role in the learned predictive behavior of the model. Suppose our experimental budget allows us to perform a fixed number of experiments. If each of these experiments consists of communities that do not involve a particular species ‘X’, and the model is then asked to predict the behavior of communities involving ‘X’, the model is bound to fail. Therefore, training data that includes all species and displays a wide range in the number of species in each community (wide range of species richness) should constitute the ideal learning scenario. Further, the training data should display sufficient variation in the abundance of each species or concentration of each metabolite to be informative for the model (see Figure 4—figure supplement 1). We have cited the Wang et al., Nat Commun 2019 paper in our Discussion section:

“Achieving a highly predictive LSTM model required substantially less training data than a previous study that approximated the behavior of mechanistic biological systems models with RNNs (Figure 2) [9]. While the performance of any data-driven algorithm improves with the quantity and quality of available data, we demonstrate that the LSTM can translate learning on lower-order communities to accurately predict the behavior of higher-order communities given a limited and informative training set that is experimentally feasible. For synthetic microbial communities, the quality of the training set depends on the frequency of time-series measurements within periods in which the system displays rich dynamic behaviors (i.e. excitation of the dynamic modes of the system), the range of initial species richness, representation of each community member in the training data and sufficient variation in species abundance or metabolite concentration (Figure 4—figure supplement 1).”

5. The overall performance of LSTM is impressive. It does fail occasionally even on simulated data (which is true for all ML models). To this end, the authors should discuss/show how to evaluate the confidence in individual predictions – checking each one (even if the ground truth is known) would defeat the purpose of using the neural network. This point is relevant when the trained neural network is being used to make predictions and probe biological insights.

Quantification of uncertainty in predictions is an excellent suggestion. However, like most other discriminative machine learning models, the LSTM produces predictions but is not equipped to quantify the uncertainty associated with a given prediction. We have endeavored in this paper to quantify the overall uncertainty over all training data using cross-validation analysis to assess the sensitivity of the LSTM predictions (Figure 4a) and of the LIME explanations for a representative community (the 25-member community, see new Figure 3—figure supplement 5 and Figure 3—figure supplement 6). However, obtaining a sample-by-sample quantification of prediction uncertainty would require additional modeling of uncertainty, e.g., a Bayesian formulation of the LSTM, which is beyond the scope of this paper and an ongoing project in our lab. Since we agree with the reviewer on the importance of quantifying confidence, in the revision we have included the following comment near the end of the Discussion section:

“The current implementation of the LSTM model lacks uncertainty quantification for individual predictions, which could be used to guide experimental design. Recent progress in Bayesian recurrent neural networks has led to the emergence of Bayesian LSTMs, which provide uncertainty quantification for each prediction in the form of a posterior variance or posterior confidence interval. However, the implementation and training of such Bayesian neural networks can currently be significantly more difficult than training the LSTM model developed here. In addition, we benchmarked the performance of the LSTM against the standard gLV model, which has been demonstrated to provide accurate predictions of community assembly in communities of up to 25 species. The gLV model has been modified mathematically to capture more complex system behaviors. While comparing such modified gLV models to the LSTM could further elucidate their differences, implementing these gLV models to represent the behaviors of microbiomes with a large number of interacting species poses major computational challenges.”

6. I like how they use a side product of training procedure (backpropagation) to provide a quantitative metric of system sensitivity. I feel they could explain this point more.

Thank you for your observation. Unlike LIME-based interpretability methods, which require training an additional model on top of an already trained deep-learning model, the proposed sensitivity analysis requires only a single backpropagation pass to capture input-output behavior. The proposed method is thus computationally inexpensive and produces explanations similar to those of LIME. We have emphasized this further in the revised version in the subsection entitled “Understanding Relationships Between Variables Using Sensitivity Gradients.”
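The sensitivity measure — the gradient of each predicted output with respect to each input, obtained from a single backward pass — can be illustrated on a toy one-step recurrent map (a hypothetical stand-in for the trained LSTM), where the analytic Jacobian is cheap to evaluate and can be checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                   # number of species (illustrative)
W = rng.standard_normal((n, n)) / np.sqrt(n)

def step(x):
    """Toy one-step recurrent map standing in for the trained network."""
    return np.tanh(W @ x)

def sensitivity(x):
    """Jacobian of outputs w.r.t. inputs, as backpropagation would return it:
    J = diag(1 - tanh^2(Wx)) @ W."""
    h = np.tanh(W @ x)
    return (1.0 - h ** 2)[:, None] * W

x0 = rng.random(n)
J = sensitivity(x0)                     # one "backward pass"

# Verify the analytic gradients with central finite differences
eps = 1e-6
J_fd = np.stack([(step(x0 + eps * e) - step(x0 - eps * e)) / (2 * eps)
                 for e in np.eye(n)], axis=1)
assert np.allclose(J, J_fd, atol=1e-6)
```

For a real LSTM the Jacobian is not written out by hand; an autodiff framework computes the same quantity with one backward pass, which is the computational advantage noted above.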

7. For me, part of the reason the paper feels dense is that there are actually two stories (though closely related). One is the prediction of community dynamics. The other is the prediction of metabolite profiles. They represent somewhat different challenges in terms of evaluating LSTM performance.

The paper could have been split into two. Combining the two might have caused some of the confusion raised above.

While we agree that combining the prediction of species abundance with metabolite concentrations makes the story more dense, we would like to emphasize that these two problems are intimately connected. From a biological perspective, microbes continuously interact with metabolites by releasing or consuming them, and metabolites in turn impact microbial growth by serving as energy sources or building blocks, causing toxicity, etc. By modeling microbial growth and the metabolites microbes produce/consume together, we can capture these key interconnections. This is particularly evident in the examples shown in Figure 5, where we see non-monotonic metabolite trajectories. These trends in metabolite concentrations are likely due to different phases of production/consumption of metabolites, which could be captured in our model as feedback between the microbe and metabolite variables. We have revised our Discussion to mention this key aspect of our study.

“Microbial communities continuously impact metabolites by releasing or consuming them. Therefore, by modeling both microbial growth and the metabolites they produce/consume together, we were able to capture the interconnections between these variables.”

Reviewer #3 (Recommendations for the authors):

Major comment: To make the comparison fair I would suggest using your torch code base to also obtain point estimates for the gLV parameters as well. There is no reason to use a sampling-based result from the prior study which uses a different model for metabolite prediction as well. Furthermore, the composite code is not accessible without a MATLAB license (and thus not really reproducible). It is also trained in a two-step fashion (if this is not the case then there was not enough detail in the text to understand how the composite model was trained), and thus will be less accurate because of this. It should not take 5 hours for the model to learn the gLV parameters (your own reference [56] states that you have this down to minutes). Your torch code can be easily modified for the dynamics \log x_{t+1} = \log (x_t) + (\mathrm{diag}(r) + A x_t)\,\Delta t and then this also means that you could have the same FF architecture for metabolites. This would allow for a much fairer comparison, both models could be trained in an identical fashion with the same metabolite prediction architecture. Although regularization and hyperparameter tuning would be slightly different of course.

We thank the reviewer for this comment. While we appreciate the reviewer’s desire for a more direct comparison, we chose to compare our LSTM framework to the current best implementation of the gLV modeling framework in our previous study [6]. Part of the confusion here may have been due to lack of clarity in our Methods section. We did not estimate new gLV parameters for this work, but rather used the gLV parameters that were already determined in [6].

While we agree that the reviewer’s proposed method for determining the gLV parameters would be a more direct comparison, there are several reasons why we chose to use the gLV parameters estimated in our previous study: (1) Higher resolution time-series measurements are needed to determine the exponential growth rate in the gLV model; estimating gLV parameters without the time-series measurements of monocultures would therefore yield a poorly predictive model due to high uncertainty in the exponential growth rate parameters. (2) The methods used to estimate the gLV model parameters in our previous work were informed by higher resolution time-series measurements of monocultures (individual species). Thus, the gLV model had additional information beyond that of the LSTM model, which was only informed by the initial point and end point (with the exception of Figure 5). (3) The MATLAB optimization algorithm FMINCON is a widely used approach for estimating parameters of the gLV model (see [6], [8] and [10], for example). (4) The proposed method of log transforming the gLV model only works with higher resolution time-series data, whereas our datasets have sparse time sampling.

We have modified the methods section referring the Composite Model to clarify this point:

“Our composite model implementation is similar to the model described in [6] for predicting metabolite concentration from community composition at a particular time. We used the exact gLV model parameter distributions from [6], which were determined using an approach based on [11].”

On the topic of the MATLAB license and reproducible results, we found in our previous work that we needed the tools provided in MATLAB for the task at hand. If an interested party without access to a MATLAB license wanted to reproduce these results, they could do so using GNU Octave (https://www.gnu.org/software/octave/index), a free software compatible with most MATLAB functions.

While [11] describes a method for estimating gLV parameters that takes only minutes, this method used only individual species and pairwise community growth data. In our work described in [6], we found that including communities with a higher number of species reduced the sparsity of the matrix computations and thus complicated the use of this method, so we instead used the MATLAB FMINCON function. This approach is fully described in [6] and does require hours for parameter estimation.
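For reference, the gLV dynamics underlying this discussion, dx_i/dt = x_i(r_i + Σ_j A_ij x_j), can be simulated with the log-space Euler step the reviewer mentions. The parameters below are purely illustrative, not those estimated in [6]; the sketch also shows the constraint discussed above that species absent at t = 0 never appear later:

```python
import numpy as np

def simulate_glv(x0, r, A, dt=0.01, n_steps=1000):
    """Euler integration of the gLV model dx_i/dt = x_i (r_i + sum_j A_ij x_j),
    stepped in log space for positivity: log x_{t+1} = log x_t + (r + A x_t) dt.
    Species with zero initial abundance remain at zero."""
    x = x0.astype(float).copy()
    present = x > 0
    traj = [x.copy()]
    for _ in range(n_steps):
        growth = r + A @ x                       # per-capita growth rates
        x = np.where(present, x * np.exp(growth * dt), 0.0)
        traj.append(x.copy())
    return np.array(traj)

# Illustrative 3-species community: species 2 starts absent and stays absent
r = np.array([0.5, 0.3, 0.4])
A = np.array([[-1.0, -0.2,  0.0],
              [-0.1, -0.8,  0.0],
              [ 0.0,  0.0, -1.0]])
traj = simulate_glv(np.array([0.05, 0.05, 0.0]), r, A)
assert np.all(traj[:, 2] == 0.0)                 # absent species never appears
assert np.all(traj[-1, :2] > 0.0)                # present species persist
```

With sparse time sampling, the per-step growth term is poorly constrained, which is the practical obstacle to fitting this update rule directly that is noted in point (4) above.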

Title and throughout: A single LSTM unit with a single hidden state (which can be of large dimension) is not really a deep network. With this interpretation a linear dynamical systems model with a single hidden state is a deep model as well.

While it is true that a single LSTM unit has a single hidden layer, it is common to implement a sequence of LSTMs arranged in multiple layers for modeling time-series data, a process known as unrolling, which produces a deep network. As explained in the text, and exhibited in Figures 1a and 2a, we are using such a multilayer LSTM, where each layer corresponds to a different time point of our data. Thus, our use of the term “deep network” for our LSTM model is consistent with accepted terminology used by the machine learning community for LSTMs. Note that LSTMs are featured prominently as a mainstay category of deep learning networks in the definitive book on deep learning by Goodfellow et al. [1, Ch. 10].

21: In the abstract it is stated that current ODE models fail to capture complex behavior. This is not true. You fit an Ordinary difference equation to your data when solving for GLV, RNN, or LSTM (these are all ordinary because time is the only differential or difference variable). I think you meant to characterize the difference between parametric models like linear or gLV, etc, that are more "rigid" vs dynamical systems models that convolve functions up to arbitrary complexity (RNN, etc). Both are certainly classes of "Ordinary" difference or differential equations.

We apologize for the confusion here. The reviewer is correct in emphasizing that the point we were actually trying to make was the difference between parametric models based on ecological theory (like gLV) and data driven models such as the RNN and LSTM. We have modified the abstract to better reflect this comparison using the following phrasing:

“Current models based on ecological theory fail to capture complex community behaviors due to higher-order interactions, do not scale well with increasing complexity and in considering multiple functions.”

and

“We show that the LSTM model can outperform the widely used generalized Lotka-Volterra model based on ecological theory.”

147: A gLV model is linear in the parameters so it is trivial to solve for the coefficients of the model in seconds. The text should be changed here. Also why does it take 5 hours to learn the gLV model when you state in [56] that you have this down to minutes?

We thank the reviewer for asking for additional clarification regarding this point. Reiterating our response above:

“While [11] describes a method for estimating gLV parameters that takes only minutes, this method used only individual species and pairwise community growth data. In our work described in [6], we found that including communities with higher numbers of species reduced the sparsity of the matrix computations and complicated the use of this method. Therefore, we instead used the MATLAB FMINCON function. This approach is fully described in [6] and does require hours for parameter estimation.”

Figure 1 (a) is this the model or the teacher forcing training procedure depicted. Also there is a subscript i, suggesting that there is no microbe-microbe interaction information captured by the model. Is the data in figure 1 all forecasted from t=0 or is this 1 sample look ahead prediction performance, it was not clear from the text?

We apologize for the confusion with the figure schematics. This is a depiction of the model. We have updated the instances of Xi in both Figure 1a and Figure 2a to be X with an underscore to clarify that the model input is a vector of species abundances rather than a single species abundance. Thus, this enables the model to capture inter-species microbial interactions. We have updated Figure 1 and Figure 2 captions to clarify this point. All model predictions in the paper are forecasted from time 0, we have added the following sentence to the figure captions for Figures 1 and 2 to clarify this point:

“All predictions are forecasted from the abundance at time 0.”

Figure 2 (a) also has i subscript, was this intentional.

We thank the reviewer for this comment. Reiterating our response from above:

“We apologize for the confusion with the figure schematics. We have updated the instances of Xi in both Figure 1a and Figure 2a to be X with an underscore to clarify that the model input is a vector of species abundances rather than a single species abundance, thus enabling the model to capture inter-species microbial interactions.”

Methods:

– The LSTM model is not described in enough detail (what are the dimensions of the objects, how many hidden states [and why], how did you tune the hyperparameters of the model and the gradient solver [what gradient method did you use and why]).

We apologize for any confusion. The model details for each experiment, including the number of hidden states, learning rate for the optimizer, batch size and other hyperparameters, were provided in Supplementary File 1. However, we agree that inadequate explanation was provided, and we have corrected this in the revision. We used the Adam optimizer for all our experiments. The choice of Adam over simple gradient descent, RMSProp or other gradient-based optimizers was motivated purely by its empirically fast convergence and lower training and validation errors. For additional details on hyperparameter tuning, please refer to our response 5 to Reviewer #1. For increased clarity, we have added a new section titled “Hyperparameter tuning for LSTM Networks” in the revised paper.

“Similar to other learning algorithms, training an LSTM network entails choosing a set of hyperparameters for optimal performance. We used an exhaustive grid search for hyperparameter optimization, while the choice of learning algorithm (optimizer) was restricted to Adam due to its superior empirical performance. For each experiment, nearly 10% of the training dataset was reserved as a cross-validation set, and the performances of the trained models were evaluated using the cross-validation sets. This information was used to select the best hyperparameter settings. The hyperparameters include: (a) learning rate, (b) number of hidden layers per LSTM unit, (c) number of units per layer within an LSTM unit, (d) mini-batch size, and (e) input data normalization. The input features are normalized to have zero mean and unit variance. The choices of learning rate were 0.005, 0.001 and 0.0001, each with a decay of 0.25 after every 25 epochs. The gradual decay in learning rate prevents potential overfitting to the data. Choices for mini-batch size were 1, 10, 20 and 50. We observed that sizes 10 and 20 resulted in improved training loss, and hence we used mini-batch sizes of 10 or 20 in all evaluations. The number of hidden layers per LSTM cell was varied between 1 and 2. A two-layered LSTM did not result in any noticeable improvement over a single-layered LSTM cell, so we restricted our focus to a single-layered LSTM cell for the sake of simplicity and faster training/inference. Finally, we tried 512, 1024, 2048 and 4096 hidden units per LSTM cell and, depending on the complexity of the problem, used 2048 hidden units (prediction of species without metabolites) or 4096 hidden units (simultaneous prediction of species and metabolites). The exact details on the number of training epochs, learning rates and decay rates for the different experiments can be found in Supplementary File 1.”
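The exhaustive grid search described above can be sketched as follows; `validation_loss` is a hypothetical stand-in for training an LSTM with a given configuration and scoring it on the ~10% held-out split:

```python
import itertools

# Hyperparameter grid mirroring the choices described above (illustrative)
grid = {
    "learning_rate": [0.005, 0.001, 0.0001],
    "batch_size": [1, 10, 20, 50],
    "hidden_units": [512, 1024, 2048, 4096],
}

def validation_loss(config):
    """Stand-in for: train an LSTM with `config`, then return the loss
    on the held-out cross-validation split (hypothetical surrogate)."""
    return (abs(config["learning_rate"] - 0.001)
            + abs(config["batch_size"] - 20) / 100
            + abs(config["hidden_units"] - 2048) / 10000)

# Exhaustively evaluate every combination and keep the best
best_config, best_loss = None, float("inf")
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    loss = validation_loss(config)
    if loss < best_loss:
        best_config, best_loss = config, loss

print(best_config)
# {'learning_rate': 0.001, 'batch_size': 20, 'hidden_units': 2048}
```

The grid here has 3 × 4 × 4 = 48 configurations; exhaustive search is feasible only because each axis was restricted to a handful of values, as in the procedure above.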

And

“Note that both the composite as well as the LSTM model require tuning of hyperparameters for optimal performance. The details on the computational implementation are provided in the Methods section.”

– How was the composite model trained, was it indeed in a two-step fashion whereby the gLV model was trained and then a separate inference procedure was run for the metabolite model. More detail is needed here to understand what exactly was done and how the models were trained.

The models are trained in a two-step fashion: since the gLV model is a parametric ODE model, its parameters are obtained through nonlinear optimization methods. First, we would like to emphasize again, as addressed elsewhere in this response, that we did not train a new gLV model for the composite model in this work; rather, we used the parameters determined in our previous publication [6]. We did train a new regression model for each of the 4 metabolites, because the previous work only trained a model to predict one metabolite (butyrate). We have updated the Methods sections associated with the composite model to clarify this point. Here is that Methods text for your reference:

“Our composite model implementation is similar to the model described in [6] for predicting metabolite concentration from community composition at a particular time. We used the exact gLV model parameter distributions from [6] which were determined using an approach based on [11]. The regression model mapping endpoint species abundance to metabolite concentrations from [6] was focused specifically on the prediction of butyrate. Therefore, we adapted the approach to prediction of multiple metabolites. First, we modified the model form to include first order and interaction terms for all 25 species, rather than just the butyrate producers. Then, we separately trained 4 regression models, one for each metabolite (butyrate, lactate, acetate, succinate), using the measured species abundance and measured metabolite concentrations from the same dataset used to train the LSTM model. We trained these models as described previously [6] by using Python scikit-learn [12] and performed L1 regularization to minimize the number of nonzero parameters. Regularization coefficients were chosen by using 10-fold cross validation. We selected the regularization coefficient value with the lowest median mean-squared error across the training splits.”
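The metabolite regression step can be sketched with scikit-learn as below. The data are synthetic stand-ins, and selecting the regularization coefficient by mean (rather than median) cross-validated error is a simplification of the procedure described above:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_species = 200, 25

# Synthetic data: endpoint species abundances -> one metabolite concentration,
# where only a few species actually contribute (sparse ground truth)
X = rng.random((n_samples, n_species))
true_coef = np.zeros(n_species)
true_coef[[0, 3, 7]] = [2.0, -1.5, 1.0]
y = X @ true_coef + 0.05 * rng.standard_normal(n_samples)

# Choose the L1 regularization coefficient by 10-fold cross-validation
alphas = [0.0001, 0.001, 0.01, 0.1, 1.0]
cv_mse = {a: -cross_val_score(Lasso(alpha=a), X, y, cv=10,
                              scoring="neg_mean_squared_error").mean()
          for a in alphas}
best_alpha = min(cv_mse, key=cv_mse.get)

# Refit on all training data with the selected regularization strength;
# one such model would be trained per metabolite
model = Lasso(alpha=best_alpha).fit(X, y)
print(best_alpha, np.round(model.coef_[[0, 3, 7]], 2))
```

L1 regularization drives the coefficients of non-contributing species toward zero, which is why it was used to minimize the number of nonzero parameters.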

Plots in general:

– Plots only really show how the most abundant taxa are performing, would be nice to see the performance on the lower abundance taxa (maybe take a log of the OD?).

While we appreciate the interest in low abundance taxa, we are not confident in the precision of the experimental measurements of low abundance species (i.e. < 1% of the total) due to limited sequencing depth. We typically sequence on the order of 10,000 molecules of PCR-amplified DNA of the 16S rRNA gene per sample. Therefore, if a species is represented by only on the order of 10 of those molecules (0.1%), then a single sequencing read error would be a 10% error, whereas a species represented by > 100 reads (1%) would only have a 1% error per read. We chose not to include plots with a log scale so as not to mislead the reader regarding the precision of our measurements. We have modified the Methods section text to acknowledge this limit on the precision of our measurements:

“We expect the precision of our measurements to drop rapidly for species representing < 1% of the community due to limited sequencing depth. We typically sequenced on the order of 10,000 molecules of PCR-amplified DNA of the 16S rRNA gene per sample, so if a species is represented by only on the order of 10 of those molecules (0.1%), then a single sequencing read error would be a 10% error, whereas a species represented by > 100 reads (1%) would only have a 1% error per read.”
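The stated error estimates follow from a single miscounted read contributing a relative error of 1 divided by the number of reads assigned to the species:

```python
# At a fixed sequencing depth, the relative error contributed by one
# miscounted read falls off as 1 / (reads assigned to the species)
total_reads = 10_000
for fraction in (0.001, 0.01, 0.1):        # 0.1%, 1%, 10% of the community
    reads = fraction * total_reads
    per_read_error = 1 / reads             # one read error, as a fraction
    print(f"{fraction:.1%} abundance -> {reads:.0f} reads, "
          f"single-read error = {per_read_error:.1%}")
```

This is why a fixed relative-abundance floor (here ~1%) is a natural precision cutoff at this sequencing depth, and why a log-scaled axis would visually overstate the reliability of points below it.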

References:

You seem to be missing some important references from the Gerber Lab, "MDSINE Microbial Dynamical Systems INference Engine for microbiome time-series analyses" https://doi.org/10.1186/s13059-016-0980-6 is one such example but there are likely more given that lab primarily focuses on microbiome dynamics.

Thanks for the suggestion. We have now cited the Gerber Lab paper in our Introduction.

“The parameters of the gLV model can be efficiently inferred based on properly collected absolute abundance measurements and can provide insight into significant microbial interactions shaping community assembly.”

References

1. Goodfellow, I., Bengio, Y. and Courville, A. Deep Learning (MIT Press, 2016).

2. Lundberg, S. M. and Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 4768–4777 (2017).

3. Jia, R. et al. Towards efficient data valuation based on the Shapley value. In The 22nd International Conference on Artificial Intelligence and Statistics, 1167–1176 (PMLR, 2019).

4. Jia, R. et al. An empirical and comparative analysis of data valuation with scalable algorithms. arXiv preprint arXiv:1911.07128 (2019).

5. Venturelli, O. S. et al. Deciphering microbial interactions in synthetic human gut microbiome communities. Molecular Systems Biology 14, e8157 (2018).

6. Clark, R. L. et al. Design of synthetic human gut microbiome assembly and butyrate production. Nature Communications 12 (2021).

7. Hromada, S. et al. Negative interactions determine Clostridioides difficile growth in synthetic human gut communities. Molecular Systems Biology 17, e10355 (2021).

8. Venturelli, O. S. et al. Deciphering microbial interactions in synthetic human gut microbiome communities. Molecular Systems Biology 14 (2018). URL https://onlinelibrary.wiley.com/doi/10.15252/msb.20178157.

9. Wang, S. et al. Massive computational acceleration by using neural networks to emulate mechanism-based biological models. Nature Communications 10, 1–9 (2019).

10. Marino, S., Baxter, N. T., Huffnagle, G. B., Petrosino, J. F. and Schloss, P. D. Mathematical modeling of primary succession of murine intestinal microbiota. Proceedings of the National Academy of Sciences 111, 439–444 (2014).

11. Shin, S., Venturelli, O. S. and Zavala, V. M. Scalable nonlinear programming framework for parameter estimation in dynamic biological system models. PLOS Computational Biology 15, e1006828 (2019). URL https://dx.plos.org/10.1371/journal.pcbi.1006828.

12. Pedregosa, F. et al. Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011).

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Reviewer #3 still finds that there is a need for a direct "apples-to-apples" comparison of gLV and LSTM. In your response to the reviewer's initial request, you are arguing that such a direct comparison would result in a very low performance of gLV. If this is the case, it would simply strengthen your argument and support the use of LSTM even more. It also did not seem like much additional work to carry out this analysis. If this is done and added as a supplement with some explanation of why it's not in the main text, I believe this would be satisfactory to all reviewers. Other than that, the reviewers were very appreciative of the modifications and the current quality of the paper.

Thank you very much for your efforts in soliciting reviews on the revised draft. We appreciate your acknowledgement of our efforts to address the reviewers' comments in the last submitted version. We have further revised our manuscript to address the remaining comment from Reviewer #3. Contrary to the opinion of Reviewer #3, an "apples-to-apples" comparison of the gLV and LSTM approaches is not straightforward to implement, and thus in our previous response we took the position that we had adequately demonstrated the benefits of the LSTM in our simulations and experiments. However, we have carefully reconsidered our position in light of your comments and those of Reviewer #3, and we have now included an apples-to-apples comparison between the LSTM and gLV models. While performing this comparison involved a substantial amount of work, it demonstrates that the LSTM has better predictive performance for species abundances and metabolites, which indeed strengthens the paper. In the interest of transparency and reproducibility, we have uploaded the full Python script of our implementation to our project GitLab page.

We believe that our accompanying responses, along with the additional set of experiments, suitably address the comments from Reviewer #3.

Reviewer #1 (Recommendations for the authors):

Thank you. This is exciting work, and the authors have addressed all of my concerns thoroughly.

Thank you very much for your valuable inputs and encouraging remarks.

Reviewer #2 (Recommendations for the authors):

The authors have fully addressed my comments on the original submission. I support its publication in the journal.

Thank you very much for your valuable inputs and recommendation for publication.

Reviewer #3 (Recommendations for the authors):

One outstanding issue with the paper remains. There is no apples-to-apples comparison of the LSTM+NN to a gLV+NN. I now comment on the authors' responses to an original comment.

[Authors] "We did not estimate new gLV parameters for this work, but rather used the gLV parameters that were already determined in [6]."

Having a composite model trained on different data and comparing it to a jointly trained model on entirely different data is not a sufficient comparison.

We understand your concern. We have now trained a discretized gLV (approximate gLV) + FFN model for predicting temporal changes in species abundance and metabolite concentrations. In addition, we uploaded the script to the project's GitLab page for transparency and reproducibility.

[Authors] "While we agree that the reviewer's proposed method for determining the gLV parameters would be a more direct comparison, there are several reasons why we chose to use the gLV parameters estimated in our previous study: (1) Higher resolution time-series measurements are needed to determine the exponential growth rate in the gLV model. Thus, estimating gLV parameters without the time-series measurements of monocultures would result in a poorly predictive model due to high uncertainty in the exponential growth rate parameters of the gLV model. Therefore, the proposed method of inferring parameters of the gLV model would yield a poorly predictive model."

If this is the case then the authors should show that the gLV model is not predictive with this sparse data. This begs the question though. Why did one need the LSTM model if the data doesn't really capture complex dynamics? The abstract stated that the LSTM was necessary for this very reason [Abstract]"Current models based on ecological theory fail to capture complex community behaviors due to higher-order interactions, do not scale well with increasing complexity and in considering multiple functions. We develop and apply a long short-term memory (LSTM) framework to advance our understanding of community assembly and health-relevant metabolite production using a synthetic human gut community." The authors are simultaneously arguing that the dynamics are too complex for simple gLV dynamics while also stating that the new time series are too simple to reliably train a gLV model. Is this a contradiction?

The gLV model captures interactions up to second order. LSTMs, on the other hand, do not impose such restrictions on the type of interactions. Since nonlinear recurrent neural network models such as LSTMs have universal approximation guarantees, LSTM networks are capable of learning interactions beyond second order. This is not a contradiction, but a restatement of the fact that simple ecological models are only good approximations when they are close to the underlying ground-truth dynamics. This is highlighted in our analysis of simulated data, where mild third-order perturbations significantly reduce the gLV model's predictive capabilities (Figure 1c).

"(2) The methods used to estimate the gLV model parameters in our previous work were informed by higher resolution time-series measurements of monocultures (individual species). Thus, the gLV model had additional information beyond that of the LSTM model, which was only informed by the initial point and endpoint (with the exception of Figure 5)."

This statement is confusing. The LSTM model was trained using teacher forcing with intermediate time points per the methods section?

The gLV model comprises two components: (a) unary interactions (i.e., exponential growth rates) and (b) pairwise interactions. These are described by the following set of coupled ODEs:

$$\frac{dx_i(t)}{dt} = \Big(r_i + \sum_{j=1}^{N_{\text{species}}} a_{ij}\, x_j(t)\Big)\, x_i(t),$$

where $\{r_i\}$ and $\{a_{ij}\}$ represent the individual species' exponential growth rates and the pairwise interaction coefficients, respectively. A well-suited training procedure for the gLV model first learns the unary coefficients $\{r_i\}$ and self-interaction coefficients $\{a_{ii}\}$, and then learns the pairwise interaction coefficients $\{a_{ij}\},\ i \neq j$. In the absence of any other species, the cross terms $a_{ij}\, x_j(t)\, x_i(t)$ vanish, and hence the monocultures (individual species) are informative of the unary coefficients $\{r_i\}$ and self-interaction coefficients $\{a_{ii}\}$. Under this model framework, more highly time-resolved data provide a better estimate of the model parameters. We adopted the same procedure for learning $\{r_i\}$ and $\{a_{ii}\}$. On the other hand, the LSTM is not built on any explicit first- and second-order interactions, and thus training the LSTM does not require highly time-resolved monospecies growth measurements.
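To make the role of monocultures concrete, the following Python sketch integrates the gLV ODE above with a simple forward-Euler scheme; the two-species growth rates and interaction coefficients are illustrative values, not the parameters inferred in our study.

```python
# Minimal forward-Euler integration of the gLV ODE above.
# All parameter values are illustrative, not those fitted in the paper.

def glv_step(x, r, A, dt):
    """One Euler step of dx_i/dt = (r_i + sum_j a_ij * x_j) * x_i."""
    n = len(x)
    return [max(x[i] + dt * (r[i] + sum(A[i][j] * x[j] for j in range(n))) * x[i], 0.0)
            for i in range(n)]

def simulate_glv(x0, r, A, t_end=48.0, dt=0.01):
    """Integrate the gLV dynamics from x0 over t_end hours."""
    x = list(x0)
    for _ in range(int(t_end / dt)):
        x = glv_step(x, r, A, dt)
    return x

if __name__ == "__main__":
    r = [0.36, 0.40]                     # exponential growth rates {r_i}
    A = [[-1.5, -0.2], [-0.3, -1.5]]     # self- and pairwise interactions {a_ij}
    # Monoculture of species 1: dynamics depend only on r_1 and a_11,
    # settling at the carrying capacity -r_1/a_11 = 0.24.
    mono = simulate_glv([0.01, 0.0], r, A)
    # Co-culture: the pairwise terms a_12, a_21 now shape the outcome.
    pair = simulate_glv([0.01, 0.01], r, A)
    print(mono[0], pair)
```

The monoculture run illustrates why monospecies time courses identify $\{r_i\}$ and $\{a_{ii}\}$: with the second species at zero abundance, the cross terms drop out entirely.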

[Re: teacher forcing]: We apologize for any confusion concerning our use of the phrase "higher resolution," which refers to the training phase of the LSTM and not to its inference phase. When training an LSTM model, a component of the loss function forces the intermediate LSTM units to predict the correct species abundance (and/or metabolite profile) at subsequent time instants; however, a small disagreement in the species abundance (and/or metabolite profile) at an earlier time point is bound to have a trickle-down effect on the predictions at later time points. Thus, instead of relying solely on the prediction of the previous LSTM unit, teacher forcing frequently replaces the output of the previous LSTM unit with the ground-truth species abundance (and/or metabolite profile) as the input to the next unit. This allows the model to mitigate the effects of prediction errors from previous LSTM units and learn to better advance the prediction forward in time. Recall that teacher forcing is only used during training. During inference (the test phase), only the initial abundance (and/or metabolite profile) is rolled forward in time using the model learned with the assistance of teacher forcing.
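The teacher-forcing logic described above can be sketched in a few lines of Python. Here `step` stands in for a single recurrent (e.g., LSTM) unit, and the quadratic loss and toy exponential-decay trajectory are hypothetical choices made only to show how the ground truth, rather than the model's own prediction, is fed forward during training.

```python
# Illustrative sketch of teacher forcing vs. free-running rollout for a
# generic one-step recurrent model. `step` stands in for one LSTM unit;
# the toy data and loss below are hypothetical.

def rollout(step, x0, n_steps):
    """Inference phase: feed each prediction back as the next input."""
    x, preds = x0, []
    for _ in range(n_steps):
        x = step(x)
        preds.append(x)
    return preds

def teacher_forced_loss(step, truth):
    """Training phase: score each one-step prediction against the measured
    value, and feed the *measured* value (not the prediction) forward."""
    loss = 0.0
    for x_t, x_next in zip(truth[:-1], truth[1:]):
        pred = step(x_t)  # input is ground truth, so earlier errors cannot trickle down
        loss += (pred - x_next) ** 2
    return loss / (len(truth) - 1)

if __name__ == "__main__":
    truth = [1.0, 0.9, 0.81, 0.729]      # toy measured trajectory
    good = lambda x: 0.9 * x             # matches the data-generating rule
    bad = lambda x: 1.1 * x
    assert teacher_forced_loss(good, truth) < teacher_forced_loss(bad, truth)
    print(rollout(good, truth[0], 3))    # free-running prediction from x0 only
```

The contrast between `teacher_forced_loss` (training) and `rollout` (inference) mirrors the distinction drawn in the response: intermediate measurements guide training, but prediction at test time starts from the initial point alone.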

"(3) The optimization algorithm MATLAB FMINCON is a widely used approach for estimating parameters of the gLV model (see [6], [8] and [10], for example) (4) The proposed method of log transforming the gLV model only works with higher resolution time-series data. We have sparse time sampling in our datasets."

If indeed the data is so sparse that a traditional gLV model would not learn then it would be easy for the authors to show this. In order to be published Figure 4 should compare gLV+NN and LSTM+NN trained on the same data in the same way and compare the model's performance directly.

We appreciate the reviewer's suggestion to include the apples-to-apples comparison. We implemented a variation of the reviewer's proposed discretized gLV model, which is described in the Methods section under Comparison of the Discretized gLV Model to the LSTM. We provide two comparisons between gLV+NN and LSTM+NN, where both models were trained on the same data using the same training method to predict species abundance and metabolite concentrations. In the first comparison, we applied both models to predict species abundance after training on in silico data generated from a ground truth gLV model with randomly sampled growth rates and interaction coefficients according to $r_j \sim \mathcal{N}(\mu = 0.36, \sigma = 0.16)$, $a_{i,i} \sim \mathcal{N}(\mu = -1.5, \sigma = 0.25)$, and $a_{i,j} \sim \mathcal{N}(\mu = -0.22, \sigma = 0.33)$. A densely sampled dataset was sampled at one-hour intervals over a period of 48 hours, and a sparsely sampled dataset was sampled at 16-hour intervals. For each dataset, a total of 200 runs were simulated, each with a unique initial condition generated using Latin hypercube sampling with values ranging from 0.001 to 0.1. 75% of the data was used for training, with the remaining 25% held out for testing. While the approximate gLV model performs well in predicting species abundances after training on densely sampled data, it performs poorly compared to the LSTM model when trained on sparsely sampled data. We present the results from this comparison as a Jupyter notebook, which is available on the manuscript's GitLab repository.
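The in silico data generation described above can be sketched as follows. The random seed, the number of species, and this plain-Python Latin hypercube implementation are our own illustrative choices; only the parameter distributions and the [0.001, 0.1] range come from the description above.

```python
# Sketch of the in silico data generation: gLV parameters drawn from the
# stated normal distributions, and initial conditions from a Latin
# hypercube over [0.001, 0.1]. Seed and species count are illustrative.

import random

def sample_glv_params(n_species, rng):
    """Draw r_j ~ N(0.36, 0.16), a_ii ~ N(-1.5, 0.25), a_ij ~ N(-0.22, 0.33)."""
    r = [rng.gauss(0.36, 0.16) for _ in range(n_species)]
    A = [[rng.gauss(-1.5, 0.25) if i == j else rng.gauss(-0.22, 0.33)
          for j in range(n_species)] for i in range(n_species)]
    return r, A

def latin_hypercube(n_samples, n_dims, lo, hi, rng):
    """One stratified draw per interval in each dimension, shuffled per dimension."""
    cols = []
    for _ in range(n_dims):
        strata = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(strata)
        cols.append([lo + s * (hi - lo) for s in strata])
    # transpose: one point per sample, one coordinate per dimension
    return [list(point) for point in zip(*cols)]

if __name__ == "__main__":
    rng = random.Random(0)
    r, A = sample_glv_params(5, rng)
    x0s = latin_hypercube(200, 5, 0.001, 0.1, rng)
    print(len(x0s), len(x0s[0]))  # → 200 5
```

Each sampled parameter set and initial condition would then be rolled forward with a gLV integrator and subsampled at either 1-hour or 16-hour intervals to produce the dense and sparse training sets.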

In the second comparison, we compared a discretized gLV model that was augmented with a feed-forward neural network to enable both species and metabolite predictions, so that the model could be directly compared to LSTM model M3. Because the results shown in Figure 4 are based on data that was sampled at two time points, we compared the discretized gLV model to the LSTM on the time-series dataset shown in Figure 5, which was trained on data with observations at four time points to provide a better comparison of each model's ability to predict system dynamics. In Figure 5—figure supplement 3, we show that the LSTM outperforms the gLV in both species abundance prediction and metabolite concentration prediction. The results from both comparisons reinforce our main conclusion that the LSTM achieves improved performance over the discretized gLV model when both are trained using the same algorithm.
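A minimal sketch of this composite model follows: a discrete gLV update advances species abundances between observation times, and a small feed-forward readout maps abundances to metabolite concentrations. The single ReLU readout layer and all weights below are hypothetical placeholders, not the fitted values or the exact architecture from the paper.

```python
# Sketch of a discretized gLV model augmented with a feed-forward readout.
# All parameter values and the readout architecture are illustrative.

def relu(v):
    return [max(0.0, u) for u in v]

def discrete_glv_step(x, r, A, dt=1.0):
    """Discrete gLV update: x_i(t+dt) = x_i(t) + dt*(r_i + sum_j a_ij x_j(t))*x_i(t)."""
    n = len(x)
    return [max(x[i] + dt * (r[i] + sum(A[i][j] * x[j] for j in range(n))) * x[i], 0.0)
            for i in range(n)]

def metabolite_readout(x, W, b):
    """Feed-forward map (one hypothetical ReLU layer) from abundances to metabolites."""
    return relu([sum(W[k][i] * x[i] for i in range(len(x))) + b[k]
                 for k in range(len(W))])

def predict(x0, r, A, W, b, n_steps, dt=1.0):
    """Roll the discrete gLV forward and read out metabolites at every step."""
    x, traj = list(x0), []
    for _ in range(n_steps):
        x = discrete_glv_step(x, r, A, dt)
        traj.append((list(x), metabolite_readout(x, W, b)))
    return traj

if __name__ == "__main__":
    r = [0.36, 0.40]
    A = [[-1.5, -0.2], [-0.3, -1.5]]
    W, b = [[1.0, 0.5]], [0.0]           # one metabolite from two species
    final_x, final_m = predict([0.01, 0.01], r, A, W, b, n_steps=48)[-1]
    print(final_x, final_m)
```

Trained end-to-end with the same loss and data as the LSTM, a composite of this form is what makes the "apples-to-apples" comparison possible: both models then share the same recurrent training scheme and differ only in the structure of the state update.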

https://doi.org/10.7554/eLife.73870.sa2

Article and author information

Author details

  1. Mayank Baranwal

    1. Department of Systems and Control Engineering, Indian Institute of Technology, Bombay, India
    2. Division of Data & Decision Sciences, Tata Consultancy Services Research, Mumbai, India
    Contribution
    Conceptualization, Data curation, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review and editing
    Contributed equally with
    Ryan L Clark
    Competing interests
    The authors declare no conflicts of interest
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9354-2826
  2. Ryan L Clark

    Department of Biochemistry, University of Wisconsin-Madison, Madison, United States
    Contribution
    Conceptualization, Funding acquisition, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review and editing
    Contributed equally with
    Mayank Baranwal
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-7865-2496
  3. Jaron Thompson

    Department of Chemical & Biological Engineering, University of Wisconsin-Madison, Madison, United States
    Contribution
    Data curation, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5967-0234
  4. Zeyu Sun

    Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, United States
    Contribution
    Investigation, Software
    Competing interests
    The authors declare no conflicts of interest
  5. Alfred O Hero

    1. Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, United States
    2. Department of Biomedical Engineering, University of Michigan, Ann Arbor, United States
    3. Department of Statistics, University of Michigan, Ann Arbor, United States
    Contribution
    Conceptualization, Funding acquisition, Investigation, Methodology, Resources, Supervision, Visualization, Writing – original draft, Writing – review and editing
    For correspondence
    hero@eecs.umich.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2531-9670
  6. Ophelia S Venturelli

    1. Department of Biochemistry, University of Wisconsin-Madison, Madison, United States
    2. Department of Chemical & Biological Engineering, University of Wisconsin-Madison, Madison, United States
    3. Department of Bacteriology, University of Wisconsin-Madison, Madison, United States
    Contribution
    Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review and editing
    For correspondence
    venturelli@wisc.edu
    Competing interests
    The authors declare no conflicts of interest
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-2200-1963

Funding

National Institutes of Health (R35GM124774)

  • Ophelia S Venturelli

Army Research Office (W911NF1910269)

  • Alfred O Hero
  • Ophelia S Venturelli

University of Wisconsin-Madison (R01 EB030340)

  • Ophelia S Venturelli

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This research was supported by funding from the Army Research Office (ARO) under grant number W911NF1910269, the National Institutes of Health under grant numbers R35GM124774 and R01EB030340. RLC was supported in part by an NHGRI training grant to the Genomic Sciences Training Program (T32 HG002760).

Senior Editor

  1. Aleksandra M Walczak, CNRS LPENS, France

Reviewing Editor

  1. Sara Mitri, University of Lausanne, Switzerland

Reviewer

  1. Lingchong You, Duke University, United States

Publication history

  1. Received: September 14, 2021
  2. Preprint posted: September 28, 2021 (view preprint)
  3. Accepted: May 22, 2022
  4. Version of Record published: June 23, 2022 (version 1)

Copyright

© 2022, Baranwal, Clark et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

  1. Mayank Baranwal
  2. Ryan L Clark
  3. Jaron Thompson
  4. Zeyu Sun
  5. Alfred O Hero
  6. Ophelia S Venturelli
(2022)
Recurrent neural networks enable design of multifunctional synthetic human gut microbiome dynamics
eLife 11:e73870.
https://doi.org/10.7554/eLife.73870
