Recurrent neural networks enable design of multifunctional synthetic human gut microbiome dynamics

  1. Mayank Baranwal
  2. Ryan L Clark
  3. Jaron Thompson
  4. Zeyu Sun
  5. Alfred O Hero (corresponding author)
  6. Ophelia S Venturelli (corresponding author)
  1. Department of Systems and Control Engineering, Indian Institute of Technology, India
  2. Division of Data & Decision Sciences, Tata Consultancy Services Research, India
  3. Department of Biochemistry, University of Wisconsin-Madison, United States
  4. Department of Chemical & Biological Engineering, University of Wisconsin-Madison, United States
  5. Department of Electrical Engineering & Computer Science, University of Michigan, United States
  6. Department of Biomedical Engineering, University of Michigan, United States
  7. Department of Statistics, University of Michigan, United States
  8. Department of Bacteriology, University of Wisconsin-Madison, United States
5 figures, 1 table and 2 additional files

Figures

Figure 1
Comparison of generalized Lotka-Volterra (gLV) and long short-term memory (LSTM) model prediction performance of species abundance in a 25-member microbial community in response to third-order perturbations of varying magnitude.

For both models, training data consists of low species richness communities (>0.05 species, N=82,475, Pearson correlation p-value <0.0001). (a) & (d) Data was generated using a gLV model that captures monospecies growth and pairwise interactions. Scatter plots of true versus predicted species abundance at t=48 hr using the gLV and LSTM models, respectively. X represents a vector of species abundances. (b) & (e) Scatter plots of true versus predicted species abundance for the gLV and LSTM models, respectively, when the simulated data is subjected to low-magnitude (mild) third-order interactions. (c) & (f) Scatter plots of true versus predicted species abundance for the gLV and LSTM models, respectively, when the simulated data is further subjected to moderately large third-order interactions. (g) Scatter plot of true versus predicted species abundance for the LSTM model. The training set included a set of higher richness communities (50 each of 11- and 19-member communities). All predictions are forecasted from the species abundance at time 0.
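
As a rough illustration of how simulated data of this kind could be produced, the sketch below integrates a gLV model with an optional third-order term by forward Euler. The three-species size, all parameter values, the step size, the scaling factor `eps` and the function name `simulate_glv` are assumptions made for illustration; they are not the settings used to generate the figure.

```python
def simulate_glv(x0, r, A, B=None, eps=0.0, t_end=48.0, dt=0.01):
    """Forward-Euler integration of
    dx_i/dt = x_i * (r_i + sum_j A_ij x_j + eps * sum_jk B_ijk x_j x_k).

    r: monospecies growth rates, A: pairwise interactions,
    B: third-order interactions scaled by eps (eps=0 gives plain gLV).
    """
    n = len(x0)
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        growth = []
        for i in range(n):
            g = r[i] + sum(A[i][j] * x[j] for j in range(n))
            if B is not None and eps != 0.0:
                g += eps * sum(B[i][j][k] * x[j] * x[k]
                               for j in range(n) for k in range(n))
            growth.append(g)
        # Abundances are clipped at zero; absent species stay absent.
        x = [max(xi + dt * xi * gi, 0.0) for xi, gi in zip(x, growth)]
    return x


# Hypothetical three-species parameters (not taken from the paper).
r = [0.5, 0.4, 0.3]
A = [[-1.0, -0.2, 0.1],
     [-0.3, -1.0, -0.1],
     [0.2, -0.1, -1.0]]
B = [[[-0.5] * 3 for _ in range(3)] for _ in range(3)]
x0 = [0.01, 0.01, 0.01]

pairwise_only = simulate_glv(x0, r, A)           # data a gLV model can fit exactly
mild = simulate_glv(x0, r, A, B, eps=0.1)        # mild third-order perturbation
moderate = simulate_glv(x0, r, A, B, eps=1.0)    # larger third-order perturbation
```

Increasing `eps` moves the simulated 48 hr abundances further from what a purely pairwise model can represent, which is the kind of mismatch the panels compare.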

Figure 2 with 1 supplement
The LSTM model can predict the temporal changes in species abundance in a 12-member synthetic human gut community in response to periodic dilution (passaging).

(a) Proposed LSTM modeling methodology for the dynamic prediction of species abundance in a microbial community. The initial abundance information is an input to the first LSTM cell, the output of which is trained to predict abundance at the next time point. Consequently, the predicted abundance becomes an input to another LSTM cell with shared weights to predict the abundance at the subsequent time point. The process is repeated until measurements at all time points are available. X represents a vector of species abundances. Thus, all predictions are forecasted from the abundance at time 0. (b) Scatter plot of measured (true) and predicted species abundance of a 12-member synthetic human gut community at 12 hr (N=876, p-value = 2.44e-257). (c) Scatter plot of measured (true) and predicted abundance at 24 hr (p-value = 6.51e-257). (d) Scatter plot of measured (true) and predicted abundance at 36 hr (p-value = 7.42e-257). (e) Scatter plot of measured (true) and predicted abundance at 48 hr (p-value = 1.66e-227). (f) Scatter plot of measured (true) and predicted abundance at 60 hr (p-value = 3.39e-227).
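
The rollout described in (a) can be sketched as a loop in which one cell is applied repeatedly and each prediction is fed back as the next input. The code below is a minimal pure-Python sketch, not the authors' implementation: the class `TinyLSTMCell`, the hidden size, the omission of bias terms and the random untrained weights are all assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyLSTMCell:
    """A single LSTM cell with a linear read-out layer.

    Weights are random and untrained, and biases are omitted
    (assumptions: this is an illustration only).
    """

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, n_species, n_hidden, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # One weight matrix per gate, acting on the concatenation [x, h].
        self.W = {g: rand(n_hidden, n_species + n_hidden) for g in self.GATES}
        self.W_out = rand(n_species, n_hidden)  # hidden state -> abundances
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        z = list(x) + list(h)
        pre = {g: matvec(self.W[g], z) for g in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["candidate"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return matvec(self.W_out, h_new), h_new, c_new


def rollout(cell, x0, n_steps):
    """Forecast n_steps ahead from x0, feeding each prediction back as input."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    x, trajectory = list(x0), []
    for _ in range(n_steps):
        x, h, c = cell.step(x, h, c)  # same cell each step, hence shared weights
        trajectory.append(x)
    return trajectory


cell = TinyLSTMCell(n_species=12, n_hidden=8)
trajectory = rollout(cell, x0=[0.1] * 12, n_steps=5)  # e.g. 12, 24, 36, 48, 60 hr
print(len(trajectory), len(trajectory[0]))  # prints "5 12"
```

Because only the time-0 abundance enters from outside the loop, every later prediction depends on measured data through that initial vector alone.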

Figure 2—figure supplement 1
Prediction of temporal changes of species abundance for a few representative communities by the LSTM network.

(a) Histogram of prediction R2-scores on the test set. The prediction R2-score for each community is determined between the measured (true) and predicted abundances of all species in that community at all time points. Prediction of individual species abundance in communities that display (b) accurate predictions (11-member community), (c) predictions close to the median (3-member community), and (d) poor predictions (4-member community). Note the difference in abundance scales for some species, such as BH and BO. Despite the order-of-magnitude difference in scales, the proposed LSTM network predicts species abundance consistently across all species in (a). This is primarily due to feature standardization during the training and inference of LSTM networks.
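
To make the role of feature standardization concrete, the sketch below z-scores each species separately so that species with very different abundance scales contribute comparably during training, and maps predictions back to the original scale afterwards. The function names and the example abundances are hypothetical, and z-scoring is assumed here as the form of standardization.

```python
from statistics import mean, pstdev


def fit_standardizer(rows):
    """Per-feature mean and standard deviation from rows of species abundances."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against constant features
    return means, stds


def standardize(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def unstandardize(row, means, stds):
    return [v * s + m for v, m, s in zip(row, means, stds)]


# Hypothetical abundances: the second species is about 100x less abundant.
train = [[0.50, 0.004], [0.70, 0.006], [0.90, 0.008]]
means, stds = fit_standardizer(train)
scaled = [standardize(row, means, stds) for row in train]
restored = [unstandardize(row, means, stds) for row in scaled]
```

After scaling, both species span the same range, so a loss computed on `scaled` does not favor the more abundant species.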

Figure 3 with 7 supplements
LSTM-guided design and interpretability of community-level metabolite production profiles.

(a) Schematic of model training and design of communities with tailored metabolite outputs. (b) Heatmap of butyrate and lactate concentrations of all possible communities predicted by the LSTM model M1. Grey points indicate communities chosen via k-means clustering to span metabolite design space. Colored boxes indicate ‘corner’ regions defined by 95th percentile values on each axis with points of the corresponding color indicating designed communities within that ‘corner’. Insets show heatmaps of acetate and succinate concentrations for all communities within the corresponding boxes on the main figure. Boxes on the inset indicate ‘corners’ defined by 95th percentile values on each axis with colored points corresponding to the same points indicated on the main plot. (c) Cross-validation accuracy of LSTM model trained and validated on a random 90/10 split of all community observations (model M2), evaluated as the Pearson correlation R2 of predicted versus measured values for each variable (all p-values <0.05; N and p-value for each test reported in Supplementary file 1). Dashed line indicates R2=0.5, which is used as a cutoff for including a variable in the subsequent network diagrams. (d) and (e) Network representation of median LIME explanations of the LSTM model M2 from (c) for prediction of each metabolite concentration (d) or species abundance (e) by the presence of each species. Edge widths are proportional to the median LIME explanation across all communities from (b) used to train the model in units of concentration (for (d)) or normalized to the species’ self-impact (for (e)). Only explanations for those variables where the cross-validated predictions had R2>0.5 are shown. Networks were simplified by using lower thresholds for edge width (5 mM for (d), 0.2 for (e)). Red and blue edges indicate positive and negative contributions, respectively.
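
The R2 cutoff used in (c) to decide which variables appear in the network diagrams can be sketched as follows. The helper names and the toy measured and predicted values are hypothetical; only the squared Pearson correlation and the 0.5 threshold come from the caption.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def variables_to_plot(measured, predicted, cutoff=0.5):
    """Keep only variables whose cross-validated R2 exceeds the cutoff."""
    return [name for name in measured
            if pearson_r2(measured[name], predicted[name]) > cutoff]


# Toy values (mM), not data from the study.
measured = {"butyrate": [1.0, 2.0, 3.0, 4.0], "lactate": [1.0, 2.0, 3.0, 4.0]}
predicted = {"butyrate": [1.1, 1.9, 3.2, 3.8], "lactate": [2.0, 1.0, 2.0, 1.0]}
kept = variables_to_plot(measured, predicted)  # ['butyrate']
```

In this toy case the lactate predictions have R2 = 0.2, so lactate would be left out of the diagram.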

Figure 3—figure supplement 1
Cross-validation of LSTM model M1 predictions of species abundance and metabolite concentration.

Each plot indicates the comparison of predicted versus measured species abundance (a), butyrate concentration (b), acetate concentration (c), lactate concentration (d), or succinate concentration (e) for cross-validation of model M1 predictions of the validation communities from Clark et al., 2021 (model trained on 110 pairwise communities, 156 communities with 3–5 species, and 124 communities with 11–17 species; cross-validation shown is prediction of a different set of 124 communities with 11–17 species, including 82 communities with all 5 butyrate producers (AC, ER, FP, CC, RI) and 42 communities with the 4 butyrate producers other than AC). Each data point indicates the average of biological replicates of a single community. Black lines indicate linear regressions with slope (m) and R2 indicated in the legends. Dashed blue line indicates x=y.

Figure 3—figure supplement 2
Predicted total carbon in fermentation products.

Histogram of the model M1 predicted total carbon concentration in butyrate, acetate, lactate, and succinate for all possible communities with >10 species (26,434,916 communities).

Figure 3—figure supplement 3
Prediction and classification statistics for model M1 predictions of designed community sets.

(a) Scatter plot of prediction accuracy (correlation of predicted versus measured) of each variable (25 species abundances, 4 metabolite concentrations) by the LSTM model M1 versus the composite model based on the method from Clark et al., 2021. For metabolites, prediction accuracy is also included where the regression model from the composite model is replaced with a Random Forest Regressor (triangles) or a Feed Forward Network (plus signs). Pearson correlation R2, p-values, and N are reported in Supplementary file 1. (b–e) Prediction accuracy of model M1 for the indicated metabolites. Dashed line indicates the linear regression for all data points. Legends indicate the Pearson correlation R2 (including p-values, N=80 for Corner, 100 for Distributed) and RMSE for communities from the ‘corner’ set (red) or ‘distributed’ set (blue) for each variable. Solid black lines indicate x=y. (f) Confusion matrix for classification of the ‘corner’ communities into their specified classes (shown in Figure 3b). Values indicate the fraction of communities from each predicted class whose metabolite concentrations were closest (Euclidean distance) to the centroid of each class (Measured Class). Colored boxes indicate ‘sub-classes’ that fall within the four major classes determined in the lactate and butyrate concentration space as shown in Figure 3b. (g) Scatter plot of misclassification rate between each pair of classes (values from (f), fraction of communities misclassified from one class to the other) versus the Euclidean distance between the centroids of that pair of classes. Black data points indicate pairs of classes that fall within the same major classes defined by the colored boxes in (f) and red data points indicate pairs of classes that do not fall within the same major class.
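
The nearest-centroid classification behind the confusion matrix in (f) can be sketched as below. The class names, centroid coordinates and sample values are hypothetical, and the centroids are simply passed in, since the caption does not state how they were computed.

```python
import math
from collections import defaultdict


def nearest_class(point, centroids):
    """Name of the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


def confusion_fractions(samples, centroids):
    """samples: list of (predicted_class, measured_metabolite_vector).

    Returns {predicted_class: {measured_class: fraction}}; each row sums to 1.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for predicted, measured in samples:
        counts[predicted][nearest_class(measured, centroids)] += 1
    return {p: {m: k / sum(row.values()) for m, k in row.items()}
            for p, row in counts.items()}


# Toy (butyrate, lactate) concentrations in mM, not data from the study.
centroids = {"high_butyrate": (40.0, 5.0), "high_lactate": (10.0, 35.0)}
samples = [
    ("high_butyrate", (40.0, 5.0)),
    ("high_butyrate", (38.0, 8.0)),
    ("high_butyrate", (12.0, 30.0)),  # measured closer to the lactate centroid
    ("high_lactate", (10.0, 35.0)),
]
confusion = confusion_fractions(samples, centroids)
```

Here one of three communities designed for high butyrate lands nearest the high-lactate centroid, giving an off-diagonal entry of 1/3 in that row.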

Figure 3—figure supplement 4
Metabolite production of each species grown in monoculture.

Bars show the mean net production or consumption of each metabolite for monocultures of each species (bar color indicates species as specified in the legend). Error bars indicate bootstrapped 95% confidence interval on the mean of between 3 and 22 biological replicates. The dashed lines indicate ±10 mM and the numbers on the plot indicate the number of species with mean net production of that metabolite outside the ±10 mM range.

Figure 3—figure supplement 5
Metabolite-species LIME explanations computed over a 20-fold partitioning of the data set.

Box plots of LIME explanations for the metabolites acetate, butyrate, lactate and succinate. LIME analysis to compute the impact of initial species abundances on metabolite predictions in the 25-member (full) community was performed after training on each subset of data that resulted from a 20-fold partitioning. In each box, the black horizontal line shows the median LIME explanation over the 20 samples, each box encloses the first (Q1) and third (Q3) quartiles, and whiskers extend to the farthest data points within the range 1.5*(Q3 - Q1). Data points that exceed this range are considered outliers, which are shown as black circles.
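
The box plot quantities described above can be computed as in the following sketch. The quartile convention (`method="inclusive"`), the function name `box_stats` and the example values are assumptions; plotting libraries may use a slightly different quartile definition.

```python
from statistics import median, quantiles


def box_stats(values):
    """Median, quartiles, whisker ends and outliers using the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low <= v <= high]
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        # Whiskers end at the farthest data points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low or v > high),
    }


# Hypothetical LIME explanations for one species across folds.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

For this toy input the value 100 lies beyond Q3 + 1.5*(Q3 - Q1) and is reported as an outlier, while the upper whisker stops at 9.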

Figure 3—figure supplement 6
Microbe-microbe LIME explanations computed over a 20-fold partitioning of the data set.

Box plots of LIME explanations for each species in the 25-member synthetic human gut community. LIME analysis to compute the impact of initial species abundances on the end-point species abundance predictions in the 25-member (full) community was performed after training on each subset of data that resulted from a 20-fold partitioning. In each box, the black horizontal line shows the median LIME explanation over the 20 samples, each box encloses the first (Q1) and third (Q3) quartiles, and whiskers extend to the farthest data points within the range 1.5*(Q3 - Q1). Data points that exceed this range are considered outliers, which are shown as black circles. LIME explanations from each fold are normalized to the given species' self-impact such that the LIME explanation of a species to predict its own abundance is equal to one.
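
The self-impact normalization in the last sentence amounts to dividing every explanation for a target species by that species' explanation of its own abundance. A minimal sketch, with a hypothetical nested-dictionary layout and made-up values:

```python
def normalize_to_self_impact(explanations):
    """explanations[target][source] is the LIME explanation of `source` on `target`.

    Each row is divided by its self-impact explanations[target][target],
    so the normalized self-impact equals one.
    """
    return {target: {source: value / row[target] for source, value in row.items()}
            for target, row in explanations.items()}


# Made-up explanations for two species from a single fold.
raw = {
    "BH": {"BH": 0.02, "BO": -0.01},
    "BO": {"BH": 0.50, "BO": 2.00},
}
normalized = normalize_to_self_impact(raw)
```

After normalization the effect of BO on BH (-0.5) and of BH on BO (0.25) are on a common, unitless scale even though the raw values differ by orders of magnitude.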

Figure 3—figure supplement 7
Comparison of LIME explanations of LSTM to gLV Parameters.

(a) Scatter plot of LIME explanations of each species' impact on each other species in model M2 versus the corresponding interspecies interaction parameter (aij) from the gLV model from Clark et al., 2021. Dashed line indicates the linear regression with the regression parameters shown in the legend (N=600). (b) Heatmap representation of agreement/disagreement between specific interactions for the same comparison as in (a). Legend describes what each color represents.

Figure 4 with 1 supplement
Hold-out prediction performance on sub-communities provides information about poorly understood species and interactions between species.

(a) Sensitivity of metabolite prediction performance (R2) to the amount of training data. Training datasets were randomly subsampled 30 times using 50–100% of the total dataset in increments of 10%. Each subsampled training set was subject to 20-fold cross-validation to assess prediction performance. The line plot shows the mean prediction performance over the 30 trials for each percentage of the data. Error bars denote 1 s.d. from the mean. (b) Schematic scatter plot representing how communities containing species A and B define a poorly predicted subsample of the full sample set. (c) Heatmap of prediction performance (R2) of acetate for each subset of communities containing a given species (diagonal elements) or pair of species (off-diagonal elements). (d) Heatmap of prediction performance for acetate, butyrate, lactate, and succinate. A sample subset containing a given species or pair of species included all communities in which the species were initially present. Predictions for each community were determined using 20-fold cross-validation so that for each model the predicted samples were excluded from the training samples. N and p-values are reported in Supplementary file 1.
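
The subsampling scheme in (a) can be sketched as a generator of train/test index splits: for each fraction of the data, 30 random subsamples are drawn and each is partitioned into 20 folds. The function names, the fold assignment by striding, the seed and the dataset size of 100 are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Yield (train, held_out) index lists for a shuffled k-fold partition."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        yield [i for i in idx if i not in held], held_out


def subsample_schedule(n_samples, fractions=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                       n_trials=30, k=20, seed=0):
    """Yield (fraction, trial, train_ids, test_ids) for every split."""
    rng = random.Random(seed)
    for frac in fractions:
        for trial in range(n_trials):
            subset = rng.sample(range(n_samples), int(round(frac * n_samples)))
            for train, test in k_fold_indices(len(subset), k, rng):
                yield frac, trial, [subset[i] for i in train], [subset[i] for i in test]


splits = list(subsample_schedule(n_samples=100))
print(len(splits))  # prints "3600" (6 fractions x 30 trials x 20 folds)
```

Each held-out community is predicted by a model that never saw it, so averaging R2 over the 30 trials per fraction gives one point on the sensitivity curve.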

Figure 4—figure supplement 1
Sensitivity of species abundance prediction performance (R2) to the size of the training dataset.

Training datasets were randomly subsampled 30 times using 50% to 100% of the total dataset in increments of 10%. Each subsampled training set was subject to 20-fold cross-validation to assess prediction performance. Sub-plots show the mean prediction performance (±one standard deviation) over the 30 trials for each percentage of the dataset. Subplots were sorted according to the variance in species abundance taken over the total dataset. In general, prediction performance of low variance species was less likely to improve in response to more training data. N and p-values are reported in Supplementary file 1.

Figure 5 with 3 supplements
Community metabolite trajectories cluster into qualitatively distinct groups which can be classified based on presence and absence of key microbial species.

(a) Schematic of experiment and network representing a minimal spanning tree across the 95 communities where weights (indicated by edge length) are equal to the Euclidean distance between the metabolite trajectories for each community. Node colors indicate clusters determined as described in the Materials and methods. Red node with black outline annotated with ‘25’ represents the 25-member community. Annotations indicate the most specific microbial species presence/absence rules that describe most data points in the cluster of the corresponding color as determined by a decision tree classifier (Materials and methods). Communities that deviate from the rules for their cluster are indicated with a border matching the color of the closest cluster whose rules they do follow. Network visualization generated using the draw_kamada_kawai function in networkx (v2.1) for Python 3. (b–g) Temporal changes in metabolite concentrations for communities within each cluster (indicated by sub-plot border color), with individual communities denoted by transparent lines. Solid lines and shaded regions represent the mean ±1 s.d. of all communities in the cluster. (h) Schematic of LSTM model training and computation of gradients to evaluate impact of species abundance on metabolite concentrations in a specific community context. (i) Heatmap of model M3 prediction accuracy for four metabolites in the 34 validation communities at each time point (Pearson correlation R2, N=34 for all tests). (j) Heatmap of the gradient analysis of model M3 as described in (h) for the full 25-species community. N and p-values are reported in Supplementary file 1.
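
The tree in (a) can be illustrated with a small pure-Python version of Prim's algorithm, using Euclidean distances between flattened metabolite trajectories as edge weights. This is a stand-in sketch rather than the authors' code, which the caption only identifies as using networkx for visualization; the function name and the toy trajectories are hypothetical.

```python
import math


def minimum_spanning_tree(trajectories):
    """Prim's algorithm on the complete graph of communities.

    trajectories: one flattened metabolite trajectory (sequence of floats)
    per community. Returns a list of (i, j, distance) tree edges.
    """
    n = len(trajectories)
    in_tree = {0}
    edges = []
    # best[j] = (distance to the closest tree node, that tree node)
    best = {j: (math.dist(trajectories[0], trajectories[j]), 0)
            for j in range(1, n)}
    while len(in_tree) < n:
        j = min(best, key=lambda node: best[node][0])
        weight, i = best.pop(j)
        in_tree.add(j)
        edges.append((i, j, weight))
        for node in best:
            d = math.dist(trajectories[j], trajectories[node])
            if d < best[node][0]:
                best[node] = (d, j)
    return edges


# Toy "trajectories" in two dimensions (not data from the study).
toy = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (1.0, 1.0)]
tree = minimum_spanning_tree(toy)
total_length = sum(weight for _, _, weight in tree)
```

Communities with similar metabolite dynamics end up joined by short edges, which is why clusters appear as contiguous branches of the tree.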

Figure 5—figure supplement 1
Characteristics of the dynamic community behaviors.

(a) Minimal spanning tree of a graph representation of the 180 communities characterized in Figure 3, where each node is a community and each weight is the Euclidean distance between a pair of communities in the 4-dimensional metabolite space, to show that the subset of communities characterized in the dynamic experiment was representative of all 180 communities characterized in Figure 3. Blue and red nodes indicate the subset of communities chosen for dynamic characterization and used as training and validation examples for LSTM model M3 in Figure 5. These subsets were chosen by first performing k-means clustering with k=94 for the 180 communities and identifying the community closest to each cluster centroid (94 communities), and then repeating this process to subsample 34 of the 94 communities (as the training/validation split). (b) and (c) Scatter plots showing where the clusters from Figure 5a fall in the 48 hr metabolite measurement space for comparison with Figure 3b. Each data point represents a community with the color corresponding to the clusters in Figure 5a. Legend indicates the percentage of communities from each cluster that come from the ‘corner’ or ‘distributed’ sets. (d) Decision tree classifier explaining which species’ presence determines the clusters of dynamic community behavior from Figure 5. Annotations indicate the percentage of communities from each cluster that can be explained by the indicated paths, which are also annotated on Figure 5a.

Figure 5—figure supplement 2
Prediction accuracy of model M3 for species abundance.

Heatmap represents R2 for the prediction accuracy of model M3 of the abundance of each species at each time point in the 34 validation communities.

Figure 5—figure supplement 3
Comparison of the discrete generalized Lotka-Volterra model to the LSTM using the same training algorithm.

(a) Schematic detailing the implementation of the discretized gLV model and the addition of a feed-forward neural network to predict metabolites from species abundance, where A is a matrix of species interaction coefficients, r is a vector of growth rates, and ⊙ is the Hadamard product. (b) Schematic of the LSTM model, which uses an LSTM cell to compute a hidden state vector, which is the input to a feed-forward neural network that predicts a vector of species abundances and metabolite concentrations at each time step. See Computational Methods for a detailed description. (c) Scatter plot of experimentally measured (true) and predicted species absolute abundance using the approximate gLV model. gLV model prediction performance (N=2,625) of species abundance on held-out test data after training on the same training data used to fit LSTM model M3. (d) Scatter plot of experimentally measured (true) and predicted species absolute abundance using the LSTM + FFN model (N=2,625). (e) Scatter plot of experimentally measured (true) and predicted metabolite concentrations using the gLV + FFN model (N=105 for every metabolite). (f) Scatter plot of experimentally measured (true) and predicted metabolite concentrations using the LSTM + FFN model (N=105 for every metabolite). Lines denote x=y.

Tables

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Strain, strain background (Prevotella copri CB7) | PC | DSM 18205
Strain, strain background (Parabacteroides johnsonii M-165) | PJ | DSM 18315
Strain, strain background (Bacteroides vulgatus NCTC 11154) | BV | ATCC 8482
Strain, strain background (Bacteroides fragilis EN-2) | BF | DSM 2151
Strain, strain background (Bacteroides ovatus NCTC 11153) | BO | ATCC 8483
Strain, strain background (Bacteroides thetaiotaomicron VPI 5482) | BT | ATCC 29148
Strain, strain background (Bacteroides caccae VPI 3452 A) | BC | ATCC 43185
Strain, strain background (Bacteroides cellulosilyticus CRE21) | BY | DSMZ 14838
Strain, strain background (Bacteroides uniformis VPI 0061) | BU | DSM 6597
Strain, strain background (Desulfovibrio piger VPI C3-23) | DP | ATCC 29098
Strain, strain background (Bifidobacterium longum subs. infantis S12) | BL | DSM 20088
Strain, strain background (Bifidobacterium adolescentis E194a (Variant a)) | BA | ATCC 15703
Strain, strain background (Bifidobacterium pseudocatenulatum B1279) | BP | DSM 20438
Strain, strain background (Collinsella aerofaciens VPI 1003) | CA | DSM 3979
Strain, strain background (Eggerthella lenta 1899 B) | EL | DSM 2243
Strain, strain background (Faecalibacterium prausnitzii A2-165) | FP | DSM 17677
Strain, strain background (Clostridium hiranonis T0-931) | CH | DSM 13275
Strain, strain background (Anaerostipes caccae L1-92) | AC | DSM 14662
Strain, strain background (Blautia hydrogenotrophica S5a33) | BH | DSM 10507
Strain, strain background (Clostridium asparagiforme N6) | CG | DSM 15981
Strain, strain background (Eubacterium rectale VPI 0990) | ER | ATCC 33656
Strain, strain background (Roseburia intestinalis L1-82) | RI | DSM 14610
Strain, strain background (Coprococcus comes VPI CI-38) | CC | ATCC 27758
Strain, strain background (Dorea longicatena 111–35) | DL | DSMZ 13814
Strain, strain background (Dorea formicigenerans VPI C8-13) | DF | DSM 3992
Sequence-based reagent | Forward Primer Index: ATCACG | IDT | AATGATACGGCGACCACCGAGATCTACAC ATCACG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CGATGT | IDT | AATGATACGGCGACCACCGAGATCTACAC CGATGT ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TTAGGC | IDT | AATGATACGGCGACCACCGAGATCTACAC TTAGGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TGACCA | IDT | AATGATACGGCGACCACCGAGATCTACAC TGACCA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ACAGTG | IDT | AATGATACGGCGACCACCGAGATCTACAC ACAGTG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GCCAAT | IDT | AATGATACGGCGACCACCGAGATCTACAC GCCAAT ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CAGATC | IDT | AATGATACGGCGACCACCGAGATCTACAC CAGATC ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ACTTGA | IDT | AATGATACGGCGACCACCGAGATCTACAC ACTTGA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GATCAG | IDT | AATGATACGGCGACCACCGAGATCTACAC GATCAG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TAGCTT | IDT | AATGATACGGCGACCACCGAGATCTACAC TAGCTT ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GGCTAC | IDT | AATGATACGGCGACCACCGAGATCTACAC GGCTAC ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CTTGTA | IDT | AATGATACGGCGACCACCGAGATCTACAC CTTGTA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: AGTCAA | IDT | AATGATACGGCGACCACCGAGATCTACAC AGTCAA ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: AGTTCC | IDT | AATGATACGGCGACCACCGAGATCTACAC AGTTCC ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ATGTCA | IDT | AATGATACGGCGACCACCGAGATCTACAC ATGTCA ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CCGTCC | IDT | AATGATACGGCGACCACCGAGATCTACAC CCGTCC ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GTAGAG | IDT | AATGATACGGCGACCACCGAGATCTACAC GTAGAG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GTCCGC | IDT | AATGATACGGCGACCACCGAGATCTACAC GTCCGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GTGAAA | IDT | AATGATACGGCGACCACCGAGATCTACAC GTGAAA ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GTGGCC | IDT | AATGATACGGCGACCACCGAGATCTACAC GTGGCC ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GTTTCG | IDT | AATGATACGGCGACCACCGAGATCTACAC GTTTCG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CGTACG | IDT | AATGATACGGCGACCACCGAGATCTACAC CGTACG ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GAGTGG | IDT | AATGATACGGCGACCACCGAGATCTACAC GAGTGG ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GGTAGC | IDT | AATGATACGGCGACCACCGAGATCTACAC GGTAGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ACTGAT | IDT | AATGATACGGCGACCACCGAGATCTACAC ACTGAT ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ATGAGC | IDT | AATGATACGGCGACCACCGAGATCTACAC ATGAGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: ATTCCT | IDT | AATGATACGGCGACCACCGAGATCTACAC ATTCCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CAAAAG | IDT | AATGATACGGCGACCACCGAGATCTACAC CAAAAG ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CAACTA | IDT | AATGATACGGCGACCACCGAGATCTACAC CAACTA ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CACCGG | IDT | AATGATACGGCGACCACCGAGATCTACAC CACCGG ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CACGAT | IDT | AATGATACGGCGACCACCGAGATCTACAC CACGAT ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CACTCA | IDT | AATGATACGGCGACCACCGAGATCTACAC CACTCA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CAGGCG | IDT | AATGATACGGCGACCACCGAGATCTACAC CAGGCG ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CATGGC | IDT | AATGATACGGCGACCACCGAGATCTACAC CATGGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CATTTT | IDT | AATGATACGGCGACCACCGAGATCTACAC CATTTT ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CCAACA | IDT | AATGATACGGCGACCACCGAGATCTACAC CCAACA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CGGAAT | IDT | AATGATACGGCGACCACCGAGATCTACAC CGGAAT ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CTAGCT | IDT | AATGATACGGCGACCACCGAGATCTACAC CTAGCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CTATAC | IDT | AATGATACGGCGACCACCGAGATCTACAC CTATAC ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: CTCAGA | IDT | AATGATACGGCGACCACCGAGATCTACAC CTCAGA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: GACGAC | IDT | AATGATACGGCGACCACCGAGATCTACAC GACGAC ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TAATCG | IDT | AATGATACGGCGACCACCGAGATCTACAC TAATCG ACACTCTTTCCCTACACGACGCTCTTCCGATCT T ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TACAGC | IDT | AATGATACGGCGACCACCGAGATCTACAC TACAGC ACACTCTTTCCCTACACGACGCTCTTCCGATCT GT ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TATAAT | IDT | AATGATACGGCGACCACCGAGATCTACAC TATAAT ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TCATTC | IDT | AATGATACGGCGACCACCGAGATCTACAC TCATTC ACACTCTTTCCCTACACGACGCTCTTCCGATCT ATGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TCCCGA | IDT | AATGATACGGCGACCACCGAGATCTACAC TCCCGA ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCGA ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TCGAAG | IDT | AATGATACGGCGACCACCGAGATCTACAC TCGAAG ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAGTGG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Forward Primer Index: TCGGCA | IDT | AATGATACGGCGACCACCGAGATCTACAC TCGGCA ACACTCTTTCCCTACACGACGCTCTTCCGATCT CCTGGAG ACTCCTACGGGAGGCAGCAGT
Sequence-based reagent | Reverse Primer Index: ATCACGAG | IDT | CAAGCAGAAGACGGCATACGAGAT ATCACGAG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CGATGTTC | IDT | CAAGCAGAAGACGGCATACGAGAT CGATGTTC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TTAGGCGA | IDT | CAAGCAGAAGACGGCATACGAGAT TTAGGCGA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TGACCAAT | IDT | CAAGCAGAAGACGGCATACGAGAT TGACCAAT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: ACAGTGCT | IDT | CAAGCAGAAGACGGCATACGAGAT ACAGTGCT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GCCAATGT | IDT | CAAGCAGAAGACGGCATACGAGAT GCCAATGT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CAGATCGA | IDT | CAAGCAGAAGACGGCATACGAGAT CAGATCGA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: ACTTGAAA | IDT | CAAGCAGAAGACGGCATACGAGAT ACTTGAAA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GATCAGTG | IDT | CAAGCAGAAGACGGCATACGAGAT GATCAGTG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TCTACCTC | IDT | CAAGCAGAAGACGGCATACGAGAT TCTACCTC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CTTGTATG | IDT | CAAGCAGAAGACGGCATACGAGAT CTTGTATG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TAGCTTCC | IDT | CAAGCAGAAGACGGCATACGAGAT TAGCTTCC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GGCTACCA | IDT | CAAGCAGAAGACGGCATACGAGAT GGCTACCA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: ATGCACTT | IDT | CAAGCAGAAGACGGCATACGAGAT ATGCACTT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GACGGAAC | IDT | CAAGCAGAAGACGGCATACGAGAT GACGGAAC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: AGCCTTGG | IDT | CAAGCAGAAGACGGCATACGAGAT AGCCTTGG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CCGTAGAG | IDT | CAAGCAGAAGACGGCATACGAGAT CCGTAGAG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GTGAGACT | IDT | CAAGCAGAAGACGGCATACGAGAT GTGAGACT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: AATGCTCA | IDT | CAAGCAGAAGACGGCATACGAGAT AATGCTCA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GCATCGTA | IDT | CAAGCAGAAGACGGCATACGAGAT GCATCGTA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CGAACAGC | IDT | CAAGCAGAAGACGGCATACGAGAT CGAACAGC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TCGGAAGG | IDT | CAAGCAGAAGACGGCATACGAGAT TCGGAAGG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TTCTGTCG | IDT | CAAGCAGAAGACGGCATACGAGAT TTCTGTCG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GTACTCAC | IDT | CAAGCAGAAGACGGCATACGAGAT GTACTCAC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: AGTAATAC | IDT | CAAGCAGAAGACGGCATACGAGAT AGTAATAC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CAAGATAT | IDT | CAAGCAGAAGACGGCATACGAGAT CAAGATAT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TGTTTGGT | IDT | CAAGCAGAAGACGGCATACGAGAT TGTTTGGT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CTCCAACC | IDT | CAAGCAGAAGACGGCATACGAGAT CTCCAACC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: AAATTCTG | IDT | CAAGCAGAAGACGGCATACGAGAT AAATTCTG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: CCCGCCAA | IDT | CAAGCAGAAGACGGCATACGAGAT CCCGCCAA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TACAAATA | IDT | CAAGCAGAAGACGGCATACGAGAT TACAAATA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: GGGCTATA | IDT | CAAGCAGAAGACGGCATACGAGAT GGGCTATA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TTTCGGAC | IDT | CAAGCAGAAGACGGCATACGAGAT TTTCGGAC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagent | Reverse Primer Index: TGCGCGTC | IDT | CAAGCAGAAGACGGCATACGAGAT TGCGCGTC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: TCCCGCTGIDTCAAGCAGAAGACGGCATACGAGAT TCCCGCTG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: GTTTCAGGIDTCAAGCAGAAGACGGCATACGAGAT GTTTCAGG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: GGAGGGGGIDTCAAGCAGAAGACGGCATACGAGAT GGAGGGGG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: GCTGTTAGIDTCAAGCAGAAGACGGCATACGAGAT GCTGTTAG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: GAGTGTGAIDTCAAGCAGAAGACGGCATACGAGAT GAGTGTGA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: CGTCCCCGIDTCAAGCAGAAGACGGCATACGAGAT CGTCCCCG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: CCTCATCAIDTCAAGCAGAAGACGGCATACGAGAT CCTCATCA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: CCACGACAIDTCAAGCAGAAGACGGCATACGAGAT CCACGACA GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT A ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: CATTGGCTIDTCAAGCAGAAGACGGCATACGAGAT CATTGGCT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TC ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: AGGGGCCCIDTCAAGCAGAAGACGGCATACGAGAT AGGGGCCC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTA ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: ACGACACTIDTCAAGCAGAAGACGGCATACGAGAT ACGACACT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GATA ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: ACCGACGCIDTCAAGCAGAAGACGGCATACGAGAT ACCGACGC GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ACTCA ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: TATAGTATIDTCAAGCAGAAGACGGCATACGAGAT TATAGTAT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TTCTCT ggactaccagggtatctaatcctgt
Sequence-based reagentReverse Primer Index: AACTCAGTIDTCAAGCAGAAGACGGCATACGAGAT AACTCAGT GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt
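
Every reverse primer listed above follows the same layout: a constant 5' segment, the 8-nt index named in the row, a second constant segment, a variable spacer of zero to seven bases, and a constant lowercase 3' segment. The Python sketch below assembles and splits primers according to that layout. The function names and the segment labels (`PREFIX`, `LINKER`, `TARGET`) are hypothetical and chosen only for illustration; they are not the authors' terminology, and the sketch is not part of the published methods.

```python
# Constant segments copied from the primer rows above.
# The names PREFIX, LINKER and TARGET are illustrative assumptions.
PREFIX = "CAAGCAGAAGACGGCATACGAGAT"            # constant 5' segment in every row
LINKER = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"  # constant segment after the index
TARGET = "ggactaccagggtatctaatcctgt"           # constant lowercase 3' segment


def build_reverse_primer(index: str, spacer: str = "") -> str:
    """Concatenate the segments in the order used by the table rows."""
    return PREFIX + index + LINKER + spacer + TARGET


def split_reverse_primer(primer: str) -> dict:
    """Recover the index and spacer from a primer written as in the table."""
    seq = primer.replace(" ", "")  # table rows separate segments with spaces
    if not (seq.startswith(PREFIX) and seq.endswith(TARGET)):
        raise ValueError("primer does not match the expected layout")
    body = seq[len(PREFIX):len(seq) - len(TARGET)]
    index, found, spacer = body.partition(LINKER)
    if not found:
        raise ValueError("constant linker segment not found")
    return {"index": index, "spacer": spacer}


row = ("CAAGCAGAAGACGGCATACGAGAT ACTTGAAA "
       "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CACTTCT ggactaccagggtatctaatcctgt")
parts = split_reverse_primer(row)
print(parts["index"], parts["spacer"])
```

A check of this kind can confirm that the index in each row's designation matches the index embedded in its sequence before the primers are ordered.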

Additional files

Transparent reporting form
https://cdn.elifesciences.org/articles/73870/elife-73870-transrepform1-v1.docx
Supplementary file 1

Modeling information.

Contains two tables. Table S1 summarizes the training/test data and hyperparameters for training each LSTM model. Table S2 contains parameters used for all statistical tests presented in this work.

https://cdn.elifesciences.org/articles/73870/elife-73870-supp1-v1.xlsx

Cite this article

Mayank Baranwal, Ryan L Clark, Jaron Thompson, Zeyu Sun, Alfred O Hero, Ophelia S Venturelli (2022) Recurrent neural networks enable design of multifunctional synthetic human gut microbiome dynamics. eLife 11:e73870. https://doi.org/10.7554/eLife.73870