Extensive transmission of microbes along the gastrointestinal tract

  1. Thomas SB Schmidt
  2. Matthew R Hayward
  3. Luis P Coelho
  4. Simone S Li
  5. Paul I Costea
  6. Anita Y Voigt
  7. Jakob Wirbel
  8. Oleksandr M Maistrenko
  9. Renato JC Alves
  10. Emma Bergsten
  11. Carine de Beaufort
  12. Iradj Sobhani
  13. Anna Heintz-Buschart
  14. Shinichi Sunagawa
  15. Georg Zeller
  16. Paul Wilmes
  17. Peer Bork (corresponding author)
  1. European Molecular Biology Laboratory, Germany
  2. European Molecular Biology Laboratory and Faculty of Biosciences, Heidelberg University, Germany
  3. APHP and UPEC Université Paris-Est Créteil, France
  4. Luxembourg Centre for Systems Biomedicine, Luxembourg
  5. Centre Hospitalier de Luxembourg, Luxembourg
  6. Max Delbrück Centre for Molecular Medicine, Germany
  7. European Molecular Biology Laboratory and University Hospital Heidelberg, Germany
  8. University of Würzburg, Germany
Research Article
Cite this article as: eLife 2019;8:e42693 doi: 10.7554/eLife.42693

Abstract

The gastrointestinal tract is abundantly colonized by microbes, yet the translocation of oral species to the intestine is considered a rare aberrant event, and a hallmark of disease. By studying salivary and fecal microbial strain populations of 310 species in 470 individuals from five countries, we found that transmission to, and subsequent colonization of, the large intestine by oral microbes is common and extensive among healthy individuals. We found evidence for a vast majority of oral species to be transferable, with increased levels of transmission in colorectal cancer and rheumatoid arthritis patients and, more generally, for species described as opportunistic pathogens. This establishes the oral cavity as an endogenous reservoir for gut microbial strains, and oral-fecal transmission as an important process that shapes the gastrointestinal microbiome in health and disease.

https://doi.org/10.7554/eLife.42693.001

eLife digest

Trillions of bacteria and other microbes live in the human body. The mouth and the gut, in particular, are microbial hot spots at either end of the digestive tract. Every day, humans swallow around 1.5 liters of saliva, along with millions of oral microbes. Scientists believe that more than 99% of these microbes die as they pass through the acidic environment of the stomach and later the small intestine, which act as a barrier between the bacteria of the mouth and gut.

Failure of this barrier can lead to overgrowth of oral microbes in the gut. This may contribute to diseases like bowel cancer, rheumatoid arthritis and inflammatory bowel diseases. But even in healthy people, low levels of microbes usually found in the mouth are often found in stool. It is unclear if these microbes cross the barrier or if they are similar microbes that originate in the gut.

Now, Schmidt, Hayward et al. show that in healthy people at least one in three oral microbial cells passes through the digestive tract to settle in the gut. This challenges the notion of a mouth-gut barrier. In the experiments, the genetic material of all the microbes in the saliva and stool of several hundred people from three continents was analyzed. This allowed Schmidt, Hayward et al. to determine whether strains found in the gut originate from the mouth, or are closely related but specialized gut types of the same species. The results also showed that patients with bowel cancer and rheumatoid arthritis had more mouth-to-gut microbial transmission than their healthy counterparts.

The experiments suggest that the mouth is a microbial reservoir that constantly replenishes the gut flora. Some of the gut-traveling oral bacteria trigger inflammation when they grow in other parts of the body like the lining of the heart. This, along with the discovery that patients with certain diseases have more oral bacteria in the gut, may suggest that the transmission of these microbes contributes to disease. The experiments also indicate that finding ways to influence oral bacteria might affect the ones in the gut. More studies are needed to understand how mouth microbes survive the trip to the gut and are able to thrive in this competitive environment, and what role they play in health and disease.

https://doi.org/10.7554/eLife.42693.002

Introduction

Both the oral cavity and large intestine accommodate unique microbiomes that are relevant to human health and disease (Lynch and Pedersen, 2016; Wade, 2013). Mouth and gut are linked by a constant flow of ingested food and saliva along the gastrointestinal tract (GIT), yet they host distinct microbial communities (Ding and Schloss, 2014; Segata et al., 2012) in distinct microenvironments (Savage, 1977), and have been reported to harbor locally adapted strains (Lloyd-Price et al., 2017).

The segregation of oral and intestinal communities is thought to be maintained by various mechanisms, such as gastric acidity (Howden and Hunt, 1987; Martinsen et al., 2005) and antimicrobial bile acids in the duodenum (Ridlon et al., 2014). Failure of this oral-gut barrier has been proposed to lead to intestinal infection (Martinsen et al., 2005), and the prolonged usage of proton pump inhibitors can result in an enrichment of particular oral microbes in the gut (Imhann et al., 2016). Increased presence of specific oral taxa in the intestine has in turn been linked to several diseases, including rheumatoid arthritis (Zhang et al., 2015), colorectal cancer (Flynn et al., 2016; Zeller et al., 2014) and inflammatory bowel disease (IBD, (Gevers et al., 2014)). While it remains unclear whether disease-associated strains are indeed acquired endogenously (from the oral cavity) or from the environment, it was recently shown that Klebsiella strains originating from salivary samples of two IBD patients triggered intestinal inflammation in gnotobiotic mice (Atarashi et al., 2017).

This suggests that the presence of oral commensals in the gut is a rare, aberrant event as a consequence of ectopic colonization (i.e., ‘in the wrong place’), and hence a hallmark of disease. Outside a disease context, however, possible links between the oral and gut microbiome remain poorly characterized. Several genera were shown to be prevalent at both sites (Segata et al., 2012), with community types in one being weakly predictive of the other (Ding and Schloss, 2014), and with similar gene content in particular species (Franzosa et al., 2014), but with distinct, locally adapted strains (Lloyd-Price et al., 2017). We hypothesized that this picture is incomplete, and that microbial transmission along the GIT is more common than previously appreciated: that despite oral-gut barrier effects, some microbes freely and frequently traverse the GIT and colonize different niches, forming continuous populations that shape the human microbiome.

Results and discussion

To test this hypothesis, we assembled and analyzed a dataset of 753 public and 182 newly sequenced saliva and stool metagenomes from 470 healthy and diseased individuals (diagnosed with rheumatoid arthritis, colorectal cancer or type-1 diabetes) from Fiji (Brito et al., 2016), China (Zhang et al., 2015), Luxembourg (Heintz-Buschart et al., 2016), France (Zeller et al., 2014), and Germany (Voigt et al., 2015) (see Materials and methods, Figure 1, and Supplementary file 1). For these samples we profiled 310 prevalent species, accounting for 99% of classifiable microbial abundance in both saliva and stool (see Materials and methods and Supplementary file 2). We reasoned that if transmission between the oral and gut microenvironments is frequent, we would expect salivary and fecal microbial populations to be more similar within an individual than between individuals. Conversely, under a strong barrier model with restricted transmission, intra- and inter-individual similarities would be equivalent.

Figure 1 (with 1 supplement)
Data and workflow overview.

(A) Oral-fecal transmission scores were calculated from salivary and fecal microbial SNV profiles. (B) Cohort and dataset overview. For longitudinal cohorts (DE-CTR, CN-RA and LU-T1D), both the total number of samples and the number of individuals are shown, as well as the number of individuals considered in time-series analyses. (C) Salivary and fecal microbial loads allow the calculation of physiologically expected levels of ‘passive’ microbial transmission (i.e., by ingestion, without growth). (D) The longitudinal coupling of microbial SNVs between salivary and fecal samples was used to infer transmission directionality and oral-fecal transmission rates (see Materials and methods).

https://doi.org/10.7554/eLife.42693.003

We found that at species level, community composition was consistent with distinct populations occupying the oral and intestinal microenvironments. By prevalence across subjects, the 310 profiled species fell into three categories (Figure 2A): 44% were predominantly fecal (observed in ≥10% of fecal, but <10% of saliva samples), including core members of the gut microbiome, such as Clostridium sp., Ruminococcus sp. and Bacteroides sp.; 16% of species were predominantly oral. Although the remaining 125 (40%) species were prevalent in ≥10% of saliva and stool samples, their relative abundances differed greatly between the two habitats. The overall oral and fecal microbiome compositions appeared independent of each other (between-subject Bray-Curtis dissimilarities per site, ρPearson=-0.03), and the compositional overlap between mouth and gut of the same subject was not found to be significantly different when compared to a between-subject background (Wilcoxon test, Bray-Curtis dissimilarities, p=0.46).
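The prevalence categories and the Bray-Curtis comparison described above can be illustrated with a minimal sketch. The function names and the "rare" label are hypothetical, and the mirror-image definition of "predominantly oral" (≥10% of saliva, but <10% of fecal samples) is an assumption inferred from the fecal definition in the text; this is not the authors' code.

```python
def classify_by_prevalence(saliva_prev, stool_prev, threshold=0.10):
    """Assign a species to a habitat category.

    saliva_prev / stool_prev: fraction of saliva / stool samples in which
    the species was observed (values between 0 and 1).
    """
    oral = saliva_prev >= threshold
    fecal = stool_prev >= threshold
    if oral and fecal:
        return "shared"
    if fecal:
        return "predominantly fecal"
    if oral:
        return "predominantly oral"
    # Hypothetical label: such taxa would not pass the prevalence filter at all.
    return "rare"


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total
```

A dissimilarity of 0 means identical profiles and 1 means no shared taxa; comparing within-subject saliva-stool pairs against between-subject pairs is then a matter of collecting the two sets of values and testing them against each other.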

Figure 2 (with 5 supplements)
Oral-fecal transmission is common across a wide range of phylogenetically diverse species.

(A) Among 310 tested species, 125 were prevalent in both the mouth and gut across subjects. (B) 77% of these formed coherent strain populations between both habitats, when viewed across all tested subjects (‘frequent’ transmitters) or at least in some (‘occasional’ transmitters), as evidenced by oral-fecal transmission scores based on intra-individual SNV overlap against an inter-individual background (see Materials and methods). (C) Oral-to-fecal transmission rates, as inferred from longitudinal coupling of oral and gut SNVs (see Materials and methods), exceeded background levels for transmitted taxa, even at conservative lower estimates. (D) On average, transmissible taxa accounted for a large fraction of classifiable microbial abundance in both the oral cavity and gut. (E) Oral-fecal transmissibility was largely a clade-wise trait at genus or family ranks, but common across bacterial phyla.

https://doi.org/10.7554/eLife.42693.005

However, to accurately establish and quantify microbial transmission, it is necessary to track populations at the resolution of strains rather than species, as demonstrated previously in fecal microbiota transplantation (Li et al., 2016) or seeding of the infant microbiome (Asnicar et al., 2017; Korpela et al., 2018). We therefore profiled microbial single nucleotide variants (SNVs) across metagenomes, as a proxy for strain populations (Li et al., 2016). We formulated a transmission score for each species per subject, based on the likelihood that the observed intra-individual SNV overlap was generated by an inter-individual background model (see Materials and methods). Of the 125 species prevalent in both mouth and gut, 77% showed evidence of oral-fecal transmission. Among the 125, 74 species (59%) showed significantly higher intra-individual SNV similarity across all subjects compared to cohort-wide background SNV frequencies (Benjamini-Hochberg-corrected Wilcoxon tests on transmission scores, p<0.05, see Materials and methods; Figure 2B, Figure 2—figure supplement 1, Supplementary file 2). This suggests that they form coherent strain populations along the GIT in most subjects, subject to frequent oral-fecal microbial transmission. Strains of Streptococcus, Veillonella, Actinomyces and Haemophilus, among other core oral taxa, fell into this category. An additional 22 species (18%) showed evidence of at least occasional transmission, with individually significant oral-fecal SNV overlap in some, but not across all subjects, as did 18 species that were generally prevalent in either the mouth or the gut (but not both). All 21 members of the Prevotella genus, an important clade of the gut microbiome, were among these occasionally transmitted species. The remaining 29 (23%) species, which were prevalent in both sites, did not show signs of transmission under the strict thresholds we applied.
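The multiple-testing step mentioned above (Benjamini-Hochberg correction of per-species test p-values) can be sketched as follows. This is a generic illustration of the standard procedure on hypothetical p-values, not the authors' implementation; the function name is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # are monotone in the rank (the usual step-up construction).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-species p-values; species with adjusted p < 0.05
# would be counted as significant in this toy example.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.05 for p in example]
```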

The fecal abundance of all species with paired observations exceeded lower-bound physiologically predicted levels (i.e., the detection of salivary bacteria in stool purely as the result of ingestion) by several orders of magnitude, even with conservative estimates (Figure 1C, Figure 1—figure supplement 1). An average person swallows an estimated 1.5 × 10^12 oral bacteria per day (Humphrey and Williamson, 2001; Sender et al., 2016). Passage through the stomach reduces the viable bacterial load by 5–6 orders of magnitude (Giannella et al., 1972; Sender et al., 2016), a reduction that is expected to be mirrored at the DNA level, given that free DNA, released from dead bacterial cells, is degraded within seconds to minutes in saliva, the stomach and the intestine (see for example Mercer et al., 1999 and Liu et al., 2015). Relative to the ~3.8 × 10^13 bacterial cells in the large intestine, ‘passive’ transmission without subsequent colonization in the gut would therefore account for a reduction in relative abundance by a factor of ~4 × 10^−7 from saliva to feces (Figure 1C). Thus, the observed overlap of microbial SNVs could not be explained by passive translocation, but was indeed caused by active colonization in the gut. Moreover, transmission scores across species and subjects were independent of technical covariates, such as the horizontal or vertical coverage of genome mappings (Figure 2—figure supplement 2). Average transmission scores across subjects did not correlate with prevalence in stool across all taxa (ρSpearman =0.05), whereas an association was evident when considering only transmitters (ρ=0.67). In saliva, prevalence was globally indicative of transmission scores (ρ=0.6), reinforcing the notion that core oral taxa tended to be transmitted. Given the limited microbial read depth of salivary metagenomes (due to high fractions of human DNA), this result also indicates that our estimates of oral-fecal transmissibility were quite conservative, with potentially high rates of false negatives.
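The passive-transmission estimate above is a short piece of arithmetic that can be checked directly. The sketch below uses only the figures quoted in the text; treating the daily influx of surviving cells as directly comparable to the standing gut population is a simplifying assumption of this illustration, and the function name is hypothetical.

```python
SWALLOWED_PER_DAY = 1.5e12  # oral bacteria swallowed per day (from the text)
GUT_CELLS = 3.8e13          # bacterial cells in the large intestine (from the text)


def passive_fraction(log10_gastric_reduction):
    """Expected relative abundance in the gut of ingested, non-growing oral cells."""
    survivors = SWALLOWED_PER_DAY / 10 ** log10_gastric_reduction
    return survivors / GUT_CELLS


# Gastric passage removes 5 to 6 orders of magnitude of viable cells.
upper = passive_fraction(5)  # least severe reduction, about 4e-7
lower = passive_fraction(6)  # most severe reduction, about 4e-8
```

The value of about 4 × 10^−7 quoted in the text corresponds to the five-orders-of-magnitude case, that is, the most generous assumption for passive carry-over.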

It was recently shown that during early life, infants are colonized by maternal strains from both the oral cavity and gut (Ferretti et al., 2018), and that strains from the latter can persist in the infant gut at least into childhood (Korpela et al., 2018). Therefore, to determine whether the observed intra-individual overlap of selected strain populations was due to continuous oral-gut transmission or rare colonization events with subsequent independent expansion in each site, we focused on a subset of 46 individuals for whom longitudinal data was available (with sampling intervals ranging from 1 week to >1 year; mean 79 days). We found that both oral and fecal strain populations were usually stable, even over extended periods of time (Figure 2—figure supplement 3), in line with earlier observations for each individual body site (Lloyd-Price et al., 2017; Schloissnig et al., 2013). Oral and fecal longitudinal SNV patterns were coupled for transmitted species (see Materials and methods): oral SNVs observed at an initial time point were significantly enriched among fecal SNVs that were newly gained over time, but generally not vice versa (Figure 2—figure supplement 4). Moreover, oral-fecal transmission rates (i.e., the fraction of fecal strain turnover attributable to oral strains; see Materials and methods) significantly exceeded background expectation for frequently transmitted taxa (Figure 2C). These findings orthogonally support the oral-gut transmission hypothesis as they strongly suggest that transmission is in the direction of mouth to gut, and not vice versa; and they imply that oral-intestinal transmission is indeed a frequent and continuous process in which oral strain populations constantly re-colonize the gut.

Oral-fecal transmissibility, as a trait, generally aligned with phylogenetic clade boundaries (phylogenetic signal, λPagel=0.76), although transmitting groups were found across bacterial phyla (Figure 2D,E, Figure 2—figure supplement 1, Supplementary file 2). Transmission scores were negatively correlated with genome size (ρSpearman=-0.6), indicating that transmitted species generally had smaller genomes than non-transmitted ones. Moreover, oxygen tolerant species (aerobes and facultative anaerobes) showed 7-fold higher scores than anaerobes on average (ANOVA, p=10^−16). In contrast, no association was observed for sporulation and motility. To account for possible bias in the species reference and the phylogenetic signal of oral-fecal transmissibility, we confirmed that these signals were robust to phylogenetic regression (Supplementary file 2).

Viewed across individuals, we found that seeding of the gut microbiome from the oral cavity was extensive, with high levels of variation (Figure 3A). On average, potentially transmissible species (i.e., frequent and occasional transmitters) accounted for 75% of classifiable microbes in saliva, ranging up to 99% in some subjects. However, not all of these were detectable in the matched fecal samples, and oral-fecal strain overlap was generally incomplete. We therefore quantified the fraction of realized transmission based on paired observations of species and intra-individual SNV overlap (see Materials and methods). With these criteria, on average 35% of classifiable salivary microbes were transmitted strains that could be traced from mouth to gut within subjects. Similarly, on average 45% (range 2–95%) of classifiable fecal microbes were potential transmitters. These included common fecal species (e.g., Prevotella copri) that were detectable in a subset of salivary samples and showed only occasional transmission. Nevertheless, on average only 2% of classifiable fecal microbes could be confidently ascribed to transmitted strains, ranging to >30% in some subjects.

Figure 3 (with 1 supplement)
Oral-fecal transmission is extensive, with high levels of variation across individuals.

(A) Potentially transmissible species on average accounted for 75% and 45% of known microbes in salivary and fecal samples, respectively. Among these, realised transmitters were defined as strains that could be traced within subjects with confidence (given detection limits, see Materials and methods). (B) Tests for the association of transmission levels in mouth and gut to subject-level covariates (ANOVA, relative sum of squares), to each other (ρSpearman), with oral and fecal community richness (ρSpearman), and with oral and fecal community composition (distance-based redundancy analysis on Bray-Curtis dissimilarities, blocked by cohort, relative sum of squares).

https://doi.org/10.7554/eLife.42693.011

Between-subject variation in the relative abundance of transmitted oral and fecal microbes was found to be independent of subject sex, age and body mass index, although moderate differences were observed between study cohorts (ANOVA, p=0.002; Figure 3B; Supplementary file 3). Levels of transmitted microbial abundance in mouth and gut were found to correlate with each other (ρSpearman=0.48) and with fecal species richness, but salivary transmitted abundance negatively correlated with oral species richness. This is in line with the observation that core oral species are transmissible, with higher richness implying the increased presence of non-transmitted taxa. Conversely, transmission would add species to a mostly non-transmissible core community in the gut.

Although there was no overall association to community composition, levels of transmission correlated with oral or fecal abundances of individual genera (Supplementary file 3). To test whether specific oral and gut microbiome features were predictive of transmission, we categorized individuals based on total transmitted abundance in saliva and stool as ‘high’ or ‘low’ transmission individuals (Materials and methods). We found that models based on salivary species abundances were mildly predictive of both oral (AUC = 0.738) and fecal (AUC = 0.642) transmission levels (Supplementary file 4, Figure 3—figure supplement 1). Gut species models, in contrast, were very strong predictors of transmission in both mouth (AUC = 0.951) and gut (AUC = 0.971). This signal was largely driven by the enrichment of transmitting species in stool (Supplementary file 4), but surprisingly robust to an elimination of all detected transmitters from the model (AUC = 0.835 for the stool transmission group), again implying that the true extent of oral-intestinal transmission may indeed exceed our conservative estimates. Fusobacterium nucleatum subsp. animalis and nucleatum stood out among non-trivial gut markers enriched in high-transmission individuals, in line with existing hypotheses that Fusobacterium nucleatum subspecies may enable synergistic colonization of oral bacteria in the gut, in association with certain diseases (see for example Flynn et al., 2016).

In general, the fecal enrichment of specific oral microbes has repeatedly been associated with various diseases (Zeller et al., 2014; Zhang et al., 2015). However, due to insufficient taxonomic resolution, oral provenance has so far remained impossible to distinguish from an influx of closely related but distinct strains from the environment. We therefore defined a list of disease states with putative links to oral-fecal transmission and annotated known associations in the literature to all species in our dataset (Figure 4A; Supplementary file 2). Transmission scores were significantly increased for known opportunistic pathogens (ANOVA, p=0.016), causative agents of dental caries (p=10^−9), and plaque-dwelling bacteria (p=0.002). Likewise, species associated with periodontitis showed increased evidence for transmission (p=0.002), though this signal was mostly due to mildly periodontic species, while core drivers, such as Tannerella forsythia, Treponema denticola and Porphyromonas gingivalis (Socransky et al., 1998), showed little or no indication of oral-fecal transmission. Endocarditis-associated species showed significantly increased transmission scores upon phylogenetic regression (p=0.007), mostly driven by Haemophilus, Aggregatibacter and viridans Streptococci. This overall elevated transmissibility of taxa known to colonize ectopically in various habitats across the body (i.e., opportunistic pathogens), in particular via the bloodstream and associated with inflammation (i.e., endocarditis- or periodontitis-associated species (Hajishengallis, 2015)), may provide first cues to possible mechanisms of oral-fecal transmission.

Figure 4 (with 1 supplement)
Oral-fecal transmission is associated with disease state.

(A) Species known to be associated with various diseases showed increased oral-fecal transmission scores (pANOVA, sequential ANOVA including additional phenotypes), even upon phylogenetic generalized least squares regression (pPGLS, see Materials and methods and Supplementary file 2). (B) Oral-fecal transmission scores tested in colorectal cancer and rheumatoid arthritis cases against controls for specific sets of species (sequential ANOVA, blocked by taxon and subject covariates). Individual data points represent Cohen’s d effect sizes (difference in means, normalised by pooled standard deviation) for individual taxa across subjects.

https://doi.org/10.7554/eLife.42693.013

Our dataset included metagenomes from case-control studies for rheumatoid arthritis (RA, (Zhang et al., 2015)), colorectal cancer (CRC, (Zeller et al., 2014)) and type-1 diabetes (T1D, (Heintz-Buschart et al., 2016)), totaling 299 individuals, including 172 with salivary and fecal samples. Treatment-naïve CRC patients, sampled before colonoscopy, showed increased transmission scores across all taxa (average per-taxon Cohen’s d = 0.27; ANOVA p=10^−23; Figure 4B), as well as for transmitted taxa only (d = 0.23; p=10^−10). The effect was even more pronounced for species previously described (Zeller et al., 2014) to be enriched in the feces of CRC patients (d = 0.33; p=10^−4; Figure 4—figure supplement 1), including Fusobacterium nucleatum spp., Parvimonas micra and Peptostreptococcus stomatis. These findings are in line with a recent report that the oral and fecal microbiome are linked in the context of CRC (Felmer et al., 2018), and support the hypothesis (Flynn et al., 2016) that CRC-associated species are sourced intra-individually from the oral cavity.

Treatment-naïve RA patients displayed mildly elevated transmissibility across all taxa (d = 0.03, p=0.01) and transmissible taxa only (d = 0.07, p=0.08). Interestingly, species that were orally depleted in RA patients showed markedly increased transmission scores (d = 0.61; p=10^−21). In contrast, a trend towards decreased transmission in T1D patients was not statistically significant.
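The Cohen’s d effect sizes reported for the CRC and RA comparisons are defined in the Figure 4 caption as the difference in means normalised by the pooled standard deviation. A minimal sketch of that definition follows; the input scores are hypothetical, and the sample-size-weighted pooling shown here is one common convention that may differ in detail from the one used in the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(cases, controls):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(cases), len(controls)
    # Pooled variance, weighting each group's sample variance by its
    # degrees of freedom (assumes at least two values per group).
    pooled_var = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    return (mean(cases) - mean(controls)) / sqrt(pooled_var)


# Hypothetical transmission scores for one taxon in cases and controls.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

In this toy example both groups have a sample variance of 4, so the pooled standard deviation is 2 and a one-unit difference in means gives d = 0.5.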

Our results demonstrate that influx of oral strains from phylogenetically diverse microbial taxa into the gut microbiome is extensive in healthy individuals, with a high degree of variation between subjects. We showed that the vast majority of species prevalent in both the oral cavity and gut form connected strain populations along the gastrointestinal tract. Furthermore, by leveraging longitudinal data, we established that transmission from the mouth to the gut is a constant process. Approximately one in three classifiable salivary microbial cells colonize in the gut, accounting for at least 2% of the classifiable microbial abundance in feces. This puts oral-fecal transmission well in the range of other factors that determine human gut microbiome composition (Schmidt et al., 2018). Moreover, we note that by using saliva and feces as metagenomic readouts, we may underestimate colonization by oral microbes of the mucosa, given that fecal microbiome composition is not fully representative of the gastrointestinal tract (see for example Zmora et al., 2018). Therefore, and considering that our estimates of both the number of transmissible species and of the fraction of transmissible microbial abundance are conservative lower bounds due to strict thresholding and current detection limits of metagenomic sequencing, we posit that true levels of transmission are likely even higher, and that virtually all known oral species can translocate to the intestine at least under some circumstances.

Finally, we found increased transmission linked to some diseases, and showed for colorectal cancer and rheumatoid arthritis that disease-associated strains of several species enriched in the intestine are indeed sourced endogenously, that is from the patient’s oral cavity, and not from the environment. These results may extend to other diseases beyond those tested here, calling for revised models of microbiome-disease associations that consider the gastrointestinal microbiome as a whole rather than a sum of parts, with important implications for disease prevention, diagnosis, and (microbiome-modulating or -modulated) therapy.

While our findings are observational and do not reveal oral-intestinal transmission routes or mechanistic insights, they challenge current ecological and physiological models of the gastrointestinal tract that assume the oral cavity and large intestine to harbour mostly independent and segregated microbial communities. Instead, most strain populations appear to be continuous along the gastrointestinal tract, originating from the oral cavity, an underappreciated reservoir for the gut microbiome in health and disease.

Materials and methods

Metagenomic datasets

Publicly available raw sequence data was downloaded from the European Nucleotide Archive (ENA) for the FJ-CTR (FijiCOMP, project accession PRJNA217052) (Brito et al., 2016) and CN-RA (PRJEB6997) (Zhang et al., 2015) cohorts. Sample metadata was parsed from ENA and the respective study publications.

For the LU-T1D (PRJNA289586) (Heintz-Buschart et al., 2016) cohort, newly generated salivary and fecal metagenomes were added under the existing project accession. For the FR-CRC (ERP005534) (Zeller et al., 2014) and DE-CTR (ERP009422) (Voigt et al., 2015) cohorts, newly generated metagenomes were uploaded under project accession PRJEB28422 (samples ERS2692266-ERS2692323).

Sample collection

German healthy controls (DE-CTR). Salivary samples were collected at home before dental hygiene and breakfast in the early morning. Donors collected 2–3 ml of saliva and immediately mixed it with 15 ml of RNAlater (Sigma-Aldrich). Samples were transported to the laboratory on ice or dry ice and stored at −80°C until further processing.

French colorectal cancer cohort (FR-CRC). Subject recruitment and cohort characteristics were described previously (Zeller et al., 2014). Saliva samples were collected in 1.5 ml saline and stored at −80°C until further processing.

Luxembourg type-1 diabetes cohort (LU-T1D). Donors collected 2–3 ml of saliva at home before dental hygiene and breakfast in the early morning. Samples were immediately frozen on dry ice, transported to the laboratory and stored at −80°C until further processing.

DNA extraction

DE-CTR and FR-CRC. After thawing on ice, 1–2 ml of each sample were centrifuged directly (FR-CRC) or after dilution in RNAlater (DE-CTR). Cell pellets were washed 3x in sterile Dulbecco’s PBS (PAA Laboratories) and DNA was extracted using the GNOME DNA Isolation Kit (MP Biomedicals). Briefly, cell pellets were lysed using a multi-step process of chemical cell lysis/denaturation, bead-beating and enzymatic digestion as described previously (Zeller et al., 2014). DE-CTR samples were processed in duplicates, with one replicate being enriched for microbial DNA using the NEBNext Microbiome DNA Enrichment Kit (NEB, Ipswich, USA) following the manufacturer’s instructions.

LU-T1D. After thawing on ice, two 500 µl aliquots of each sample were centrifuged. Cell pellets were frozen in liquid nitrogen and lysed by cryo-milling and chemical lysis in RLT buffer (QIAGEN). Cell debris was passed through QiaShredder columns (QIAGEN), before DNA was isolated using the QIAGEN AllPrep kit according to the manufacturer’s instructions, as described previously (Heintz-Buschart et al., 2016).

Metagenomic sequencing

Libraries for salivary samples of the French and German cohorts were prepared using the NEBNext Ultra DNA Library Prep kit (New England Biolabs, Ipswich) using a dual barcoding system, and sequenced at 125 bp paired-end on an Illumina HiSeq 2000. For the additional LU-T1D samples, libraries were likewise prepared using a dual barcoding system, and sequenced at 150 bp paired-end on Illumina HiSeq 4000 and Illumina NextSeq 500 machines.

Metagenomic sequence processing

Raw reads were quality trimmed and filtered against the human genome build 19 to exclude host sequences using MOCAT2, as described previously (Kultima et al., 2016). For taxonomic profiling, reads were mapped against a database of 10 universal marker genes for 1753 species-level genome clusters (specI clusters, (Mende et al., 2013)), using NGless (Coelho et al., 2018). A maximum likelihood-approximate phylogenetic tree (with the JTT model, (Jones et al., 1992)) for representative genomes of the same 1753 clusters was inferred based on protein sequences of 40 near-universal marker genes (Mende et al., 2013) using the ETE3 toolkit (Huerta-Cepas et al., 2016), with default parameters for ClustalOmega (Sievers et al., 2011) and FastTree2 (Price et al., 2010).

Metagenomic reads were mapped at 97% sequence identity (across at least 45 nt) against full cluster-representative genomes, using the Burrows-Wheeler Aligner (Li and Durbin, 2009), as implemented in NGless. Reads mapping to multiple genomes at ≥97% identity were discarded from the analysis. Average vertical coverage (sequencing depth) and horizontal coverage (breadth) per microbial genome in each sample were quantified using the qaCompute utility in metaSNV (Costea et al., 2017).
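The two coverage measures can be illustrated with a minimal sketch. It assumes the common definitions (mean reads per position for depth, fraction of positions with at least one read for breadth); the function name `coverage_summary` and the toy input are hypothetical and do not reproduce the qaCompute implementation.

```python
def coverage_summary(per_base_depth):
    """Return (average depth, breadth) for one genome in one sample.

    per_base_depth: list with the number of reads covering each genome position.
    """
    genome_length = len(per_base_depth)
    if genome_length == 0:
        raise ValueError("genome has no positions")
    # Vertical coverage: mean number of reads per position.
    depth = sum(per_base_depth) / genome_length
    # Horizontal coverage: fraction of positions covered by at least one read.
    breadth = sum(1 for d in per_base_depth if d > 0) / genome_length
    return depth, breadth


# Toy genome of 10 positions (hypothetical values).
depth, breadth = coverage_summary([0, 0, 3, 5, 2, 0, 0, 0, 1, 1])
print(depth, breadth)
```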

Two cohorts (CN-RA (Zhang et al., 2015) and DE-CTR (Voigt et al., 2015)) contained technical replicates for several salivary samples; these were pooled after the read mapping step.

Taxa filtering and annotation

The dataset was filtered to include taxa satisfying the following criteria in ≥10% of samples (see Figure 2—figure supplement 5 for details): horizontal coverage (breadth) of ≥0.05; average vertical coverage (depth) ≥0.25; specI cluster relative abundance of ≥10⁻⁶. These criteria excluded taxa representing 0.8 ± 1.2% of gut and 1.2 ± 1.9% of oral total mapped abundance. For the remaining 310 taxa, general phenotypes (Gram stain, sporulation, motility, oxygen requirement, among others) were annotated using the PATRIC database (accessed Dec 2015) (Wattam et al., 2017), and missing values were amended manually. Host and disease association phenotypes (including opportunistic pathogenicity and periodontitis association) were annotated manually, based on published literature and the MicrobeWiki website (https://microbewiki.kenyon.edu/index.php/MicrobeWiki, accessed June 2017).
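A minimal sketch of the taxon filter described above. It assumes that the three criteria must hold jointly within a sample for that sample to count towards the 10% threshold, which the text does not state explicitly; `passes_filter` and the constant names are hypothetical.

```python
MIN_BREADTH = 0.05          # horizontal coverage
MIN_DEPTH = 0.25            # average vertical coverage
MIN_REL_ABUNDANCE = 1e-6    # specI cluster relative abundance
MIN_SAMPLE_FRACTION = 0.10  # fraction of samples that must satisfy the criteria


def passes_filter(per_sample_stats):
    """per_sample_stats: list of (breadth, depth, relative_abundance), one per sample."""
    if not per_sample_stats:
        return False
    # Assumption: all three criteria must be met in the same sample.
    n_ok = sum(
        1
        for breadth, depth, abundance in per_sample_stats
        if breadth >= MIN_BREADTH
        and depth >= MIN_DEPTH
        and abundance >= MIN_REL_ABUNDANCE
    )
    return n_ok / len(per_sample_stats) >= MIN_SAMPLE_FRACTION
```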

Per taxon summary statistics and annotated metadata are available from Supplementary file 2.

Identification of microbial Single Nucleotide Variants

Microbial Single Nucleotide Variants (SNVs) were called using metaSNV (Costea et al., 2017). Each potential SNV required support by at least two non-reference sequencing reads (relative to the specI cluster representative genomes (Mende et al., 2013)) at a base call quality of Phred ≥ 20. The resulting sets of raw SNVs per taxon were filtered differentially for the various downstream analyses, as detailed below.
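The support criterion can be sketched as follows. This is an illustration only, not the metaSNV code: it assumes that the two supporting reads must carry the same non-reference base, and the function name and pileup representation are hypothetical.

```python
from collections import Counter


def candidate_snv_alleles(reference_base, pileup, min_support=2, min_quality=20):
    """Return non-reference alleles with enough high-quality read support.

    pileup: list of (base, phred_quality) tuples observed at one genome position.
    """
    support = Counter(
        base
        for base, quality in pileup
        if base != reference_base and quality >= min_quality
    )
    # Assumption: support is counted per alternative allele.
    return sorted(base for base, n in support.items() if n >= min_support)
```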

Detection of Intra-Individual microbial transmission

To distinguish intra-individual microbial transmission from random drift, we calculated a transmission score (ST) per subject and microbial taxon. In short, ST quantifies how much the similarity between oral and gut SNV profiles within an individual deviates from an inter-individual background. To calculate ST, we first filtered the set of informative SNVs (all SNVs at a given genome position) by applying the following criteria: (i) observation (read coverage ≥1) at focal position in ≥10 oral and ≥10 gut samples; (ii) SNV observation in ≥1 oral and ≥1 gut sample. Next, we calculated the global background incidence of each allele across oral (foral) and gut (fgut) samples. From these, we calculated the background probabilities for each of the four possible cases in paired oral and gut observations: any given allele i could either be present in both samples (p1,1), absent in both samples (p0,0), or present in one but absent in the other sample (p1,0 and p0,1):

$$p_{1,1}(i) = f_{oral}(i)\, f_{gut}(i)$$
$$p_{0,0}(i) = (1 - f_{oral}(i))\,(1 - f_{gut}(i))$$
$$p_{1,0}(i) = f_{oral}(i)\,(1 - f_{gut}(i))$$
$$p_{0,1}(i) = (1 - f_{oral}(i))\, f_{gut}(i)$$
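The SNV filter and the background incidence can be sketched as below. The sketch assumes that incidence is computed over samples with read coverage at the position (the text does not specify the denominator); `allele_incidence` and `is_informative` are hypothetical names.

```python
def allele_incidence(observations):
    """Fraction of covered samples in which the allele was observed.

    observations: one entry per sample; True/False if the position is covered
    (allele seen / not seen), None if there is no read coverage.
    """
    covered = [o for o in observations if o is not None]
    if not covered:
        return None
    return sum(covered) / len(covered)


def is_informative(oral_obs, gut_obs, min_covered=10):
    """Apply criteria (i) and (ii): coverage in >=10 oral and >=10 gut samples,
    and the SNV observed in at least one oral and one gut sample."""
    oral_cov = [o for o in oral_obs if o is not None]
    gut_cov = [o for o in gut_obs if o is not None]
    return (
        len(oral_cov) >= min_covered
        and len(gut_cov) >= min_covered
        and any(oral_cov)
        and any(gut_cov)
    )
```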

For every permuted oral-gut pair of samples, we then calculated the raw summed log-likelihood of the observed SNV profile overlap (Lobs) across all alleles with shared coverage:

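$$L_{obs} = \left(\sum_{i \in 1,1} \log p_{1,1}(i) + \sum_{j \in 0,0} \log p_{0,0}(j)\right) - \left(\sum_{k \in 1,0} \log p_{1,0}(k) + \sum_{l \in 0,1} \log p_{0,1}(l)\right)$$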

In other words, Lobs quantifies how likely the observed average allele profile agreement between two samples is, given the respective background allele incidence frequencies. Similarly, we computed the log-likelihood of the least likely agreement case (Lmin) per allele:

$$L_{min} = \sum_{i} \min\left(\log p_{1,1}(i),\ \log p_{0,0}(i)\right)$$

From these values, we calculated a raw probability score (Praw) for the observed allele agreement between a given pair of oral and gut samples:

$$P_{raw} = L_{obs} / L_{min}$$

Praw scales the likelihood of the observed agreement by the likelihood of the theoretically most extreme cases of agreement across all observed alleles. In particular, the shared observation of very rare alleles (very low foral and fgut) has a strong impact on Praw, whereas the shared observation of very common variants is downweighted.
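The calculation of Praw for one oral-gut sample pair can be sketched as follows. The sketch assumes background incidences strictly between 0 and 1 (otherwise a logarithm is undefined) and returns `None` below a minimum number of shared positions; the function names and the input layout are hypothetical.

```python
import math


def background_probabilities(f_oral, f_gut):
    """Probabilities of the four presence/absence cases for one allele."""
    return {
        (1, 1): f_oral * f_gut,
        (0, 0): (1 - f_oral) * (1 - f_gut),
        (1, 0): f_oral * (1 - f_gut),
        (0, 1): (1 - f_oral) * f_gut,
    }


def raw_probability_score(alleles, min_positions=20):
    """alleles: list of (f_oral, f_gut, in_oral, in_gut) for positions covered in both samples."""
    if len(alleles) < min_positions:
        return None
    l_obs = 0.0
    l_min = 0.0
    for f_oral, f_gut, in_oral, in_gut in alleles:
        if not (0 < f_oral < 1 and 0 < f_gut < 1):
            raise ValueError("incidences must lie strictly between 0 and 1")
        p = background_probabilities(f_oral, f_gut)
        case = (int(in_oral), int(in_gut))
        log_p = math.log(p[case])
        # Agreement cases add to L_obs, disagreement cases are subtracted.
        l_obs += log_p if case in ((1, 1), (0, 0)) else -log_p
        # Least likely agreement case for this allele.
        l_min += min(math.log(p[(1, 1)]), math.log(p[(0, 0)]))
    return l_obs / l_min
```

In this sketch, a shared rare allele contributes the largest possible amount to the score, shared absence of a rare allele contributes little, and disagreement between the two samples pushes the score below zero.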

We computed Praw for all pairwise permutations of oral and gut samples in the dataset with observations (reads) at ≥20 matching positions. We defined the transmission score ST(t, s) for taxon t in subject s as a standard Z score of the intra-individual (within subject) observation against an inter-individual (between subjects) background:

$$S_T = \frac{P_{raw}(s) - \mu_{raw}}{\sigma_{raw}}$$

We tested for potential effects of the choice of background observations by calculating ST against (i) a global background of all pairwise inter-individual oral-gut comparisons, across all cohorts; (ii) a cohort-specific background per subject; (iii) a global background, but taking only subject-specific comparisons into account (the focal subject’s oral sample vs all gut samples, and vice versa); (iv) a within-cohort subject-specific background. Oral-gut comparisons for the same individual across different timepoints, within families (information available for LU and CN cohorts) and within village (for the Fijian cohort) were excluded from the background sets. Although smaller background sets (iii and iv) provided generally noisier scores, overall trends between these backgrounds were very consistent; in particular, cohort-specific vs global backgrounds did not impact trends in our findings (data not shown). All results discussed in the main text therefore refer to scores against a cohort-specific background (ii).
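The Z transformation against a chosen background can be sketched as below. The use of the sample standard deviation is an assumption, and `transmission_score` is a hypothetical name; which inter-individual pairs enter `p_raw_background` depends on the background definition (i to iv) chosen above.

```python
import statistics


def transmission_score(p_raw_within, p_raw_background):
    """Z score of the within-subject P_raw against inter-individual P_raw values."""
    mu = statistics.mean(p_raw_background)
    sigma = statistics.stdev(p_raw_background)  # assumption: sample standard deviation
    if sigma == 0:
        raise ValueError("background has zero variance")
    return (p_raw_within - mu) / sigma
```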

Quantification of Intra-Individual microbial transmission

To quantify oral-gut transmission per individual, we defined a set of potentially transmissible species to include both frequently and occasionally transmitting species. Frequent transmitters encompassed a set of 74 species for which intra-individual transmission scores ST across subjects were significantly higher than inter-individual background (Benjamini-Hochberg-adjusted one-sided Wilcoxon p<0.05). Occasional transmitters did not satisfy this global criterion, but showed significant evidence for oral-fecal strain overlap in at least one individual (Benjamini-Hochberg-adjusted Z test p<0.05).
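The per-individual criterion for occasional transmitters can be sketched as below. The sketch covers only the Z test and the Benjamini-Hochberg adjustment, not the Wilcoxon test; treating the Z test as one-sided (upper tail) is an assumption, and both function names are hypothetical.

```python
import math


def one_sided_z_p(z):
    """Upper-tail p-value of a standard normal Z score."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Under these assumptions, a taxon would count as an occasional transmitter in a subject when the adjusted p-value of its transmission score falls below 0.05.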

To quantify the transmitted microbial abundance per individual, we adjusted the observed relative oral and fecal abundance of each given species by oral-fecal SNV overlap. In other words, the potentially transmissible abundance in the oral cavity was defined as the total abundance of potentially transmitting species, and the realized transmitted abundance was defined to include only species for which overlapping strain populations could be confidently traced within individuals. This included frequent transmitters that were observable (above detection limits) in matched oral-fecal sample pairs, and occasional transmitters that additionally showed significant transmission scores in the focal individual (i.e., an occasional transmitter such as Prevotella denticola would only be considered in individuals in which it showed significant transmission scores). For these species, relative oral and fecal abundances were adjusted for total strain population overlap, estimated as the Jaccard overlap of SNVs observed in the oral cavity and gut of the focal individual.
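The abundance adjustment can be sketched as below. Scaling the relative abundance by multiplying it with the Jaccard overlap is an assumption about how the adjustment was applied; the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections of SNVs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def transmitted_abundance(relative_abundance, oral_snvs, gut_snvs):
    """Relative abundance of a species, scaled by its oral-gut SNV overlap."""
    # Assumption: the adjustment is a simple multiplication.
    return relative_abundance * jaccard(oral_snvs, gut_snvs)
```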

Longitudinal coupling of oral and fecal SNV profiles

Longitudinal data (2–3 timepoints, see Supplementary file 1) was available for 46 individuals from three cohorts (Heintz-Buschart et al., 2016; Voigt et al., 2015; Zhang et al., 2015). To quantify site-specific temporal stability of strain populations, we contrasted within-subject SNV profile similarity over time to between-subject similarities.

Moreover, we tested the longitudinal coupling of strain populations between a putative source site (e.g., oral cavity) and sink site (e.g., gut). For this, we required shared observations (read coverage ≥1) for at least 100 SNV positions across three samples (see Figure 1): (i) source site at the initial time point (t0); (ii) sink site at t0; (iii) sink site at a later time point t1. We defined source SNVs as present in sample (i), and newly gained sink SNVs as present in sample (iii) but not (ii), and performed Fisher’s exact tests (followed by Benjamini-Hochberg correction) to test for associations between these SNV sets. In other words, we tested for the association of strain populations present in the source site at t0 with strains newly gained in the sink site over time, by proxy of SNV profiles. We considered two sites to be longitudinally coupled in the source → sink direction if the tested odds ratio was >1 at a (corrected) p≤0.05. Significant odds ratios <1 indicated unconnected sites in the tested directionality. Tests were performed independently for oral-to-gut (oral as source, gut as sink) and gut-to-oral coupling, for each taxon.
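The coupling test can be sketched as below. The sketch assumes a two-sided Fisher's exact test over all SNV positions with shared coverage and reports the sample odds ratio; the function names are hypothetical, and the example uses far fewer than the 100 positions required above.

```python
import math


def coupling_table(positions, source_t0, sink_t0, sink_t1):
    """2x2 counts (a, b, c, d): source SNV yes/no versus newly gained in sink yes/no."""
    source = set(source_t0)
    gained = set(sink_t1) - set(sink_t0)
    a = sum(1 for p in positions if p in source and p in gained)
    b = sum(1 for p in positions if p in source and p not in gained)
    c = sum(1 for p in positions if p not in source and p in gained)
    d = sum(1 for p in positions if p not in source and p not in gained)
    return a, b, c, d


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test; returns (sample odds ratio, p-value)."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as likely as the observed one.
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    if b * c == 0:
        odds_ratio = math.nan if a * d == 0 else math.inf
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(p_value, 1.0)


table = coupling_table(range(8), {0, 1, 2, 3}, set(), {0, 1, 2, 4})
print(table, fisher_exact_2x2(*table))
```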

Quantification of Oral-Fecal transmission rates

Longitudinal data was also leveraged to estimate oral-fecal transmission rates, here defined as the fraction of fecal strain turnover attributable to the corresponding salivary sample. For each subject and taxon, the absolute fecal strain turnover was quantified as described above, as the difference in SNV profiles between fecal samples at t0 and t1 (samples ii and iii in the previous section). Though sampling intervals ranged from 1 week to >1 year, they were relatively consistent within cohorts (see Supplementary file 1). Transmission rates were then quantified as the fraction of fecal alleles gained between t0 and t1 that were also observed in the paired oral sample at t0. Arguably, this provides a conservative lower estimate: oral-fecal transmission could account for both newly gained fecal alleles and for the enhanced stability of existing alleles in the fecal strain population due to a constantly exerted dispersal pressure. However, since the latter effect cannot reliably be quantified from sparse longitudinal metagenomic data, the transmission rates reported in the main text only encompass the former (newly gained alleles).
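The transmission rate for one subject and taxon can be sketched as below, treating each sample as a set of observed alleles. The name `transmission_rate` is hypothetical, and returning `None` when no fecal alleles were gained is a choice made for this sketch.

```python
def transmission_rate(oral_t0, fecal_t0, fecal_t1):
    """Fraction of fecal alleles gained between t0 and t1 that were seen in saliva at t0."""
    gained = set(fecal_t1) - set(fecal_t0)
    if not gained:
        return None  # no fecal turnover to attribute
    return len(gained & set(oral_t0)) / len(gained)
```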

To test whether transmission rates per taxon were statistically significant across subjects, we compared observed rates to two distinct randomized backgrounds: by shuffling fecal samples at t1 within cohorts, subject-specific longitudinal background sets on fecal strain turnover were generated; shuffling oral samples at t0 provided subject-specific coupled backgrounds. For each taxon and subject, we Z-transformed observed transmission rates against either of these subject-specific backgrounds; the resulting standard scores (in unit standard deviations) are reported in Figure 2C.

Diversity, Community Composition and Statistical Analyses

Per-sample community richness was calculated from the average of 100 rarefactions to normalised marker gene-based abundances of 1000. Between-sample community compositional similarities were computed as Bray-Curtis and TINA indices, as described previously (Schmidt et al., 2017). Distance-based Redundancy Analyses to associate community composition to levels of oral-fecal transmission were performed using the R package vegan (Oksanen et al., 2015).
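Rarefied richness and Bray-Curtis dissimilarity can be sketched as below. The sketch assumes integer counts per taxon and subsampling without replacement; the function names, the fixed random seed and the dictionary input are hypothetical, and the TINA index is not covered.

```python
import random


def rarefied_richness(counts, depth=1000, n_iter=100, seed=0):
    """Mean number of taxa observed across repeated rarefactions to `depth` counts.

    counts: dict mapping taxon to an integer count.
    """
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer counts than the rarefaction depth")
    rng = random.Random(seed)
    total = 0
    for _ in range(n_iter):
        # Subsample without replacement and count distinct taxa.
        total += len(set(rng.sample(pool, depth)))
    return total / n_iter


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance dicts."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0
```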

The association of transmission scores with taxa phenotypes (oxygen requirement, sporulation, etc.) and taxa disease annotations (opportunistic pathogenicity, etc.) were tested using ANOVA of a combined linear model (‘naïve’ ANOVA in Supplementary file 2). To correct for potentially confounding phylogenetic signals of the tested variables, an ANOVA of a phylogenetically regressed model of the same formulation was performed using the R package caper (Orme et al., 2018).

Associations of total transmitted classifiable abundance in saliva and stool per subject with subject variables (sex, BMI, age) were tested using ANOVAs on linear models blocked by cohort. The association of transmission scores per subject with disease status was tested using ANOVAs per disease cohort, on linear models accounting for taxon baselines, as well as effects of subject sex, BMI and age.

To test for links between microbiome composition and the amount of transmitted abundance in saliva and stool, we trained machine learning models to classify samples into ‘high’ and ‘low’ transmission groups. These groups were defined as the top and bottom quartiles of the fraction of transmitted abundance, independently for stool and saliva samples. For model training, relative abundances were log-transformed and standardized as z-scores. In a 10 times-repeated 10-fold cross-validation setting, L1-regularized (LASSO) logistic regression models (Tibshirani, 1996) were trained on the training set and then evaluated on the test set within each fold. In a second step, all species defined as frequent transmitters (see Quantification of Intra-Individual Microbial Transmission above) were eliminated as features before preprocessing and training. All steps (data preprocessing, model building, and model evaluation) were performed using the SIAMCAT R package (https://bioconductor.org/packages/SIAMCAT, version 1.1.0; see also Zeller et al., 2014).
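The group definition and feature preprocessing can be sketched as below. This does not reproduce SIAMCAT: the quartiles are taken by rank, the pseudocount added before the log transformation is an assumption (the text does not give one), and both function names are hypothetical. Model fitting and cross-validation are not shown.

```python
import math
import statistics


def quartile_labels(fractions):
    """Label samples in the bottom and top quarter of transmitted abundance.

    fractions: dict mapping sample to its fraction of transmitted abundance.
    Samples in the middle two quarters are left unlabeled.
    """
    ranked = sorted(fractions, key=fractions.get)
    k = len(ranked) // 4
    labels = {sample: "low" for sample in ranked[:k]}
    labels.update({sample: "high" for sample in ranked[len(ranked) - k:]})
    return labels


def log_zscore(values, pseudocount=1e-6):
    """Log-transform one feature and standardize it to z-scores."""
    logged = [math.log10(v + pseudocount) for v in values]  # pseudocount is an assumption
    mu = statistics.mean(logged)
    sd = statistics.stdev(logged)
    if sd == 0:
        return [0.0] * len(logged)
    return [(x - mu) / sd for x in logged]
```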

All statistical analyses were performed in R. Analysis code is available online (see below).

Data and analysis code availability

All generated raw sequence data has been uploaded to the European Nucleotide Archive under the project accessions PRJEB28422 (French CRC (Zeller et al., 2014) and German healthy controls (Voigt et al., 2015)) and PRJNA289586 (Luxembourg T1D (Heintz-Buschart et al., 2016)). Sample metadata is available from Supplementary file 1. Processed data (taxonomic profiles, taxa annotations, etc.) and full analysis code are available via a gitlab repository (https://git.embl.de/tschmidt/oral-fecal-transmission-public; copy archived at https://github.com/elifesciences-publications/oral-fecal-transmission-public-).

References

  4. Coelho LP, Alves R, Monteiro P, Huerta-Cepas J, Freitas AT, Bork P (2018) NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language. bioRxiv, 10.1101/367755.
  30. Mercer DK, Scott KP, Bruce-Johnson WA, Glover LA, Flint HJ (1999) Fate of free DNA and transformation of the oral bacterium Streptococcus gordonii DL1 by plasmid DNA in human saliva. Applied and Environmental Microbiology 65:6–10.
  32. Orme D, Freckleton R, Thomas G, Petzoldt T, Fritz S, Isaac N, Pearse W (2018) Caper: comparative analyses of phylogenetics and evolution in R. R Development Core Team.

Decision letter

  1. Wendy S Garrett
    Senior Editor; Harvard TH Chan School of Public Health, United States
  2. Max Nieuwdorp
    Reviewing Editor; AMC, Netherlands
  3. Andrei Prodan
    Reviewer; Amsterdam University Medical Center, Netherlands
  4. Paul O'Toole
    Reviewer

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Extensive Transmission of Microbes along the Gastrointestinal Tract" for consideration by eLife. Your article has been reviewed by Wendy Garrett as the Senior Editor, a Reviewing Editor, and three reviewers. The following individuals involved in review of your submission have agreed to reveal their identity: Andrei Prodan (Reviewer #1); Paul O'Toole (Reviewer #3).

The reviewers have discussed the reviews with one another and, although the paper is currently not suitable for publication, the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This paper by Schmidt et al., is an overall well written manuscript that shows that bacterial strain exchange between the oral and gut environments is more extensive than previously thought. Additionally, the manuscript puts forth that this is a normal occurrence, rather than solely a mark of dysbiosis / disease and that opportunistic pathogens had higher evidence of transmission along the gastrointestinal tract with an extensive exchange of strains between the two using SNV profiles of each bacterial strain with oral cavity bacterial strain dominance.

Essential revisions:

There are a few methodological questions that need to be addressed. This includes cross checking with the ConStrains method to identify microbial strains (Luo et al., 2015). Also, the amount of sequence coverage per sample (in Gbp) should be specified so that readers have a platform-independent reference point for the coverage that is necessary and sufficient for SNV analysis. Moreover, as reviewer 3 points out the authors should further expand on their estimations of quantities and viability of bacterial cells in the lumen that can be attributed to salivary bacteria. In this regard, the paper could be strengthened if there would be some data on viability of bacterial strains in the intestinal tract as the authors might have underestimated the contribution of passively translocated DNA (belonging to living or dead bacteria) to the faecal microbiota. In this regard, a few years ago, Korem et al., (2015) published about this using shotgun metagenomic sequencing to calculate the ratio of sequencing coverage between the peak and trough providing a quantitative measure of a species' growth rate. It would be of importance to see if this bioinformatic approach would help solve this question. Finally, a dedicated section on statistical analysis in the method description of the paper would be helpful and the numbers of included individuals should be cross checked within the supplemental data and tables (e.g. Figure 1).

Please see the full reviews below for further points:

Reviewer #1:

This is a solid paper which shows that bacterial strain exchange between the oral and gut environments is more extensive than previously thought and a normal occurrence, rather than solely a mark of dysbiosis / disease. Nonetheless, they also show that opportunistic pathogens had higher evidence of transmission. While the authors find no correlation between the β-diversity measures (as determined by metagenomic profiling) of the gut and the oral environment, they show that there is an extensive exchange of strains between the two. This is done by determining the strain profiles (SNV profiles of each strain) and showing that, based on probabilistic models, the overlap in some of these profiles is significantly higher than would be expected by random chance ("transmission scores"). The fact that species transmission scores correlate with oral relative abundance, but not to gut abundance, indicate that (as would be expected) the direction of transmission is from the oral cavity to the gut as does the fact that 'oral SNVs observed at an initial time point were significantly enriched among fecal SNVs that were newly gained over time, but generally not vice versa '. The paper is made stronger by the use of longitudinal data.

The paper is concise and well-written, providing a detailed, clear and understandable description of the methods (e.g. how the transmission scores are calculated and what the rationale is). The visualizations (including the Supplementary figures) are very expressive and informative. Supporting information files have been submitted, and code and data have been made available on a GitHub repository.

I have very few critical remarks to make, I think the paper is rigorous and well-polished:

1) PRJEB28422 accession number does not exist on ENA (checked on Nov 27th). Did the authors mean PRJEB22368?

2) "Transmission scores were negatively correlated with genome size (ρ Spearman=-0.6), indicating that transmitted species generally had smaller genomes than non-transmitted ones" Any idea why this is the case?

3) "the fecal relative abundance of Fusobacterium sp. positively correlated with higher levels of transmission)" Why would this be the case?

Reviewer #2:

In this work, Schmidt et al., provide theoretical evidence for the transmission and colonization of oral microbial genomes in the distal gut. The results are interesting and important because they show that transmission of oral microbes to the gut occurs extensively in healthy individuals also in adulthood. Therefore, if these results are confirmed, oral-gut transmission might be an important factor to consider for the prevention and management of human diseases through the GIT microbiota.

We have a few major concerns that the authors should address to support their findings:

1) How were the cut-offs for vertical and horizontal genome coverage chosen? A 5% breadth of coverage seems low for the identification of microbial strains. Would the choice of these cut-offs affect the results? Indeed, Supplementary figure 4 shows that both vertical and horizontal genome coverage can affect the transmission score at least of some specific taxa. Please highlight the distribution of transmitters in this supplementary figure.

2) The authors base their work on the assumption that oral and gut SNVs profiles of transmitted genomes are more similar in an individual than between individuals. However, does this similarity of SNVs profiles necessarily imply transmission? Or, alternatively, could other individual-specific genetic and/or environmental factors shape a similar oral and gut microbiota in an individual instead of transmission?

3) As the authors use a low breadth of coverage and assume transmission based on similarity of SNVs profiles, we would like to ask that the authors confirm their results when using the ConStrains method to identify microbial strains (Luo et al., 2015). This method is also based on SNPs in oral and gut samples.

Reviewer #3:

This study entitled "Extensive Transmission of Microbes along the Gastrointestinal Tract" is an original work focusing on the transmission of bacterial strains between the oral and gut environment. The study is a robust analysis at strain-level of the oral and gut microbiota composition intra- and inter-individuals. The data are well exploited, especially regarding potential confounding factors between the different cohorts. The authors set out to identify population flow of bacteria from the oral cavity to the lumen. They defined metagenomes to an SNV-level resolution. The central thesis of this paper is that intra-individual overlap of SNVs between the oral cavity and the lumen is greater than that which would be expected from inter individual background thereby demonstrating oral taxa translocation. Within their model this event of translocation was a persistent one. The taxa identified as transmitted were phylogenetically diverse, yet some clade clustering was noted. The only noted characteristics of these taxa were reduced relative genome size and their anaerobic/ facultative aerobic nature.

This is a novel study with potentially significant ramifications for human intestinal microbial ecology.

My main critiques are these four points:

1) One of the pillar arguments in this paper is that the numbers of bacterial cells in the colon which show evidence of transmission cannot be attributed to passive translocation due to peristalsis. They argue that the amount of bacterial cells swallowed by a human per day (1.5 × 10¹²) would be depleted during passage through the upper digestive tract (stomach and duodenum) and thus would not contribute significantly to the gut microbiota. However, methodologically, the authors are investigating DNA, not viable living cells. One might argue the authors have underestimated the contribution of passively translocated DNA (belonging to living or dead bacteria) to the faecal microbiota. The stomach and the mouth contain a high proportion of dead cells (perhaps only 1% of stomach bacteria cells are alive) whose DNA could translocate to the lumen. Given the estimates that an average person has one bowel movement a day (the cohort is older with disease, so it could be less), that the average mass of stool is 100 grams and that the bacterial density in stool is 0.9 × 10¹¹ bacteria/g, an individual would pass 9 × 10¹² bacterial cells a day. Presuming that an individual passes all the saliva they swallow in a day in their bowel movement and that there is no loss of bacterial DNA, one would detect ~1.5 × 10¹² per stool. This would be above the 10% limit that the authors set. I think the authors should further expand on their estimations of quantities of bacterial cells in the lumen that can be attributed to salivary bacteria. The literature they reference is not primary work but, in essence, review. More solid numbers are needed when discussing the oral bacterial density and volume. Indeed, the focus of the Sender et al., 2016 paper is the colonic microbiota. I think their strong claims need stronger support.
Likewise, the results presented in the study are not enough to support the following statement in particular "Approximately one in three salivary microbial cells colonise in the gut, accounting for at least 2% of the classifiable microbial abundance in feces".

2) The number of individuals reported in the text does not match the cohort and dataset overview presented in Figure 1. In the abstract and the main text, the authors reported the analysis of 470 healthy and diseased individuals but based on Figure 1 all together the cohorts comprised 571 individuals including 365 intra-individual couples. Further, the authors reported in the main text that they focused on a subset of 57 individuals for whom longitudinal data was available but based on Figure 1 only 46 individuals (including diseased individuals) presented time series. Then, for the case-control studies, the authors reported in the main text a total of 172 individuals but based on Figure 1 the cohorts CN-RA, FR-CRC and LU-T1D comprised 395 individuals including 219 intra-individual couples (healthy and diseased individuals). As a general comment, the authors should also clarify precisely in the main text whether the studied individuals are intra-individual couples (with both saliva and stool samples) or individuals with one sample type.

3) The authors profiled 310 prevalent species, which accounted for 99% of classifiable microbial abundance in both saliva and stool. However, there is no mention of the unclassifiable fraction of the reads, the proportion of classified over non-classified reads or the percentage of mapped reads.

4) The authors should acknowledge the difference between colonization of the lumen (faecal matter) versus colonization of the mucosa. Recent work by Zmora et al., 2018 on probiotics have highlighted the disparity between the colonization of the faecal matter versus the mucosa. I recognize they prepared the current submission before the Zmora paper came out. However, there are many other papers that make this point, at least with respect to faecal versus mucosa.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for sending your article entitled "Extensive Transmission of Microbes along the Gastrointestinal Tract" for peer review at eLife. Your article is being evaluated by three peer reviewers, and the evaluation is being overseen by a Reviewing Editor and Wendy Garrett as the Senior Editor.

Reviewer #2:

We thank the authors for their work while revising their manuscript. The manuscript reads well, and we have only one additional comment concerning the correlations between average transmission scores and other parameters provided in the supplementary table.

We could reproduce the results (Results and Discussion section) for the correlations between transmission scores and prevalence_saliva (rho=0.6) as well as prevalence_gut (rho=0.05). However, there was a stronger correlation between average transmission score and prevalence_gut when accessing only the transmitters.

Additionally, strong correlations were observed for average transmission score and P/horizontal coverage for transmitters.

Can the authors comment?

https://doi.org/10.7554/eLife.42693.036

Author response

Summary:

This paper by Schmidt et al., is an overall well written manuscript that shows that bacterial strain exchange between the oral and gut environments is more extensive than previously thought. Additionally, the manuscript puts forth that this is a normal occurrence, rather than solely a mark of dysbiosis / disease and that opportunistic pathogens had higher evidence of transmission along the gastrointestinal tract with an extensive exchange of strains between the two using SNV profiles of each bacterial strain with oral cavity bacterial strain dominance.

We thank the editor for this positive feedback, and for soliciting constructive reviews that raised relevant and interesting suggestions to further strengthen our study.

Essential revisions:

There are a few methodological questions that need to be addressed. This includes cross checking with the ConStrains method to identify microbial strains (Luo et al., 2015).

Following this suggestion, we ran ConStrains on a subset of 144 samples from three cohorts of our dataset. Although the program finished successfully, it did not detect any strain heterogeneity in any of these samples due to “insufficient coverage” (see results tables attached to this review). The same behavior was previously reported by Quince et al., 2017 even on simulated mock community data:

“We were unable to run ConStrains [15] on the same data set, as the program complained that insufficient coverage of E. coli specific genes was obtained from the MetaPhlAn mapping. This is despite the fact that the E. coli coverage across our samples ranged between 37.88 and 432.00, with a median coverage of 244.00, well above the minimum of 10.0 stated to be necessary to run the ConStrains algorithm.”

Moreover, we have several theoretical arguments why tools like ConStrains are less powered to pick up the observed signal of oral-fecal strain overlap, as detailed below in the response to the original comment by reviewer #2.

Also, the amount of sequence coverage per sample (in Gbp) should be specified so that readers have a platform-independent reference point for the coverage that is necessary and sufficient for SNV analysis.

We have now added the total depth of microbial reads per sample to supplementary file 1, and the horizontal (breadth) and vertical (depth) coverage for each species in each sample as additional sheets to Supplementary file 2.

Moreover, as reviewer 3 points out the authors should further expand on their estimations of quantities and viability of bacterial cells in the lumen that can be attributed to salivary bacteria. In this regard, the paper could be strengthened if there would be some data on viability of bacterial strains in the intestinal tract as the authors might have underestimated the contribution of passively translocated DNA (belonging to living or dead bacteria) to the faecal microbiota. In this regard, a few years ago, Korem et al., (2015) published about this using shotgun metagenomic sequencing to calculate the ratio of sequencing coverage between the peak and trough providing a quantitative measure of a species' growth rate. It would be of importance to see if this bioinformatic approach would help solve this question.

We thank the editor for the constructive comment, and for suggesting the PTR method of Korem et al. for estimating growth rates from metagenomic data. To our knowledge, there are currently four available tools that estimate microbial growth rates from sequencing data, two of them published very recently (Nov 2018): PTR (Korem et al., 2015), iRep (Brown et al., 2016), GRiD (Emiola and Oh, 2018) and DEMIC (Gao and Li, 2018). We evaluated all four of these tools in response to the editor’s suggestion.

The original PTR method (Korem et al.) requires high-quality, closed (i.e., finished) genomes to run; in practice, this requirement is met by very few genomes even in curated reference databases, as noted by Brown et al. (2016) and Emiola and Oh (2018). Moreover, the PTR code base has been unmaintained since 2015 and is only available pre-compiled, as noted by Brown et al.:

“As there is no open-source version of the PTR software, we re-implemented the PTR method.”

We did not manage to set up the PTR method to run on our computers. iRep (Brown et al.), beyond re-implementing the original PTR algorithm, provides an adapted version that handles fragmented genomes split across multiple contigs. However, iRep requires a minimum average coverage of 5x per genome, which is prohibitive when investigating less abundant taxa, such as orally sourced transmitted species in stool samples. We set up iRep on our computers but encountered various runtime errors (even on the tutorial data provided by the authors). GRiD (Emiola and Oh) addresses the high coverage requirement of iRep and runs on contigs with coverage as low as 0.2x. We set up this tool as well, but likewise encountered runtime errors, both on the test data provided by the authors and on real data. For both GRiD and iRep, we contacted the authors about this, but have so far been unable to resolve the issues.

DEMIC (Gao and Li) traces coverage of genomes across multiple samples to infer putative oriC and ter sites (this distinguishes the tool from the other three tested). We ran DEMIC on all paired stool samples in our dataset, for all 310 tested taxa. The tool successfully predicted growth rates for 21 species, including four with strong oral-gut transmission signals and one ‘occasional’ transmitter, albeit only in a subset of samples (again, due to coverage cutoffs enforced by the algorithm). For these taxa, we detected active growth, as inferred by peak-to-trough ratios >1, with 23 out of 26 data points (88%) observed in individuals with positive oral-fecal transmission scores. The results are shown below.

This indicates that transmitted taxa grow actively in the gut, at least in those samples where inferences were possible. We also added further theoretical arguments why the observed oral-fecal transmission signal cannot be explained by ‘passive’ transmission alone. As detailed in the response to reviewer #3’s original comment below, we have now included additional references showing that free DNA, as released from dead bacterial cells, has a short half-life in the digestive tract, so that we do not expect fecal metagenomic signals to be skewed towards dead oral bacteria after passage through the GI tract.
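
As a rough illustration of the peak-to-trough idea referred to above (Korem et al., 2015), the following minimal sketch computes such a ratio from binned coverage along a circular genome. It is not the PTR, iRep, GRiD or DEMIC implementation: the function names, the moving-average smoothing, the window size and the toy coverage values are all assumptions made for illustration only.

```python
from typing import List


def circular_moving_average(coverage: List[float], window: int) -> List[float]:
    """Smooth binned coverage on a circular genome.

    The effective window is 2 * (window // 2) + 1 bins, centred on each bin
    and wrapping around the ends of the list.
    """
    n = len(coverage)
    half = window // 2
    width = 2 * half + 1
    smoothed = []
    for i in range(n):
        total = sum(coverage[(i + offset) % n] for offset in range(-half, half + 1))
        smoothed.append(total / width)
    return smoothed


def peak_to_trough_ratio(coverage: List[float], window: int = 5) -> float:
    """Ratio of the highest to the lowest smoothed coverage (hypothetical helper)."""
    smoothed = circular_moving_average(coverage, window)
    trough = min(smoothed)
    if trough <= 0:
        raise ValueError("trough coverage must be positive")
    return max(smoothed) / trough


# Toy data (assumed): 20 bins, origin of replication at bin 0, terminus at bin 10.
n_bins = 20
flat_coverage = [10.0] * n_bins
growing_coverage = [20.0 - min(i, n_bins - i) for i in range(n_bins)]

flat_ptr = peak_to_trough_ratio(flat_coverage)
growing_ptr = peak_to_trough_ratio(growing_coverage)
print(f"flat: {flat_ptr:.2f}, growing: {growing_ptr:.2f}")
```

Under this simplification, uniform coverage gives a ratio of 1, whereas coverage that declines from the origin towards the terminus gives a ratio above 1, which is the pattern read as active growth in the response above.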

Finally, a dedicated section on statistical analysis in the method description of the paper would be helpful and the numbers of included individuals should be cross checked within the supplemental data and tables (e.g. Figure 1).

We have now extended the subsection “Diversity, Community Composition and Statistical Analyses” to include more detailed explanations of the individual tests that were performed. Moreover, we have clarified the numbers of samples and individuals in that Figure (now promoted to main Figure 1) and across the main text (see detailed response to reviewer #3’s original comment).

Please see the full reviews below for further points:

Reviewer #1:

This is a solid paper which shows that bacterial strain exchange between the oral and gut environments is more extensive than previously thought and a normal occurrence, rather than solely a mark of dysbiosis / disease. Nonetheless, they also show that opportunistic pathogens had higher evidence of transmission. While the authors find no correlation between the β-diversity measures (as determined by metagenomic profiling) of the gut and the oral environment, they show that there is an extensive exchange of strains between the two. This is done by determining the strain profiles (SNV profiles of each strain) and showing that, based on probabilistic models, the overlap in some of these profiles is significantly higher than would be expected by random chance ("transmission scores"). The fact that species transmission scores correlate with oral relative abundance, but not with gut abundance, indicates that (as would be expected) the direction of transmission is from the oral cavity to the gut, as does the fact that 'oral SNVs observed at an initial time point were significantly enriched among fecal SNVs that were newly gained over time, but generally not vice versa'. The paper is made stronger by the use of longitudinal data.

The paper is concise and well-written, providing a detailed, clear and understandable description of the methods (e.g. how the transmission scores are calculated and what the rationale is). The visualizations (including the Supplementary figures) are very expressive and informative. Supporting information files have been submitted, and code and data have been made available on a GitHub repository.

I have very few critical remarks to make, I think the paper is rigorous and well-polished:

We thank the reviewer for their encouraging remarks, and for supporting our manuscript.

1) PRJEB28422 accession number does not exist on ENA (checked on Nov 27th). Did the authors mean PRJEB22368?

The project ID given in the manuscript was indeed correct, and data for the corresponding sub-cohorts (DE-CTR and FR-CRC) had been uploaded to ENA under this accession, but the project had (erroneously) not yet been switched to ‘public’. This is now fixed (https://www.ebi.ac.uk/ena/data/view/PRJEB28422); thank you for spotting this.

2) "Transmission scores were negatively correlated with genome size (ρ Spearman=-0.6), indicating that transmitted species generally had smaller genomes than non-transmitted ones" Any idea why this is the case?

The genome size signal stood out among the tested co-variates, though it could in large part be explained by phylogeny (both genome size and transmission scores had a strong, and largely shared, phylogenetic signal). We chose to report this observation, but to abstain from speculation on this point, as true mechanistic hypotheses as to why small genomes are associated with transmission would need to be tested against functional complements, the inference of which is very noisy and incomplete from our data due to the heterogeneous coverage of taxa across oral and fecal samples (see also response to a comment by reviewer #2 below).

3) "the fecal relative abundance of Fusobacterium sp. positively correlated with higher levels of transmission)" Why would this be the case?

We agree with the reviewer that this is an intriguing finding, and we followed this up by building predictive models of oral and gut taxa that were able to classify individuals into ‘high’ and ‘low’ transmission groups with surprising accuracy (new Supplementary file 4 and Figure 3—figure supplement 1). Fusobacterium nucleatum subspecies stood out as strong transmission markers. Fn has repeatedly been hypothesized to translocate from the oral cavity to the gut, where its enrichment is strongly associated with colorectal cancer and other diseases. We have now extended the corresponding paragraph to discuss this in more detail.

Reviewer #2:

In this work, Schmidt et al., provide theoretical evidence for the transmission and colonization of oral microbial genomes in the distal gut. The results are interesting and important because they show that transmission of oral microbes to the gut occurs extensively in healthy individuals also in adulthood. Therefore, if these results are confirmed, oral-gut transmission might be an important factor to consider for the prevention and management of human diseases through the GIT microbiota.

We thank the reviewer for their constructive and informed comments on our study.

We have a few major concerns that the authors should address to support their findings:

1) How were the cut-offs for vertical and horizontal genome coverage chosen? A 5% breadth of coverage seems low for the identification of microbial strains. Would the choice of these cut-offs affect the results?

We filtered our initial species list by a combination of three criteria (>5% horizontal coverage, >0.25x average vertical coverage, >10^-6 relative abundance), with each criterion required to be met in at least 10% of samples (oral and fecal combined). In response to the reviewer’s comment, we have now included the full tables of relative abundance, horizontal and vertical coverages as part of Supplementary file 2, and the number of taxa meeting the depth and breadth criteria alone as Figure 1—figure supplement 5.

The breadth cutoff was chosen to exclude spurious taxa, as in our experience, erroneous mappings to a genome are marked by high (local) depth, but very low (<<5%) breadth; the cutoff value of 5% was chosen based on the plot below that shows the number of included taxa as a function of the breadth inclusion criterion alone. At the same time, we strove to retain an accurate community representation: by applying the above criteria, we removed 1,443 species from the original ‘raw’ set, but these corresponded to only ~1% each of classifiable oral and fecal microbial abundance. Moreover, the above criteria were only applied as an initial filter to define a set of relevant taxa. For each intra-individual and inter-individual SNV-based test, we applied additional cutoffs, as detailed in the Materials and methods section.
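
The initial species filter described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the toy values are hypothetical, and the sketch assumes that each of the three criteria is evaluated independently and must hold in at least 10% of samples, which is one possible reading of the description.

```python
from typing import List

# Thresholds quoted in the response above.
MIN_BREADTH = 0.05           # >5% horizontal coverage
MIN_DEPTH = 0.25             # >0.25x average vertical coverage
MIN_ABUNDANCE = 1e-6         # >10^-6 relative abundance
MIN_SAMPLE_FRACTION = 0.10   # criterion met in at least 10% of samples


def fraction_above(values: List[float], threshold: float) -> float:
    """Fraction of samples in which a value strictly exceeds the threshold."""
    if not values:
        return 0.0
    return sum(v > threshold for v in values) / len(values)


def passes_initial_filter(
    breadth: List[float], depth: List[float], abundance: List[float]
) -> bool:
    """True if every criterion is met in enough samples (oral and fecal pooled).

    Assumption: criteria are checked one by one, not jointly per sample.
    """
    return (
        fraction_above(breadth, MIN_BREADTH) >= MIN_SAMPLE_FRACTION
        and fraction_above(depth, MIN_DEPTH) >= MIN_SAMPLE_FRACTION
        and fraction_above(abundance, MIN_ABUNDANCE) >= MIN_SAMPLE_FRACTION
    )


# Toy taxon with high local depth but very low breadth in all 10 samples,
# the pattern the response associates with erroneous mappings.
spurious_kept = passes_initial_filter(
    breadth=[0.01] * 10, depth=[20.0] * 10, abundance=[1e-4] * 10
)

# Toy taxon detected with reasonable breadth and depth in 2 of 10 samples.
genuine_kept = passes_initial_filter(
    breadth=[0.30, 0.25] + [0.0] * 8,
    depth=[1.0, 0.8] + [0.0] * 8,
    abundance=[1e-5, 2e-5] + [0.0] * 8,
)
print(spurious_kept, genuine_kept)
```

In this toy example the first taxon is excluded by the breadth criterion alone despite its high depth, while the second is retained.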

Indeed, Figure S4 shows that both vertical and horizontal genome coverage can affect the transmission score at least of some specific taxa. Please highlight the distribution of transmitters in this supplementary figure.

Following the reviewer’s suggestion, we have now highlighted transmitting, occasionally transmitting and non-transmitting taxa in the revised Figure 2—figure supplement 2, and indeed, we observed differences between these groups. Generally, correlations for transmitting taxa were distributed around (or close to) 0, whereas the outlying negative correlations can be ascribed to non-transmitters. This means that for non-transmitters, the “negative” signal is more pronounced the more data is available (deeper coverage, more shared observed genomic positions between the oral and fecal samples), whereas coverage does not generally correlate with transmission scores for transmitting taxa. We feel that this indeed strengthens our original argument that transmission scores (for transmitting taxa) are largely independent of technical covariates, and we thank the reviewer for their suggestion.

2) The authors base their work on the assumption that oral and gut SNVs profiles of transmitted genomes are more similar in an individual than between individuals. However, does this similarity of SNVs profiles necessarily imply transmission? Or, alternatively, could other individual-specific genetic and/or environmental factors shape a similar oral and gut microbiota in an individual instead of transmission?

We agree that while the observed intra-individual SNV overlap is strongly indicative of oral-fecal transmission, other factors could contribute to this signal as well, at least in principle. As we discuss in the text, these could include one-time GI tract-wide colonization events (e.g. during infancy or following a perturbation such as an antibiotics treatment) with subsequent independent evolution at both sites. However, in our opinion, the observed longitudinal signals provide evidence that oral-fecal transmission is an ongoing process even in (healthy) adults: SNVs observed in saliva at t0 are predictive of SNVs gained in feces over time, but not vice versa, and oral-to-fecal transmission rates significantly exceed background expectations.

3) As the authors use a low breadth of coverage and assume transmission based on similarity of SNVs profiles, we would like to ask that the authors confirm their results when using the ConStrains method to identify microbial strains (Luo et al., 2015). This method is also based on SNPs in oral and gut samples.

We fully agree with the reviewer that it would be desirable to corroborate our SNV-based results using an independent strain calling tool such as ConStrains. However, as detailed in our response to the editorial comment above, ConStrains did not detect any strain heterogeneity for any species in any of our tested samples, in line with previously reported tool behavior.

More generally, we had evaluated different strain calling methods at the outset of our study but found that several aspects of our specific research question rendered it outside the space of problems solved by existing tools. In particular, species present in both saliva and stool are usually abundant in one or the other, but almost never at both sites (generally oral species are present at lower abundance in the gut), thereby violating requirements of common tools (e.g., ConStrains requires an average coverage >10x for its inferences). We therefore did not aim to reconstruct comprehensive (genome-wide) strain haplotypes for each individual, which could be useful for some downstream analyses, but are not necessary to infer strain overlap. For this task, observed SNVs, weighted by their cohort-wide background frequencies, can serve as proxies for strain populations, as reported previously (e.g. by Li et al., 2016). In summary, while we did not reconstruct the entire strain space across our samples due to insufficient and heterogeneous coverage, we inferred strain overlap between samples within species based on observed marker SNVs.
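
To make the idea of frequency-weighted SNV overlap concrete, the sketch below compares intra-individual oral-fecal sharing with an inter-individual background for a single species. It is a deliberately simplified stand-in and not the transmission score defined in the Materials and methods: the weighting by negative log frequency, the function names and the toy SNV labels are all assumptions made for illustration.

```python
import math
from statistics import mean
from typing import Dict, Set

Profiles = Dict[str, Set[str]]  # individual ID -> set of observed SNV alleles


def allele_frequencies(profiles: Profiles) -> Dict[str, float]:
    """Fraction of individuals in which each allele is observed."""
    counts: Dict[str, int] = {}
    for alleles in profiles.values():
        for allele in alleles:
            counts[allele] = counts.get(allele, 0) + 1
    return {allele: c / len(profiles) for allele, c in counts.items()}


def weighted_overlap(oral: Set[str], fecal: Set[str], freq: Dict[str, float]) -> float:
    """Sum of -log(frequency) over shared alleles, so rare alleles weigh more."""
    return sum(-math.log(freq[allele]) for allele in oral & fecal)


def excess_overlap(oral: Profiles, fecal: Profiles) -> Dict[str, float]:
    """Intra-individual overlap minus the mean overlap with other individuals."""
    freq = allele_frequencies(fecal)
    result = {}
    for person in oral:
        intra = weighted_overlap(oral[person], fecal[person], freq)
        background = [
            weighted_overlap(oral[person], fecal[other], freq)
            for other in fecal
            if other != person
        ]
        result[person] = intra - mean(background)
    return result


# Toy data (assumed): allele "c" is common to everyone, "s*" and "g*" are rare.
oral_profiles = {"A": {"s1", "s2", "c"}, "B": {"s3", "s4", "c"}, "C": {"s5", "c"}}
fecal_profiles = {"A": {"s1", "s2", "c", "g1"}, "B": {"s3", "c", "g2"}, "C": {"c", "g3"}}

excess = excess_overlap(oral_profiles, fecal_profiles)
for person, value in excess.items():
    print(person, round(value, 3))
```

In this toy cohort, sharing the ubiquitous allele contributes nothing, so only individuals who share rare alleles between saliva and stool show an excess over the inter-individual background.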

Reviewer #3:

This study entitled "Extensive Transmission of Microbes along the Gastrointestinal Tract" is an original work focusing on the transmission of bacterial strains between the oral and gut environment. The study is a robust analysis at strain-level of the oral and gut microbiota composition intra- and inter-individuals. The data are well exploited, especially regarding potential confounding factors between the different cohorts. The authors set out to identify population flow of bacteria from the oral cavity to the lumen. They defined metagenomes to an SNV-level resolution. The central thesis of this paper is that intra-individual overlap of SNVs between the oral cavity and the lumen is greater than that which would be expected from inter individual background thereby demonstrating oral taxa translocation. Within their model this event of translocation was a persistent one. The taxa identified as transmitted were phylogenetically diverse, yet some clade clustering was noted. The only noted characteristics of these taxa were reduced relative genome size and their anaerobic/ facultative aerobic nature.

This is a novel study with potentially significant ramifications for human intestinal microbial ecology.

We thank the reviewer for their encouraging and constructive comments.

My main critiques are these four points:

1) One of the pillar arguments in this paper is that the numbers of bacterial cells in the colon which show evidence of transmission cannot be attributed to passive translocation due to peristalsis. They argue that the bacterial cells swallowed by a human per day (1.5 × 10^12) would be depleted during passage through the upper digestive tract (stomach and duodenum) and thus would not contribute significantly to the gut microbiota. However, methodologically, the authors are investigating DNA, not viable living cells. One might argue the authors have underestimated the contribution of passively translocated DNA (belonging to living or dead bacteria) to the faecal microbiota. The stomach and the mouth contain a high proportion of dead cells (perhaps only 1% of stomach bacterial cells are alive) whose DNA could translocate to the lumen. Given estimates that an average person has 1 bowel movement a day (the cohort is older with disease, so it could be less), that the average mass of stool is 100 grams and that the bacterial density in stool is 0.9 × 10^11 bacteria/g, an individual would pass 9 × 10^12 bacterial cells a day. Presuming that an individual passes all the saliva they swallow in a day in their bowel movement and there is no loss of bacterial DNA, one would detect ~1.5 × 10^12 per stool. This would be above the 10% limit that the authors set. I think the authors should further expand on their estimations of quantities of bacterial cells in the lumen that can be attributed to salivary bacteria. The literature they reference is not primary work but, in essence, review. More solid numbers are needed when discussing the oral bacterial density and volume. Indeed, the focus of the Sender et al., 2016 paper is the colonic microbiota. I think their strong claims need stronger support.

The reviewer is right: a central argument of our paper is that ingested salivary bacteria would only be present at abundances below metagenomic detection after passage through the gastrointestinal tract. We realize that one important point regarding this was only made implicitly: the half-life of free DNA (released from dead bacterial cells) in the digestive tract is very short, due to the action of nucleases, the resorption of nucleosides and (to a much lesser extent) uptake of fragments by competent bacteria. Therefore, as only 1 in 100,000-1,000,000 bacterial cells survive passage through the stomach, we expect this reduction to be mirrored in a corresponding decrease in levels of their (intact) DNA after passage of the GI tract, although we recognize that damaged microbial cells are prevalent in feces (see e.g. the recent preprint by Perras et al., 2018). Following the reviewer’s suggestion, we have now referenced studies that quantified DNA degradation in saliva, in the stomach, and in the lower intestine.
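
The arithmetic behind this exchange can be laid out explicitly. The sketch below only recombines the figures quoted by the reviewer and in the response above; the variable names are illustrative, and it assumes that surviving cells do not grow in the gut and that intact DNA declines in step with cell survival, as the response expects.

```python
# Figures quoted in the exchange above; variable names are illustrative.
SWALLOWED_CELLS_PER_DAY = 1.5e12     # salivary bacterial cells swallowed per day
STOOL_MASS_G = 100.0                 # average stool mass per day (reviewer's estimate)
CELLS_PER_G_STOOL = 0.9e11           # bacterial density in stool (reviewer's estimate)
SURVIVAL_FRACTIONS = (1e-5, 1e-6)    # 1 in 100,000 to 1 in 1,000,000 survive the stomach

fecal_cells_per_day = STOOL_MASS_G * CELLS_PER_G_STOOL

# Reviewer's scenario: no loss of cells or DNA between mouth and stool.
no_loss_fraction = SWALLOWED_CELLS_PER_DAY / fecal_cells_per_day

# Authors' scenario: intact DNA declines in step with cell survival.
# Assumption of this sketch: surviving cells do not grow in the gut.
passive_fractions = [
    SWALLOWED_CELLS_PER_DAY * survival / fecal_cells_per_day
    for survival in SURVIVAL_FRACTIONS
]

print(f"fecal cells per day: {fecal_cells_per_day:.1e}")
print(f"no-loss oral fraction: {no_loss_fraction:.1%}")
for survival, fraction in zip(SURVIVAL_FRACTIONS, passive_fractions):
    print(f"survival {survival:.0e}: passive oral fraction {fraction:.1e}")
```

With no loss at all, swallowed cells would amount to roughly 17% of daily fecal cells, which is the reviewer's concern. With the quoted survival rates, the passive contribution falls to the order of 10^-6 to 10^-7 of fecal cells, several orders of magnitude below the 2% of classifiable fecal abundance attributed to oral taxa in the manuscript.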

Moreover, as per the editor’s suggestion, we have now confirmed that at least some transmitting taxa show indications of active growth as inferred by peak-to-trough ratios >1 in fecal samples (see detailed comment above).

Likewise, the results presented in the study are not enough to support the following statement in particular "Approximately one in three salivary microbial cells colonise in the gut, accounting for at least 2% of the classifiable microbial abundance in faeces".

We thank the reviewer for pointing this out. The phrasing of that sentence is now more precise, referring (more correctly) to the fraction of classifiable salivary microbial cells. The sentence now reads, “Approximately one in three classifiable salivary microbial cells colonize in the gut, accounting for at least 2% of the classifiable microbial abundance in feces.”

2) The number of individuals reported in the text does not match the cohort and dataset overview presented in Figure 1. In the abstract and the main text, the authors reported the analysis of 470 healthy and diseased individuals but based on Figure 1 all together the cohorts comprised 571 individuals including 365 intra-individual couples. Further, the authors reported in the main text that they focussed on a subset of 57 individuals for whom longitudinal data was available but based on Figure 1 only 46 individuals (including diseased individuals) presented time series. Then, for the case-control studies, the authors reported in the main text a total of 172 individuals but based on Figure 1 the cohorts CN-RA, FR-CRC and LU-T1D comprised 395 individuals including 219 intra-individual couples (healthy and diseased individuals). As a general comment, the authors should also clarify precisely in the main text whether the studied individuals are intra-individual couples (with both saliva and stool samples) or individuals with one sample type.

We realize that our terminology in that Figure (now revised main Figure 1) and the main text was not sufficiently precise. We thank the reviewer for pointing this out and apologize for the confusion caused. We counted ‘intra-individual sample couples’ as any paired sampling event for an individual at the same timepoint; for longitudinal cohorts (LU-T1D, CN-RA and DE-CTR), we therefore listed more intra-individual couples than individuals, as some subjects were sampled multiple times. For example, in DE-CTR, five subjects were sampled in one time series each, and provided both saliva and stool at each timepoint (providing 10 intra-individual couples). As detailed in the Materials and methods section, intra-individual longitudinal couples were blocked for when calculating score backgrounds, but we considered each intra-individual saliva-stool couple at each timepoint as a data point.

The previous number of 57 subjects with timeseries was indeed incorrect, as this included CN-RA individuals with fecal-only longitudinal sampling that were not included in the relevant tests. We have now replaced this number by the correct one of 46 individuals.

We have adapted Figure 1 to list both the number of sampling events and the number of subjects. Moreover, following the reviewer’s suggestion, we have now revised the main text and figure caption to clarify this point.

3) The authors profiled 310 prevalent species, which accounted for 99% of classifiable microbial abundance in both saliva and stool. However, there is no mention of the unclassifiable fraction of the reads, the proportion of classified over non-classified reads or the percentage of mapped reads.

In response to this comment, we have now included the full taxa relative abundance table (all 310 species across all tested samples) as part of Supplementary file 2. All relative abundances were scaled to the total number of reads mapping to informative marker genes, and the total classified abundance is reported.

4) The authors should acknowledge the difference between colonization of the lumen (faecal matter) versus colonization of the mucosa. Recent work by Zmora et al., 2018 on probiotics have highlighted the disparity between the colonization of the faecal matter versus the mucosa. I recognize they prepared the current submission before the Zmora paper came out. However, there are many other papers that make this point, at least with respect to faecal versus mucosa.

We thank the reviewer for raising this very relevant point. We have now added a statement to the discussion to clarify that by using feces as a readout, we may indeed underestimate the true extent of oral colonization in the gut, as bacteria from the mucosal linings throughout the GI tract may be under-represented.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

The reviewers were all satisfied with your comments, but one question remains from reviewer 2, on the relative correlations between transmission scores for saliva and gut (see below).

Reviewer #2:

We thank the authors for their work while revising their manuscript. The manuscript reads well and we have only one additional comment concerning the correlations between average transmission scores and other parameters provided in the supplementary table.

We could reproduce the results (Results and Discussion section) for the correlations between transmission scores and prevalence_saliva (rho=0.6) as well as prevalence_gut (rho=0.05). However, there was a stronger correlation between average transmission score and prevalence_gut when assessing only the transmitters.

Additionally, strong correlations were observed for average transmission score and P/horizontal coverage for transmitters.

Can the authors comment?

We thank the reviewer for their diligent re-analysis of this data subset, and for their valuable comment. We completely agree with the reviewer’s finding and had indeed made the same observations during our own analysis of the data. We now realize that the phrasing in presenting these results was not sufficiently precise.

We tested correlations between transmission scores and ‘technical’ parameters in two ways: as averages of correlations per taxon (presented in Figure 2—figure supplement 2), and as correlations of averages (the results pointed out by the reviewer, Results and Discussion section). The former tests for truly ‘technical’ effects, by correlating transmission scores with co-variates across all samples within a taxon and then testing for systematic associations across all taxa. For example, for V. parvula (one of the strongest oral-fecal transmitters in our study), the correlations between transmission scores and horizontal coverage in saliva (0.08) and stool (0.12), vertical coverage in saliva (0.09) and stool (0.01), and salivary abundance (-0.08) across all samples were negligible. When viewed across all taxa (Figure 2—figure supplement 2), and following the reviewer’s previous comment, we concluded that in general, these technical parameters were not positively correlated to transmission scores (in particular for transmitting taxa).

The reviewer’s above findings point to the second type of tests – correlations of averages. For this, we aimed to compare average transmission scores for each taxon across samples to averaged co-variates (prevalence, abundance, coverage). Indeed, fecal prevalence did not globally correlate to average transmission scores (rho=0.05 as reported), but there is a trend for transmitting taxa only (rho=0.67), and similarly for horizontal and vertical fecal coverage (implying abundance). In our view, these are biological rather than technical observations: taxa that are (on average) stronger oral-fecal transmitters are (on average) more prevalent across subjects, and more abundant (covered) in the gut. However, the above tests (Figure 2—figure supplement 2) show that for each individual taxon, transmission scores across subjects are not driven by technical co-variates.
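
The distinction between the two kinds of test can be illustrated with a small example. The sketch below uses a plain Spearman correlation implemented from scratch and entirely made-up numbers; the taxon names, coverage values and scores are hypothetical and only serve to show that an average of per-taxon correlations and a correlation of per-taxon averages can disagree.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of ranks (inputs must vary)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Toy data (assumed): per taxon, (fecal coverage, transmission score) per sample.
samples: Dict[str, List[Tuple[float, float]]] = {
    "taxon_a": [(1.0, 0.2), (2.0, 0.4), (3.0, 0.1), (4.0, 0.3)],
    "taxon_b": [(11.0, 1.2), (12.0, 1.4), (13.0, 1.1), (14.0, 1.3)],
    "taxon_c": [(21.0, 2.2), (22.0, 2.4), (23.0, 2.1), (24.0, 2.3)],
}

# Test 1: average of correlations, computed within each taxon across samples.
per_taxon_rho = {
    taxon: spearman([c for c, _ in pairs], [s for _, s in pairs])
    for taxon, pairs in samples.items()
}
average_of_correlations = mean(per_taxon_rho.values())

# Test 2: correlation of averages, computed across taxa.
mean_coverage = [mean(c for c, _ in pairs) for pairs in samples.values()]
mean_score = [mean(s for _, s in pairs) for pairs in samples.values()]
correlation_of_averages = spearman(mean_coverage, mean_score)

print(f"average of correlations: {average_of_correlations:.2f}")
print(f"correlation of averages: {correlation_of_averages:.2f}")
```

In this constructed example, coverage does not track the score within any single taxon, yet taxa with higher average coverage also have higher average scores. This mirrors the reading given above: the first test addresses technical effects within a taxon, the second describes differences between taxa.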

We have now adapted the phrasing in this discussion in the main text and caption of Figure 2—figure supplement 2 accordingly.

https://doi.org/10.7554/eLife.42693.037

Article and author information

Author details

  1. Thomas SB Schmidt

    Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    Contributed equally with
    Matthew R Hayward
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-8587-4177
  2. Matthew R Hayward

    Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    Present address
    The Ragon Institute of MGH, MIT and Harvard, Cambridge, United States
    Contribution
    Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Writing—original draft, Writing—review and editing
    Contributed equally with
    Thomas SB Schmidt
    Competing interests
    No competing interests declared
  3. Luis P Coelho

    Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    Present address
    Institute of Science and Technology for Brain-Inspired Intelligence (ISTBI), Fudan University, Shanghai, China
    Contribution
    Conceptualization, Software, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9280-7885
  4. Simone S Li

    Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    Present address
    Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
    Contribution
    Conceptualization, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-0073-3656
  5. Paul I Costea

    Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    Contribution
    Conceptualization, Methodology
    Competing interests
    No competing interests declared
  6. Anita Y Voigt

    Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    Present address
    The Jackson Laboratory for Genomic Medicine, Connecticut, United States
    Contribution
    Resources, Investigation
    Competing interests
    No competing interests declared
  7. Jakob Wirbel

    Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    Contribution
    Formal analysis, Visualization, Writing—review and editing
    Competing interests
    No competing interests declared
  8. Oleksandr M Maistrenko

    Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    Contribution
    Resources, Investigation, Writing—review and editing
    Competing interests
    No competing interests declared
  9. Renato JC Alves

    1. Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    2. Joint PhD programme, European Molecular Biology Laboratory and Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
    Contribution
    Data curation, Investigation
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7212-0234
  10. Emma Bergsten

    Department of Gastroenterology and EA7375 -EC2M3, APHP and UPEC Université Paris-Est Créteil, Créteil, France
    Contribution
    Investigation, Writing—review and editing
    Competing interests
    No competing interests declared
  11. Carine de Beaufort

    1. Luxembourg Centre for Systems Biomedicine, Luxembourg, Luxembourg
    2. Clinique Pédiatrique, Centre Hospitalier de Luxembourg, Luxembourg, Luxembourg
    Contribution
    Resources, Data curation
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-4310-6799
  12. Iradj Sobhani

    Department of Gastroenterology and EA7375 -EC2M3, APHP and UPEC Université Paris-Est Créteil, Créteil, France
    Contribution
    Resources, Data curation
    Competing interests
    No competing interests declared
  13. Anna Heintz-Buschart

    Luxembourg Centre for Systems Biomedicine, Luxembourg, Luxembourg
    Present address
    Department of Soil Ecology, Helmholtz Centre for Environmental Research - UFZ, Halle, Germany
    Contribution
    Resources, Data curation, Investigation, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9780-1933
  14. Shinichi Sunagawa

    Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    Present address
    Department of Biology, ETH Zürich, Zürich, Switzerland
    Contribution
    Conceptualization, Supervision, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
  15. Georg Zeller

    Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    Contribution
    Supervision, Funding acquisition, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
  16. Paul Wilmes

    Luxembourg Centre for Systems Biomedicine, Luxembourg, Luxembourg
    Contribution
    Conceptualization, Supervision, Funding acquisition, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6478-2924
  17. Peer Bork

    1. Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
    2. Max Delbrück Centre for Molecular Medicine, Berlin, Germany
    3. Molecular Medicine Partnership Unit (MMPU), European Molecular Biology Laboratory and University Hospital Heidelberg, Heidelberg, Germany
    4. Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    bork@embl.de
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2627-833X

Funding

Fonds National de la Recherche Luxembourg (CORE/15/BM/10404093)

  • Thomas SB Schmidt
  • Matthew R Hayward
  • Anna Heintz-Buschart

H2020 European Research Council (ERC-AdG-669830)

  • Thomas SB Schmidt
  • Simone S Li
  • Oleksandr M Maistrenko
  • Renato JC Alves
  • Peer Bork

H2020 Marie Skłodowska-Curie Actions (661019)

  • Matthew Robert Hayward

German Network for Bioinformatics Infrastructure (de.NBI #031A537B)

  • Georg Zeller
  • Peer Bork

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors would like to thank Sina Klai of the University of Zürich, Switzerland, and Johanna M Schmidt and Gereon Rieke of the University of Bonn, Germany, for helpful comments and discussions on this manuscript, in particular regarding the medical relevance of several of the discussed bacterial species. We thank Katri Korpela, Lucas Silva, Thea van Rossum and other members of the Bork lab at EMBL, Germany, for helpful discussions. We thank Anna M Glazek and Yan Ping Yuan for bioinformatics support; Stefanie Kandels-Lewis of the EMBL for support on sample logistics and administration; Rajna Hercog, Jan Provaznik, Vladimir Benes and, more generally, the EMBL Genomics Core Facility for sequencing support; and Laura Lebrun of LCSB for support with the biomolecular extraction platform. TSBS, MRH and AHB were supported by a Luxembourg National Research Fund CORE-INTER grant (MicroCancer; CORE/15/BM/10404093). MRH was additionally supported by a Marie Curie Individual Fellowship (661019). TSBS, SSL, OMM, RJA and PB were supported by a European Research Council grant (MicroBioS; ERC-AdG-669830). GZ and PB were supported by the BMBF-funded Heidelberg Center for Human Bioinformatics (HD-HuB) within the German Network for Bioinformatics Infrastructure (de.NBI #031A537B).

Ethics

Human subjects: Informed consent was obtained from all study subjects for whom novel data were generated; see the respective previous publications for details (PMID: 27723761; PMID: 25432777; PMID: 25888008).

Senior Editor

  1. Wendy S Garrett, Harvard TH Chan School of Public Health, United States

Reviewing Editor

  1. Max Nieuwdorp, AMC, Netherlands

Reviewers

  1. Andrei Prodan, Amsterdam University Medical Center, Netherlands
  2. Paul O'Toole

Publication history

  1. Received: October 9, 2018
  2. Accepted: February 3, 2019
  3. Accepted Manuscript published: February 12, 2019 (version 1)
  4. Version of Record published: March 19, 2019 (version 2)

Copyright

© 2019, Schmidt et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
