Metagenomic meta-analysis of the gut microbiome associated with CDI

(A) Top: fecal metagenomes used in the meta-analysis, drawn from 10 public CDI or diarrhoeal study populations and divided into three groups: subjects diagnosed with CDI (CDI, n=234), subjects diagnosed with diseases other than CDI (D-Ctr, n=114), and healthy subjects (H-Ctr, n=186). See Methods for further group description. Bottom: fraction of C. difficile positive samples per group. C. difficile toxin genes were found only among C. difficile positive samples in the CDI group. (B) Rarefied species richness across the three groups. (C) Prevalence and relative abundance of species that induce antibiotic-associated diarrhoea (AAD) among CDI samples and controls. (D) Prevalence of single and multiple toxigenic species among C. difficile positive and C. difficile negative samples, divided by sample group. P-values for mean comparisons were calculated using the Wilcoxon test: * for P ≤ 0.05, ** for P ≤ 0.01, *** for P ≤ 0.001, **** for P ≤ 0.0001; no symbol when not significant.
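
The asterisk notation in this legend is a fixed mapping from p-value to symbol. A minimal Python sketch of that mapping follows; the function name `significance_stars` is hypothetical and is not taken from the authors' analysis code.

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisk notation used in the legend.

    Thresholds follow the legend: * (P <= 0.05), ** (P <= 0.01),
    *** (P <= 0.001), **** (P <= 0.0001), empty string otherwise.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    # Check the strictest threshold first so the strongest label wins.
    thresholds = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]
    for cutoff, stars in thresholds:
        if p_value <= cutoff:
            return stars
    return ""  # not significant: no symbol is drawn
```

The thresholds are inclusive, so a p-value of exactly 0.05 still receives a single asterisk.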

A CDI-specific microbial signature

(A) Microbial species signature associated with CDI, healthy controls and diseased controls, as identified by a LASSO model, and (B) the associated AUC values for all samples and for individual study populations, when comparing CDI to healthy and diseased controls combined (left) and CDI to healthy controls only (right). In the latter, the AUC value for Vincent_2016 is intentionally left blank, as only diseased controls (D-Ctr) are available for this study population (see Supplementary Table 1).
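
The blank Vincent_2016 entry follows from how AUC is defined: it needs at least one sample from each of the two classes being compared. The sketch below illustrates this with a rank-based AUC computed per study population. It is an illustration only, not the authors' pipeline; the names `roc_auc` and `per_study_auc` and the `(study, label, score)` record layout are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0); ties count as 0.5."""
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def per_study_auc(
    records: Iterable[Tuple[str, int, float]]
) -> Dict[str, Optional[float]]:
    """Compute one AUC per study; None marks studies lacking one class."""
    by_study = defaultdict(lambda: ([], []))
    for study, label, score in records:
        by_study[study][0].append(label)
        by_study[study][1].append(score)
    result: Dict[str, Optional[float]] = {}
    for study, (labels, scores) in by_study.items():
        try:
            result[study] = roc_auc(labels, scores)
        except ValueError:
            result[study] = None  # left blank, as in the figure
    return result
```

In this sketch, a study that contributes CDI samples but no healthy controls yields `None` for the CDI versus H-Ctr comparison, which corresponds to the blank cell in panel B.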

C. difficile tracking in a global dataset collection

(A) Overview of the global dataset collection, comprising 42,900 samples from 253 publicly available studies and including host-associated and environmental samples (internal ring) and their subcategories (external ring). The human gut was the most represented environment in our collection (66.3%, n=28,423). (B) Stratification of the human gut samples by health status and age group (see Methods for a detailed description of the age groups; numbers refer to the number of samples prior to read filtering). (C) Prevalence of C. difficile positive samples per category. Total values refer to the number of samples after the initial read filtering and before time series dereplication (see Methods). From this point onwards, results refer to this extended collection of 42,900 samples.

C. difficile prevalence and associated microbial diversity

(A) C. difficile prevalence in stool samples from healthy and diseased human (left) and animal (right) hosts over the lifetime. C. difficile prevalence divided by (B) health status and delivery mode, (C) health status and prematurity, and (D) feeding mode within the first six months of life. For animal samples, only species with at least ten samples in total are shown. (E) Species richness of human stool samples with and without C. difficile, across age groups, in healthy (left) and diseased (right) subjects. See Methods for details on disease and age group definitions. For samples belonging to time series, only the representative samples are included in the estimates (see Methods). P-values for mean comparisons were calculated using a t-test: * for P ≤ 0.05, ** for P ≤ 0.01, *** for P ≤ 0.001, **** for P ≤ 0.0001; no symbol when not significant.

Biotic context associated with C. difficile

Species associated with C. difficile, divided by age group and health status: significant positive associations are shown in dark orange, significant negative ones in dark blue (p-values adjusted using the Benjamini-Hochberg (BH) procedure). Lighter shades indicate non-significant associations. Only species significantly associated with C. difficile in at least one age group are shown. The number of samples in each category is shown in the lower part. See Methods for details on age group definitions. Species group separation was not affected by the presence of C. difficile toxin genes (data not shown). On the right, per-species annotation of i) oxygen requirement (see also Supplementary Figure 12 for the differential enrichment across the two groups) and ii) trend over the lifetime. Positive Spearman's rho values indicate species more commonly found in the gut microbiome of healthy adults; negative values indicate the opposite trend.
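
The BH adjustment mentioned in this legend can be sketched in a few lines of Python. This is a generic implementation of the Benjamini-Hochberg step-up procedure, shown only to make the adjustment concrete; the function name `benjamini_hochberg` is hypothetical, and the authors' own software and settings are not specified in the legend.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input.

    For the p-value of rank k (1 = smallest) among m tests, the raw
    adjustment is p * m / k; a running minimum taken from the largest
    rank downwards keeps the adjusted values monotone.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

An association would then be drawn in the darker shade when its adjusted p-value falls at or below the chosen significance level; the legend does not state that level, so any cutoff used with this sketch is an assumption.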