MetaHiC phage-bacteria infection network reveals active cycling phages of the healthy human gut

  1. Martial Marbouty (corresponding author)
  2. Agnès Thierry
  3. Gaël A Millot
  4. Romain Koszul (corresponding author)

  1. Institut Pasteur, Unité Régulation Spatiale des Génomes, CNRS, UMR 3525, France
  2. Institut Pasteur, Bioinformatics and Biostatistics Hub, CNRS, USR 3756, France
5 figures, 1 table and 7 additional files

Figures

Figure 1 with 3 supplements
Binning of human gut microbiota using 3C and HiC protocols I and II.

(a) Sex, ID, and age of the 10 individuals investigated in this study. The colors indicate the proximity ligation protocol used to generate the libraries. (b) Computational pipeline of paired-end (PE) sequence analysis (MAG: metagenomic assembled genome). (c) Left diagram: proportion of the different bins obtained for each individual assembly. Colors represent groups of bins of similar quality (completion/contamination), according to the color code on the right. ND = not determined. Right table: number of MAGs in each category of the left diagram, for each individual.

Figure 1—figure supplement 1
Comparisons of 3C and Hi-C protocols I and II.

(a) Bar plot of the 3D signal ratio calculated for each constructed library. The 3D ratio is defined as the number of paired-end (PE) reads that unambiguously link two contigs together divided by the number of mapped PE reads (see 'Materials and methods'). Percentage values are shown in Supplementary file 2. (b) Pie charts indicating the proportion of the different PE read types (Cournac et al., 2012) for libraries #16016 meta3C/metaHiC protocol I/metaHiC protocol II, #8015 meta3C/metaHiC protocol II, and #9010 meta3C/metaHiC protocol II. Red: uncuts, Blue: loops (self-circularization), Green: intra-contig interactions, Purple: inter-contig interactions, Yellow: weird (PE reads that interact with the restriction fragment). (c) Left: bar charts showing the proportion of the different bins obtained using meta3C, metaHiC protocol I, or metaHiC protocol II on samples #16016, #8015, and #9010. Colors indicate the quality of the bins, as shown in the color scale legend. Right: number of metagenomic assembled genomes (MAGs) in each category. The number of bins exhibiting less than 10% contamination was strongly increased with metaHiC protocol II.
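The 3D ratio defined in panel (a) can be illustrated with a minimal sketch. It assumes that each mapped PE read is already reduced to the pair of contig names its two mates align to, and that ambiguous alignments were filtered out upstream; the function name `three_d_ratio` and the input layout are hypothetical and are not taken from the study's pipeline.

```python
from typing import Iterable, Tuple


def three_d_ratio(mapped_pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of mapped PE reads whose two mates fall on different contigs.

    Each item is (contig_of_mate_1, contig_of_mate_2). Only unambiguously
    mapped pairs are expected here (assumption: filtering is done upstream).
    """
    total = 0
    inter_contig = 0
    for contig_a, contig_b in mapped_pairs:
        total += 1
        if contig_a != contig_b:
            inter_contig += 1
    if total == 0:
        raise ValueError("no mapped pairs")
    return inter_contig / total


# Toy input (made-up contig names): 2 of the 4 pairs link two different contigs.
pairs = [("c1", "c1"), ("c1", "c2"), ("c3", "c3"), ("c2", "c4")]
ratio = three_d_ratio(pairs)
```

Multiplying the result by 100 gives a percentage comparable in form to the values reported in Supplementary file 2.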

Figure 1—figure supplement 2
Relation between abundance and completion/contamination for the retrieved metagenomic assembled genomes (MAGs).

(a) Plot of completion (Y-axis) as a function of abundance (X-axis; log2) for the 1100 retrieved MAGs (i.e. bins >500 kb). (b) Plot of contamination (Y-axis) as a function of abundance (X-axis; log2) for the same MAGs.

Figure 1—figure supplement 3
Comparisons of metagenomic assembled genomes (MAGs) obtained using meta3C, metaHiC protocol I, or metaHiC protocol II.

(a) Venn diagram of the MAGs obtained using meta3C, metaHiC protocol I, or metaHiC protocol II data for samples #8015, #9010, and #16016. We compared the MAGs retrieved from the different datasets and identified MAGs exhibiting more than 80% identity. This result shows that the three approaches are highly concordant. (b) LAST alignment of the MAGs > 1 Mb retrieved using meta3C or metaHiC protocol II on sample #9010. The X- and Y-axes indicate the MAG index for meta3C (X) or metaHiC (Y).

Figure 2 with 2 supplements
Metagenomic assembled genomes (MAGs) recovered using meta3C/metaHiC applied to 10 healthy human gut samples.

Phylogenetic tree comprising the 715 reconstructed MAGs with a completion above 50% and a contamination below 10% (691 different species at a threshold of 95% identity). A very long branch was cut out (symbol / /) to resize the tree for clarity. Red dots indicate MAGs with no reference in the GTDB-tk database (threshold of homology = 95%). The colors of the first three layers indicate the taxonomy of the MAGs, as determined by CheckM (phylum, class, order from center to periphery). Green and red bars represent completion and contamination levels, respectively. Completion scale: max = 100%; min = 50%. Contamination scale: max = 10%; min = 0%. Gray bars: MAG (bin) sizes (scale: max = 7.07 Mb, min = 766 kb). Black bars in the outermost layer indicate MAG abundance in the different samples (max = 4.73%; min = 0.0039%).
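The quality filter used to select MAGs for the tree (completion above 50%, contamination below 10%) can be sketched as follows. The `Mag` record, its field names, and the toy values are hypothetical and serve only as an illustration; treating both thresholds as strict inequalities is an assumption based on the wording of the legend.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mag:
    bin_id: str           # hypothetical identifier
    completion: float     # percent, e.g. as reported by CheckM
    contamination: float  # percent, e.g. as reported by CheckM


def select_tree_mags(
    mags: List[Mag],
    min_completion: float = 50.0,
    max_contamination: float = 10.0,
) -> List[Mag]:
    """Keep MAGs with completion above and contamination below the thresholds."""
    return [
        m for m in mags
        if m.completion > min_completion and m.contamination < max_contamination
    ]


# Toy values, made up for illustration only.
toy_mags = [
    Mag("bin_1", completion=97.2, contamination=1.1),
    Mag("bin_2", completion=42.0, contamination=0.5),   # too incomplete
    Mag("bin_3", completion=88.0, contamination=14.3),  # too contaminated
]
kept = [m.bin_id for m in select_tree_mags(toy_mags)]
```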

Figure 2—figure supplement 1
Phylogenetic tree for the 10 processed samples.

Only metagenomic assembled genomes (MAGs) with a completion above 50% and a contamination below 10% are included in each tree. Colors of the inner six layers indicate the taxonomy of the MAG attributed by CheckM (from center to periphery: phylum, class, order, family, genus, species). Green and red bars in the following layers indicate completion and contamination. Black bars in the first outer layer indicate MAG size. Black bars in the second outer layer indicate MAG abundance.

Figure 2—figure supplement 2
Comparison of metagenomic assembled genome (MAG) taxonomic abundance and read taxonomic abundance.

Bar charts of MAG taxonomic abundance (left) and read taxonomic abundance using Kaiju (Menzel et al., 2016) (right) for each sample. Different taxonomic levels are shown (upper: phylum; middle: class; bottom: order).

Figure 3 with 2 supplements
Phage-bacteria network of interactions in the human gut.

(a) Pie chart of the distribution of phage contigs among the different classes (see 'Materials and methods' and Figure 3—figure supplement 1). (b) Gaussian-smoothed heatmap of the sequence distance between all pairs of phages (Y-axis) and the sequence distance between their associated metagenomic assembled genomes (MAGs) (X-axis). Distance varies between 0 (full sequence similarity) and 0.3 (no similarity). The color scale on the left indicates the counts. Dotted lines indicate the mean distance between MAGs at different taxonomic levels (species = 0.0402, genus = 0.205, family = 0.263). (c) Phylogenetic tree of phage contigs (n = 2,726). Colored strips in the first layer indicate the taxonomy of the associated MAG at the order level (1,535 of the 2,726 contigs present in the tree are attributed to a MAG). The bar plot in the second layer indicates phage contig size (min = 5 kb; max = 211 kb).

Figure 3—figure supplement 1
Classification of phage contigs.

Scheme of the process followed to classify phage contigs into the three classes (A, B, C), as described in the 'Materials and methods' section. Red stars point to the 10 recursive Louvain iterations used to calculate the association scores.

Figure 3—figure supplement 2
Comparison of phages and their characterized hosts between samples.

(a) Circos representation of the 454 pairs of phages belonging to the same genus. The different circles represent the characterized hosts at different taxonomic levels (from inside to outside: phylum, class, order, family, genus). Links are colored according to the agreement between host taxonomies: red – same taxonomic annotations at the genus level; orange – same taxonomic annotations at the class level; gray – different annotations at the phylum level. The legend for taxonomic annotations is shown under the circos. (b) Circos representation of three homologous metagenomic assembled genomes (MAGs) (90% identity) from samples #7020, #16026, and #8016. MAGs are indicated in gray with their respective sizes in kilobases. Phage contigs are indicated in green. Red links represent blast hits with an identity above 90%. Green links represent blast hits with an identity above 90% for identified phage contigs.

Figure 4 with 1 supplement
Phage-host ratio in the human intestinal tract.

(a) Boxplot of the coverage ratio between different classes of contigs and the mean coverage of their associated MAGs. From left to right, coverage ratio of: (1) all contigs binned into metagenomic assembled genomes (MAGs) (n = 861,082) vs. mean MAG coverage, (2) class A phage contigs (n = 6,763) vs. mean MAG coverage, (3) dnaA contigs (n = 856) vs. mean MAG coverage, (4) all contigs binned into MAGs (n = 672,156) vs. the dnaA contig of their associated MAGs, (5) class A phage contigs (n = 6,239) vs. the dnaA contig of their associated MAGs. Log2 ratio thresholds define colored areas reflecting the potential phage status: green, dormant (cat1: −1 < log2(ratio) < 1); gray, undefined (cat2: log2(ratio) < −1); light red, potentially active (cat3: 1 < log2(ratio) < 2); red, likely active (cat4: log2(ratio) > 2). Bar inside the box, median; box, quartiles; whiskers, 1.5× interquartile range. (b) Read coverage of phage contigs (n = 6,239) as a function of the read coverage of the dnaA contig of their associated MAG (n = 856). Dashed lines indicate the limits of the categories described in (a). (c) Scatterplot of class A phage contig coverage vs. dnaA contig coverage as a function of the growth rate index calculated using GRID for each MAG. (d) Scatterplot of class A phage contig coverage vs. dnaA contig coverage as a function of MAG abundance. (e) Bar plot of MAG taxonomic abundance depending on the categories specified in (a).
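The four categories defined in panel (a) can be expressed as a small classification function. This is a sketch under stated assumptions: the function name and arguments are hypothetical, and because the legend does not say which category a log2 ratio of exactly −1, 1, or 2 belongs to, the boundary handling below is an arbitrary choice.

```python
import math


def phage_category(phage_coverage: float, host_coverage: float) -> str:
    """Classify a phage contig from its log2 coverage ratio to its host.

    Thresholds follow the legend of panel (a). Boundary values are assigned
    to the lower category (assumption; the legend leaves them undefined).
    """
    if phage_coverage <= 0 or host_coverage <= 0:
        raise ValueError("coverages must be positive")
    log2_ratio = math.log2(phage_coverage / host_coverage)
    if log2_ratio > 2:
        return "cat4"  # likely active
    if log2_ratio > 1:
        return "cat3"  # potentially active
    if log2_ratio > -1:
        return "cat1"  # dormant
    return "cat2"      # undefined


# Toy coverages, made up for illustration only.
examples = {
    "equal coverage": phage_category(10.0, 10.0),
    "3x host": phage_category(30.0, 10.0),
    "10x host": phage_category(100.0, 10.0),
    "0.1x host": phage_category(1.0, 10.0),
}
```

The host coverage can be either the mean MAG coverage or the coverage of the dnaA contig, which correspond to the two reference choices compared in the boxplot.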

Figure 4—figure supplement 1
Phage-host ratio.

(a) Boxplot of the coverage ratio between phage contigs and their associated metagenomic assembled genome (MAG) (log2 scale) for the 10 samples. Dashed red lines indicate a ratio of 1. (b) Left: plot of the ratio between phage contig coverage and mean MAG coverage (Y-axis; log2 scale) as a function of the GC% of the phage contigs (X-axis). Right: plot of the ratio between phage contig coverage and mean MAG coverage (Y-axis) as a function of the ratio between the GC% of each phage contig and the mean GC% of its associated MAG (X-axis).

Figure 5 with 1 supplement
CrAss-like phages and their associated hosts.

Phylogenetic tree of crAss-like phage contigs found in the 10 assemblies. Representatives of the 10 genera of crAss-like phages described by Guerin et al. are included (in italic blue). Names of the different contigs are indicated on the left of each branch. Vertical colored stripes indicate, from left to right: host metagenomic assembled genome (MAG) order, host MAG family, host MAG genus, host MAG species, ratio of the read coverage of phages vs. read coverage of MAG (red – ratio > 4; light red – 4 > ratio > 2; green – 2 > ratio > 0.5; gray – ratio < 0.5). Red dot: circular signal characterized by VirSorter. Blue and gray dots: strong and weak circular signal characterized by 3C, respectively. Black lines: size of the contigs (min = 58,125 bp; max = 171,503 bp). Genera of the crAss-like phages are indicated on the right of the tree.

Figure 5—figure supplement 1
CrAss-like phage contact maps.

Raw contact matrices of the different crAss-like phage contigs found in the 10 assemblies. Each contact map is displayed using 5 kb bins (one pixel = 5 kb). The arrow in the upper right corner indicates a 3C circular signal (red = strong signal, gray = weak signal, no arrow = no signal). Contig ID and size are indicated above and below each matrix. A star in the contig ID indicates that VirSorter characterizes this contig as circular.
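A raw contact matrix at 5 kb resolution can be built along the following lines. This is a minimal sketch, not the code used to produce the figure: the function name is hypothetical, and contacts are assumed to be given as pairs of 0-based positions along a single contig.

```python
from typing import Iterable, List, Tuple

BIN_SIZE = 5_000  # one pixel = 5 kb, as in the legend


def contact_matrix(
    contig_length: int,
    contacts: Iterable[Tuple[int, int]],
    bin_size: int = BIN_SIZE,
) -> List[List[int]]:
    """Count contacts per pair of bins and return a symmetric square matrix."""
    n_bins = -(-contig_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in contacts:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix


# Toy contig of 12 kb (3 bins) with made-up contact positions.
toy = contact_matrix(12_000, [(100, 4_900), (200, 11_000), (6_000, 7_000)])
```

In such a matrix, contacts between the first and last bins (here `toy[0][2]`) populate the corner of the map, which is where a circular contig would be expected to show an enriched signal.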

Tables

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Biological sample (human) | Frozen human fecal samples | Institut Pasteur Biobanque (ICAReB) | Patient 17006 – 19 yo – female | Freshly frozen from healthy donor |
| Biological sample (human) | Frozen human fecal samples | Institut Pasteur Biobanque (ICAReB) | Patient 16026 – 20 yo – male | Freshly frozen from healthy donor |
| Biological sample (human) | Frozen human fecal samples | Institut Pasteur Biobanque (ICAReB) | Patient 16021 – 28 yo – female | Freshly frozen from healthy donor |
| Biological sample (human) | Frozen human fecal samples | Institut Pasteur Biobanque (ICAReB) | Patient 7016 – 64 yo – male | Freshly frozen from healthy donor |
| Biological sample (human) | Frozen human fecal samples | Institut Pasteur Biobanque (ICAReB) | Patient 10015 – 26 yo – female | Freshly frozen from healthy donor |
| Biological sample (human) | Frozen human fecal samples | Institut Pasteur Biobanque (ICAReB) | Patient 8016 – 69 yo – female | Freshly frozen from healthy donor |
| Biological sample (human) | Frozen human fecal samples | Institut Pasteur Biobanque (ICAReB) | Patient 7020 – 73 yo – male | Freshly frozen from healthy donor |
| Biological sample (human) | Frozen human fecal samples | Institut Pasteur Biobanque (ICAReB) | Patient 16016 – 35 yo – female | Freshly frozen from healthy donor |
| Biological sample (human) | Frozen human fecal samples | Institut Pasteur Biobanque (ICAReB) | Patient 8015 – 64 yo – female | Freshly frozen from healthy donor |
| Biological sample (human) | Frozen human fecal samples | Institut Pasteur Biobanque (ICAReB) | Patient 9010 – 60 yo – female | Freshly frozen from healthy donor |
| Chemical compound, drug | Formaldehyde (35–37%) | Sigma Aldrich | F8775 | Also contains methanol as stabilizer (15%) |
| Chemical compound, drug | MyOne streptavidin beads | Life Science | | |
| Chemical compound, drug | dCTP-14 Biotin | Life Science | | |
| Software, algorithm | Cutadapt | Martin, 2011 | | v.1.9.1 |
| Software, algorithm | FastQC | Andrews, 2010 | | v.0.10.1 |
| Software, algorithm | MEGAHIT | Li et al., 2015 | | v.1.1.1.2 |
| Software, algorithm | Quast | Gurevich et al., 2013 | | v.2.2 |
| Software, algorithm | bowtie2 | Langmead and Salzberg, 2012 | | v.2.2.3 |
| Software, algorithm | MetaTOR Pipeline | Baudry et al., 2019 | | |
| Software, algorithm | MetaBat | Kang et al., 2015 | | |
| Software, algorithm | CheckM | Parks et al., 2015 | | v1.1.2 |
| Software, algorithm | GTDB-Tk | Chaumeil et al., 2020 | | release 0.95 |
| Software, algorithm | seqtk | seqt, 2020; https://github.com/lh3/seqtk | | |
| Software, algorithm | hicstuff | Matthey-Doret, 2020; https://github.com/koszullab/hicstuff | | |
| Software, algorithm | LAST | http://last.cbrc.jp/ | | |
| Software, algorithm | iTOL | https://itol.embl.de/ | | |
| Software, algorithm | VirSorter | Roux et al., 2015 | | v.1.0.3 |
| Software, algorithm | VIBRANT | Kieft et al., 2020 | | v.1.0.1 |
| Software, algorithm | Mash | Ondov et al., 2016 | | v.2.0 |
| Software, algorithm | PILER-CR | Edgar, 2007 | | |
| Software, algorithm | Prodigal | Hyatt et al., 2010 | | v.2.6.3 |
| Software, algorithm | MUSCLE | Edgar, 2004 | | |
| Software, algorithm | AMAS | Borowiec, 2016 | | |
| Software, algorithm | IQ-TREE | Nguyen et al., 2015 | | v.1.5.5 |
| Software, algorithm | R environment | R Development Core Team, 2020 | | |
| Other | Covaris S220 | Covaris | | AFA tubes |
| Other | Precellys TUBE | Bertin Technology | | VK05 + VK01 glass beads |

Additional files

Supplementary file 1

Assembly statistics.

Table indicating different metrics of the 10 samples and the resulting libraries and assemblies.

https://cdn.elifesciences.org/articles/60608/elife-60608-supp1-v2.xlsx
Supplementary file 2

Mapping statistics.

Table indicating different statistics on PE reads mapping and 3D ratio.

https://cdn.elifesciences.org/articles/60608/elife-60608-supp2-v2.xlsx
Supplementary file 3

CrAss-like phage contigs.

Table containing information on the different detected crAss-like phage contigs.

https://cdn.elifesciences.org/articles/60608/elife-60608-supp3-v2.xlsx
Supplementary file 4

Metagenomic assembled genomes (MAGs) data.

Comma separated file describing the MAGs called in the study: sample, bin ID, bin size, bin mean GC%, bin mean coverage, taxonomy (seven levels), completion, contamination, contigs number, N50, mean contig size, longest contig, coding density.

https://cdn.elifesciences.org/articles/60608/elife-60608-supp4-v2.zip
Supplementary file 5

Contigs data.

Comma separated file describing all the binned contigs present in MAGs: sample, contig ID, contig size, contig coverage, contig GC%, associated bin.

https://cdn.elifesciences.org/articles/60608/elife-60608-supp5-v2.zip
Supplementary file 6

Phage contigs data.

Comma separated file describing all the phage contigs associated with MAGs: sample, contig ID, associated bin.

https://cdn.elifesciences.org/articles/60608/elife-60608-supp6-v2.zip
Transparent reporting form
https://cdn.elifesciences.org/articles/60608/elife-60608-transrepform-v2.pdf

Cite this article

Martial Marbouty, Agnès Thierry, Gaël A Millot, Romain Koszul (2021)
MetaHiC phage-bacteria infection network reveals active cycling phages of the healthy human gut
eLife 10:e60608.
https://doi.org/10.7554/eLife.60608