Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity

7 figures, 2 tables and 2 additional files

Figures

Figure 1
Geographical distribution of sequenced mycobacteriophages.

(A) Locations of sequenced mycobacteriophages across the globe. (B) Locations of sequenced mycobacteriophages across the United States. Colors and letter designations on the isolates refer to the cluster to which the genomes belong. Data from www.phagesdb.org.

https://doi.org/10.7554/eLife.06416.003
Figure 2 with 3 supplements
Nucleotide sequence comparison of 627 mycobacteriophages displayed as a dotplot.

Complete genome sequences of 627 mycobacteriophages were concatenated into a single file which was compared with itself using Gepard (Krumsiek et al., 2007) and displayed as a dotplot using default parameters (word length, 10). The order of the genomes is as listed in Supplementary file 1. Nucleotide similarity is a primary component in assembling phages into clusters, which typically requires evident DNA similarity spanning more than 50% of the genome lengths.
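As a rough illustration of what a word-based dotplot records, the sketch below marks every pair of positions at which two sequences share an identical word of a given length. It is not Gepard's implementation; the function name `word_matches`, the toy sequences, and the restriction to forward-strand matches are all assumptions made for illustration.

```python
from collections import defaultdict


def word_matches(seq_a, seq_b, word_length=10):
    """Return (i, j) pairs where seq_a[i:i+w] equals seq_b[j:j+w].

    Forward strand only; a real dotplot tool would also handle
    reverse complements and scale to much larger inputs.
    """
    # Index every word of seq_b by its start position.
    index = defaultdict(list)
    for j in range(len(seq_b) - word_length + 1):
        index[seq_b[j:j + word_length]].append(j)

    # Look up every word of seq_a in that index.
    matches = []
    for i in range(len(seq_a) - word_length + 1):
        for j in index.get(seq_a[i:i + word_length], ()):
            matches.append((i, j))
    return matches


# Hypothetical toy "genomes" concatenated into one sequence and
# compared with itself, as described for the 627-genome file.
concatenated = "ACGTACGTTAGC" + "GGGACGTACGTT"
dots = word_matches(concatenated, concatenated, word_length=8)
```

Comparing a concatenated sequence with itself always yields the main diagonal; off-diagonal dots mark words shared between different genomes (here, the two toy sequences share a 9-base stretch).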

https://doi.org/10.7554/eLife.06416.004
Figure 2—source data 1

Concatenated DNA sequences for 627 phage genomes.

https://doi.org/10.7554/eLife.06416.005
Figure 2—figure supplement 1
Dotplot of phages in Clusters I, N, P and the singleton Sparky.

A dotplot was generated using a concatenated file of genome sequences using Gepard (Krumsiek et al., 2007). The complexity of the genome relationships is illustrated by the Cluster I phages which share varying degrees of similarity to phages in Clusters N and P, as well as the singleton Sparky. Because inclusion of a phage in a cluster typically requires sharing a span of similarity over half of the genome lengths, these phages are not assembled into a single larger cluster.

https://doi.org/10.7554/eLife.06416.006
Figure 2—figure supplement 2
Dotplot of Carcharodon, Che9c, Kheth, and Dori.

The dotplot of concatenated genome sequences illustrates the ambiguity of whether the singleton Dori warrants inclusion in Cluster B. Dori shares DNA sequence similarity with its closest relative, Kheth (Subcluster B2), but the similarity does not span 50% of the genome lengths. Dori also shares DNA sequence similarity with Che9c (Subcluster I2) and Carcharodon (Cluster N).

https://doi.org/10.7554/eLife.06416.007
Figure 2—figure supplement 3
Dotplot of Corndog, Brujita, SG4, Yoshi, and MooMoo.

The dotplot of concatenated genome sequences illustrates the complex relationships between the singleton MooMoo and other phages. MooMoo shares DNA sequence similarity with SG4 (Subcluster F1) and Yoshi (Subcluster F2), but also with Brujita (Subcluster I1). MooMoo has barely detectable DNA sequence similarity with Corndog (Cluster O), but has a similar prolate virion morphology.

https://doi.org/10.7554/eLife.06416.008
Figure 3
Network phylogeny of 627 mycobacteriophages based on gene content.

Genomes of 627 mycobacteriophages were compared according to shared gene content using the Phamerator (Cresawn et al., 2011) database Mykobacteriophage_627, and displayed using SplitsTree (Huson and Bryant, 2006). Colored circles indicate grouping of phages labeled according to their cluster designations generated by nucleotide sequence comparison (Figure 2); singleton genomes with no close relatives are labeled but not circled. Micrographs show morphotypes of the singleton MooMoo, the Cluster F phage Mozy, and the Cluster O phage Corndog. With the exception of DS6A, all of the phages infect Mycobacterium smegmatis mc²155.

https://doi.org/10.7554/eLife.06416.010
Figure 3—source data 1

Nexus file containing phamily assignments for 627 phage genomes.

https://doi.org/10.7554/eLife.06416.011
Figure 4 with 2 supplements
Proportions of orphams in mycobacteriophage genomes.

The proportions of genes that are orphams (i.e., single-gene phamilies with no homologues within the mycobacteriophage dataset) are shown for each phage. The order of the phages is as shown in Supplementary file 1. All of the singleton genomes have >30% orphams, and most of the other genomes with relatively high proportions of orphams are the single-genome subclusters (Table 2) including Hawkeye (D2), Myrna (C2), Squirty (F3), Barnyard (H2), Che9c (I2), Whirlwind (L3), Rey (M2), and Purky (P2). Three phages shown in red type are not singletons or single-genome subclusters but have relatively high proportions of orphams. Predator and Mendokysei are members of the diverse and small clusters (five or fewer genomes) H and T, respectively; KayaCho is a member of Subcluster B4 but has a sufficiently high proportion of orphams to arguably warrant formation of a new subcluster, B6.
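The orpham proportion described above can be sketched as follows, assuming each genome is represented as a list of phamily identifiers, one per gene. The function name `orpham_proportions`, the phage names, and the integer pham ids are hypothetical and are not taken from the Phamerator database.

```python
from collections import Counter


def orpham_proportions(genome_phams):
    """Map each genome to the fraction of its genes that are orphams.

    genome_phams: dict of genome name -> list of pham ids, one per gene.
    An orpham is a phamily containing exactly one gene in the dataset.
    """
    # Count how many genes across all genomes fall into each phamily.
    pham_sizes = Counter(
        pham for phams in genome_phams.values() for pham in phams
    )
    return {
        genome: sum(1 for pham in phams if pham_sizes[pham] == 1) / len(phams)
        for genome, phams in genome_phams.items()
        if phams  # skip genomes with no annotated genes
    }


# Hypothetical toy dataset: three genomes, ten phamilies.
toy = {
    "PhageX": [1, 2, 3, 4],
    "PhageY": [1, 2, 5],
    "PhageZ": [6, 7],
}
proportions = orpham_proportions(toy)
```

In this toy dataset PhageZ shares no phamilies with the others, so all of its genes are orphams, which mirrors the pattern reported for singleton genomes.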

https://doi.org/10.7554/eLife.06416.012
Figure 4—source data 1

Pham table containing phamily designations for 627 phage genomes.

https://doi.org/10.7554/eLife.06416.013
Figure 4—figure supplement 1
Shared gene content between Dori, MooMoo, and other mycobacteriophages.

(A) Average percentages of phamilies shared between Dori and other mycobacteriophages. (B) Average percentages of phamilies shared between MooMoo and other mycobacteriophages. Genomes on the x axis are listed in the same order as in Supplementary file 1 and the cluster designations are indicated.

https://doi.org/10.7554/eLife.06416.014
Figure 4—figure supplement 2
Shared gene content between Gaia, Sparky, and other mycobacteriophages.

(A) Average percentages of phamilies shared between Gaia and other mycobacteriophages. (B) Average percentages of phamilies shared between Sparky and other mycobacteriophages. Genomes on the x axis are listed in the same order as in Supplementary file 1 and the cluster designations are indicated.

https://doi.org/10.7554/eLife.06416.015
Figure 5
Heat map representation of shared gene content among 627 mycobacteriophages.

The percentages of pairwise shared genes were determined using a Phamerator (Cresawn et al., 2011) database (Mykobacteriophage_627) populated with 627 completely sequenced phage genomes. The 69,574 genes were assembled into 5205 phamilies (phams) of related sequences using kClust, and the average proportions of shared phams were calculated. Genomes are ordered on both axes according to their cluster and subcluster designations (Supplementary file 1) determined by nucleotide sequence similarities (Figure 2). The values (proportions of pairwise shared phams averaged between each partner) are colored as indicated.
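One plausible reading of "proportions of pairwise shared phams averaged between each partner" is sketched below: the number of shared phamilies is expressed as a fraction of each genome's own phamilies, and the two fractions are averaged. The function name `shared_pham_percent` and the toy pham ids are assumptions; the authors' exact computation may differ in detail.

```python
def shared_pham_percent(phams_a, phams_b):
    """Percentage of shared phamilies, averaged between the two genomes."""
    a, b = set(phams_a), set(phams_b)
    shared = len(a & b)
    # Fraction shared from each genome's point of view, then averaged.
    return 100.0 * (shared / len(a) + shared / len(b)) / 2


# Hypothetical toy genomes of unequal size sharing two phamilies.
small = [1, 2, 3, 4]
large = [3, 4, 5, 6, 7, 8, 9, 10]
value = shared_pham_percent(small, large)
```

Averaging the two directions keeps the measure symmetric even when the genomes differ in size: here the shared phamilies are 50% of the small genome and 25% of the large one, giving 37.5%.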

https://doi.org/10.7554/eLife.06416.016
Figure 5—source data 1

Dataset showing percentages of pairwise shared phamilies.

https://doi.org/10.7554/eLife.06416.017
Figure 6 with 2 supplements
Cluster diversity and isolation.

(A) The CLuster Averaged Shared Phamilies (CLASP; blue), Cluster Associated Phamilies (CAP; red) and Cluster Cohesion Index (CCI; green) values are plotted for each mycobacteriophage cluster. (B) The Cluster Isolation Index (CII) and CLASP values (both shown as percentages) are plotted for each phage cluster. Singletons (white circles) are not individually labeled but correspond to the values shown in Table 1.

https://doi.org/10.7554/eLife.06416.018
Figure 6—source data 1

Datasets showing numbers of CLuster Averaged Shared Phamilies.

https://doi.org/10.7554/eLife.06416.019
Figure 6—figure supplement 1
Resampling CLASP values for cluster diversity and size.

CLuster Averaged Shared Phamilies (CLASP) values were calculated for Clusters A, B, C, E, F, and K by resampling random subsets of the genomes. The size of the subsets is shown on the x axis and each point is the average of 20 iterations. The minimum and maximum variations among the iterations are shown.

https://doi.org/10.7554/eLife.06416.020
Figure 6—figure supplement 2
Cluster diversity shown by Cluster-Associated Phamilies (CAP) and Cluster Phamily Variation (CPV) indices.

The CAP and CPV values are plotted for each cluster.

https://doi.org/10.7554/eLife.06416.021
Figure 7
Rarefaction analysis of mycobacteriophage genomes.

(A) The numbers of phamilies are reported for between 1 and 627 phage genomes sampled at random without replacement; the mean of 10,000 iterations is shown in red; gray lines indicate a confidence interval of two standard deviations. The black line shows a hyperbolic curve fit to the data from phage counts 1 to 314. The inset shows the number of new phams encountered upon the inclusion of each phage, with the mean number for the 10,000 iterations shown in blue and the predicted value from the hyperbolic curve shown in black. (B) Rarefaction analysis of 232 Cluster A phages. The total numbers of phamilies are reported for between 1 and 232 phages sampled at random without replacement from Cluster A; the mean of 10,000 iterations is shown in red; gray lines indicate a confidence interval of two standard deviations. The black line shows a hyperbolic curve fit to the data from phage counts 1 to 117. The inset shows the number of new phams encountered upon the inclusion of each phage, with the mean number for 10,000 iterations shown in blue and the predicted value from the hyperbolic curve shown in black. (C) Rarefaction analysis of 108 Cluster B phages; the hyperbolic curve was fit to the data from phage counts 1 to 54. (D) Fits of the hyperbolic (Equation 1) and hyperbolic with linear (Equation 2) models for phamily identification within genome samples.
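The sampling step of the rarefaction analysis can be sketched as below: genomes are drawn in random order without replacement, the cumulative number of distinct phamilies is recorded after each draw, and the counts are averaged over many iterations. The function name `rarefaction_curve`, the fixed seed, and the toy data are assumptions; the curve fitting (Equations 1 and 2) is not reproduced here.

```python
import random


def rarefaction_curve(genome_phams, iterations=1000, seed=0):
    """Mean cumulative pham count after sampling 1..N genomes.

    genome_phams: list of pham-id collections, one per genome.
    Each iteration draws all genomes in random order without replacement.
    """
    rng = random.Random(seed)
    pham_sets = [set(phams) for phams in genome_phams]
    n = len(pham_sets)
    totals = [0.0] * n
    for _ in range(iterations):
        order = rng.sample(range(n), n)  # random permutation of genome indices
        seen = set()
        for k, idx in enumerate(order):
            seen |= pham_sets[idx]
            totals[k] += len(seen)
    return [total / iterations for total in totals]


# Hypothetical toy dataset: three genomes with three phamilies each.
toy_genomes = [[1, 2, 3], [1, 2, 4], [1, 5, 6]]
curve = rarefaction_curve(toy_genomes, iterations=200)
```

The difference between successive points of the curve gives the mean number of new phamilies contributed by each additional genome, which is the quantity plotted in the insets.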

https://doi.org/10.7554/eLife.06416.023
Figure 7—source data 1

Datasets for determination of rarefaction curves.

https://doi.org/10.7554/eLife.06416.024

Tables

Table 1

Diversity and genetic isolation of mycobacteriophage genome clusters

https://doi.org/10.7554/eLife.06416.009
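| Cluster | # Subclusters | # Genomes | Average # genes* | Average length (bp) | Total phams | Total genes | CLASP | CAP§ | CCI# | CII |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 11 | 232 | 90 ± 5.3 | 51,514 | 1085 | 20,880 | 38.3 | 12.4 | 0.08 | 80.2 |
| B | 5 | 109 | 100.4 ± 4.5 | 68,653 | 421 | 10,944 | 66.2 | 23.2 | 0.24 | 81.0 |
| C | 2 | 45 | 231 ± 5.9 | 155,504 | 486 | 10,395 | 89.3 | 29.4 | 0.48 | 84.6 |
| D | 2 | 10 | 89.3 ± 6.4 | 64,965 | 147 | 893 | 88.1 | 64.3 | 0.61 | 71.4 |
| E | 1 | 35 | 141.9 ± 3.4 | 75,526 | 236 | 4967 | 87.2 | 63.8 | 0.60 | 59.3 |
| F | 3 | 66 | 105.3 ± 5.3 | 57,416 | 658 | 6950 | 54.4 | 4.9 | 0.16 | 55.8 |
| G | 1 | 14 | 61.5 ± 1.2 | 41,845 | 72 | 861 | 96.0 | 91.1 | 0.85 | 55.6 |
| H | 2 | 5 | 98.4 ± 5.7 | 69,469 | 207 | 492 | 61.6 | 31.5 | 0.48 | 67.6 |
| I | 2 | 4 | 78 ± 3.7 | 49,954 | 147 | 312 | 58.9 | 35.0 | 0.53 | 23.8 |
| J | 1 | 16 | 239.8 ± 9.3 | 110,332 | 530 | 3776 | 70.8 | 40.1 | 0.45 | 58.5 |
| K | 5 | 32 | 95.7 ± 4.6 | 59,720 | 411 | 3069 | 51.8 | 20.0 | 0.23 | 73.5 |
| L | 3 | 13 | 127.9 ± 6.5 | 75,177 | 246 | 1663 | 78.2 | 50.8 | 0.52 | 72.4 |
| M | 2 | 3 | 141 ± 8.8 | 81,636 | 201 | 423 | 73.5 | 63.0 | 0.70 | 69.2 |
| N | 1 | 7 | 69.1 ± 2.2 | 42,888 | 152 | 484 | 64.1 | 45.6 | 0.45 | 40.8 |
| O | 1 | 5 | 124.2 ± 3.1 | 70,651 | 151 | 621 | 90.6 | 83.3 | 0.82 | 64.2 |
| P | 2 | 9 | 78.8 ± 2.1 | 47,668 | 159 | 709 | 76.1 | 42.3 | 0.50 | 34.0 |
| Q | 1 | 5 | 85.2 ± 3.7 | 53,755 | 90 | 426 | 96.6 | 90.4 | 0.95 | 73.3 |
| R | 1 | 4 | 101.5 ± 2.5 | 71,348 | 117 | 406 | 91.4 | 84.8 | 0.87 | 71.8 |
| S | 1 | 2 | 109 ± 2.0 | 65,172 | 117 | 218 | 91.7 | 91.7 | 0.93 | 70.9 |
| T | 1 | 3 | 66.7 ± 2.4 | 42,833 | 83 | 200 | 86.1 | 82.5 | 0.80 | 62.7 |
| Dori | 1 | 1 | 94 | 64,613 | 94 | 94 | N/A | N/A | N/A | 35.8 |
| DS6A | 1 | 1 | 97 | 60,588 | 96 | 97 | N/A | N/A | N/A | 58.3 |
| Gaia | 1 | 1 | 194 | 90,460 | 193 | 194 | N/A | N/A | N/A | 58.0 |
| MooMoo | 1 | 1 | 98 | 55,178 | 98 | 98 | N/A | N/A | N/A | 31.6 |
| Muddy | 1 | 1 | 71 | 48,228 | 70 | 71 | N/A | N/A | N/A | 71.4 |
| Patience | 1 | 1 | 109 | 70,506 | 109 | 109 | N/A | N/A | N/A | 57.8 |
| Sparky | 1 | 1 | 93 | 63,334 | 93 | 93 | N/A | N/A | N/A | 48.4 |
| Wildcat | 1 | 1 | 148 | 78,296 | 148 | 148 | N/A | N/A | N/A | 69.6 |
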
  1. * Average number of protein-coding genes per genome, with standard deviation.

  2. Total phams is the sum of all phamilies (groups of homologous mycobacteriophage genes) in that cluster.

  3. The Cluster Averaged Shared Phamilies (CLASP) index is the average of the percentages of phamilies shared pairwise between genomes within a cluster.

  4. § The Cluster-Associated Phamilies (CAP) index is the percentage of the average number of phamilies per genome within a cluster whose phamilies are present in every cluster member.

  5. # The Cluster Cohesion Index (CCI) is generated by dividing the average number of genes per genome by the total number of phamilies (phams) in that cluster.

  6. The Cluster Isolation Index (CII) is the percentage of phams that are present only in that cluster, and not present in other mycobacteriophages.

  7. N/A: Not applicable.
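The four indices defined in the Table 1 footnotes can be sketched together as below. This is one reading of those definitions, not the authors' code: the function name `cluster_indices`, the input layout (one list of pham ids per genome, one id per gene), and the toy data are assumptions, and the pairwise percentage is taken as the average of the two genomes' shared fractions.

```python
from itertools import combinations


def pair_percent(a, b):
    """Percentage of phams shared by two genomes, averaged between them."""
    shared = len(a & b)
    return 100.0 * (shared / len(a) + shared / len(b)) / 2


def cluster_indices(cluster, outside_phams):
    """Compute CLASP, CAP, CCI and CII for one cluster of two or more genomes.

    cluster: dict of genome name -> list of pham ids, one per gene.
    outside_phams: pham ids found in genomes outside this cluster.
    """
    pham_sets = [set(genes) for genes in cluster.values()]
    n = len(pham_sets)
    all_phams = set().union(*pham_sets)   # total phams in the cluster
    core = set.intersection(*pham_sets)   # phams present in every member

    pairs = list(combinations(pham_sets, 2))
    clasp = sum(pair_percent(a, b) for a, b in pairs) / len(pairs)
    cap = 100.0 * len(core) / (sum(len(s) for s in pham_sets) / n)
    cci = (sum(len(genes) for genes in cluster.values()) / n) / len(all_phams)
    cii = 100.0 * len(all_phams - set(outside_phams)) / len(all_phams)
    return {"CLASP": clasp, "CAP": cap, "CCI": cci, "CII": cii}


# Hypothetical two-genome cluster; phams 1 and 9 also occur elsewhere.
toy_cluster = {"g1": [1, 2, 3, 4], "g2": [1, 2, 3, 5]}
indices = cluster_indices(toy_cluster, outside_phams={1, 9})
```

As a check on the CCI definition against Table 1, Cluster A has an average of 90 genes per genome and 1085 phams, and 90 / 1085 ≈ 0.08, matching the tabulated value. The sketch requires at least two genomes, consistent with the N/A entries for singletons.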

Table 2

Genometrics and Cluster Cohesion Indexes of mycobacteriophages

https://doi.org/10.7554/eLife.06416.022
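| Cluster | Subcluster | # Genomes | Average # genes | Average length (bp) | # Phams | CLASP* | CAP | CCI |
|---|---|---|---|---|---|---|---|---|
| A | | 232 | 90.0 | 51,514 | 1085 | 38.3 | 12.4 | 8.0 |
| | A1 | 72 | 91.2 | 51,954 | 416 | 72.3 | 36.9 | 22.0 |
| | A2 | 28 | 93.4 | 52,805 | 312 | 64.7 | 30.1 | 30.0 |
| | A3 | 37 | 87.7 | 50,325 | 163 | 81.1 | 48.8 | 54.0 |
| | A4 | 46 | 87.4 | 51,376 | 125 | 92.7 | 70.6 | 70.0 |
| | A5 | 16 | 86.0 | 50,531 | 152 | 81.4 | 58.7 | 57.0 |
| | A6 | 11 | 97.8 | 51,677 | 128 | 90.2 | 75.1 | 76.0 |
| | A7 | 3 | 84.3 | 52,941 | 115 | 74.9 | 64.4 | 73.0 |
| | A8 | 4 | 97.8 | 51,597 | 107 | 93.5 | 86.8 | 91.0 |
| | A9 | 4 | 96.0 | 52,838 | 106 | 92.7 | 83.4 | 91.0 |
| | A10 | 7 | 80.0 | 49,174 | 112 | 81.6 | 60.9 | 71.0 |
| | A11 | 4 | 98.5 | 52,260 | 113 | 93.6 | 88.3 | 87.0 |
| B | | 108 | 100.4 | 68,653 | 421 | 66.2 | 23.2 | 24.0 |
| | B1 | 77 | 101.8 | 68,532 | 144 | 93.2 | 72.9 | 71.0 |
| | B2 | 8 | 89.9 | 67,267 | 101 | 94.9 | 84.6 | 89.0 |
| | B3 | 12 | 102.8 | 68,698 | 121 | 96.3 | 84.7 | 85.0 |
| | B4 | 8 | 96.1 | 70,619 | 166 | 79.9 | 45.8 | 58.0 |
| | B5 | 3 | 96.3 | 70,033 | 108 | 91.7 | 87.2 | 89.0 |
| C | | 45 | 231.0 | 155,504 | 486 | 89.3 | 29.4 | 48.0 |
| | C1 | 44 | 231.0 | 155,297 | 345 | 91.9 | 73.2 | 67.0 |
| | C2 | 1 | 229.0 | 164,602 | 227 | N/A | N/A | N/A |
| D | | 10 | 89.3 | 64,965 | 147 | 88.1 | 64.3 | 61.0 |
| | D1 | 9 | 87.3 | 64,697 | 100 | 94.9 | 88.8 | 87.0 |
| | D2 | 1 | 107.0 | 67,383 | 107 | N/A | N/A | N/A |
| E | | 35 | 141.9 | 75,526 | 235 | 87.2 | 63.8 | 60.0 |
| F | | 66 | 105.3 | 57,416 | 658 | 54.4 | 4.9 | 16.0 |
| | F1 | 60 | 104.8 | 57,486 | 573 | 59.6 | 20.6 | 18.0 |
| | F2 | 5 | 110.8 | 55,996 | 207 | 65.7 | 49.0 | 54.0 |
| | F3 | 1 | 107.0 | 60,285 | 105 | N/A | N/A | N/A |
| G | | 14 | 61.5 | 41,845 | 72 | 96.0 | 91.1 | 85.0 |
| H | | 5 | 98.4 | 69,469 | 207 | 61.6 | 31.5 | 48.0 |
| | H1 | 4 | 95.8 | 69,137 | 131 | 81.9 | 67.9 | 73.0 |
| | H2 | 1 | 109.0 | 70,797 | 110 | N/A | N/A | N/A |
| I | | 4 | 78.0 | 49,954 | 147 | 58.9 | 35.0 | 53.0 |
| | I1 | 3 | 76.0 | 47,588 | 101 | 77.5 | 66.7 | 75.0 |
| | I2 | 1 | 84.0 | 57,050 | 84 | N/A | N/A | N/A |
| J | | 16 | 239.8 | 110,332 | 530 | 70.8 | 40.1 | 45.0 |
| K | | 33 | 95.7 | 59,720 | 411 | 51.8 | 20.0 | 23.0 |
| | K1 | 15 | 94.3 | 59,877 | 166 | 85.5 | 47.9 | 57.0 |
| | K2 | 4 | 96.3 | 56,597 | 128 | 85.2 | 77.7 | 75.0 |
| | K3 | 3 | 98.2 | 61,322 | 111 | 92.2 | 89.5 | 88.0 |
| | K4 | 5 | 94.0 | 57,865 | 106 | 93.7 | 87.2 | 89.0 |
| | K5 | 6 | 98.2 | 62,154 | 144 | 82.1 | 68.2 | 68.0 |
| L | | 13 | 127.9 | 75,177 | 246 | 78.2 | 50.8 | 52.0 |
| | L1 | 3 | 123.7 | 74,050 | 135 | 92.6 | 88.8 | 92.0 |
| | L2 | 9 | 129.3 | 75,456 | 170 | 90.1 | 72.2 | 76.0 |
| | L3 | 1 | 128.0 | 76,050 | 126 | N/A | N/A | N/A |
| M | | 3 | 141.0 | 81,636 | 201 | 73.5 | 63.0 | 70.0 |
| | M1 | 2 | 135.0 | 80,593 | 138 | 96.6 | 96.6 | 98.0 |
| | M2 | 1 | 153.0 | 83,724 | 152 | N/A | N/A | N/A |
| N | | 7 | 69.1 | 42,888 | 152 | 64.1 | 45.6 | 45.0 |
| O | | 5 | 124.2 | 70,651 | 151 | 90.6 | 83.3 | 82.0 |
| P | | 9 | 78.8 | 47,668 | 159 | 76.1 | 42.3 | 50.0 |
| | P1 | 8 | 78.4 | 47,313 | 126 | 82.9 | 52.9 | 62.0 |
| | P2 | 1 | 82.0 | 50,513 | 82 | N/A | N/A | N/A |
| Q | | 5 | 85.2 | 53,755 | 90 | 96.6 | 90.4 | 95.0 |
| R | | 4 | 101.5 | 71,348 | 117 | 91.4 | 84.8 | 87.0 |
| S | | 2 | 109.0 | 65,172 | 117 | 91.7 | 91.7 | 93.0 |
| T | | 3 | 66.7 | 42,833 | 83 | 86.1 | 82.5 | 80.0 |
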
  1. * Cluster Averaged Shared Phamilies.

  2. Cluster Associated Phamilies.

  3. Cluster Cohesion Index.

Additional files

Supplementary file 1

List of 627 sequenced mycobacteriophages and cluster designations.

https://doi.org/10.7554/eLife.06416.025
Supplementary file 2

Full list of group author details.

https://doi.org/10.7554/eLife.06416.026

Cite this article

  1. Welkin H Pope
  2. Charles A Bowman
  3. Daniel A Russell
  4. Deborah Jacobs-Sera
  5. David J Asai
  6. Steven G Cresawn
  7. William R Jacobs Jr
  8. Roger W Hendrix
  9. Jeffrey G Lawrence
  10. Graham F Hatfull
  11. Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science
  12. Phage Hunters Integrating Research and Education
  13. Mycobacterial Genetics Course
(2015)
Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity
eLife 4:e06416.
https://doi.org/10.7554/eLife.06416