Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity

  1. Welkin H Pope
  2. Charles A Bowman
  3. Daniel A Russell
  4. Deborah Jacobs-Sera
  5. David J Asai
  6. Steven G Cresawn
  7. William R Jacobs Jr
  8. Roger W Hendrix
  9. Jeffrey G Lawrence
  10. Graham F Hatfull (corresponding author)
  11. Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science
  12. Phage Hunters Integrating Research and Education
  13. Mycobacterial Genetics Course
  1. University of Pittsburgh, United States
  2. Howard Hughes Medical Institute, United States
  3. James Madison University, United States
  4. Albert Einstein College of Medicine, United States
7 figures, 2 tables and 2 additional files

Figures

Figure 1
Geographical distribution of sequenced mycobacteriophages.

(A) Locations of sequenced mycobacteriophages across the globe. (B) Locations of sequenced mycobacteriophages across the United States. Colors and letter designations on the isolates refer to the …

https://doi.org/10.7554/eLife.06416.003
Figure 2 with 3 supplements
Nucleotide sequence comparison of 627 mycobacteriophages displayed as a dotplot.

Complete genome sequences of 627 mycobacteriophages were concatenated into a single file which was compared with itself using Gepard (Krumsiek et al., 2007) and displayed as a dotplot using default …

https://doi.org/10.7554/eLife.06416.004
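
The caption describes concatenating the genome sequences into a single file before the self-comparison. The following is a minimal sketch of that preparation step only, assuming the genomes are available as FASTA text. The function names `parse_fasta` and `concatenate` are hypothetical, and the dotplot itself (made with Gepard in the source) is not reproduced here.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record name: upper-case sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Use the first word of the header as the record name.
            name = fields[0] if fields else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def concatenate(
    genomes: Dict[str, str], order: List[str]
) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Join genomes in the given order.

    Returns the concatenated sequence and a list of (name, start, end)
    spans (0-based, end exclusive) marking each genome's position.
    """
    pieces: List[str] = []
    spans: List[Tuple[str, int, int]] = []
    offset = 0
    for name in order:
        seq = genomes[name]
        pieces.append(seq)
        spans.append((name, offset, offset + len(seq)))
        offset += len(seq)
    return "".join(pieces), spans
```

Keeping the spans alongside the concatenated sequence makes it possible to map a coordinate on either dotplot axis back to the genome it belongs to.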
Figure 2—source data 1

Concatenated DNA sequences for 627 phage genomes.

https://doi.org/10.7554/eLife.06416.005
Figure 2—figure supplement 1
Dotplot of phages in Clusters I, N, P and the singleton Sparky.

A dotplot was generated using a concatenated file of genome sequences using Gepard (Krumsiek et al., 2007). The complexity of the genome relationships is illustrated by the Cluster I phages which …

https://doi.org/10.7554/eLife.06416.006
Figure 2—figure supplement 2
Dotplot of Carcharodon, Che9c, Kheth, and Dori.

The dotplot of concatenated genome sequences illustrates the ambiguity of whether the singleton Dori warrants inclusion in Cluster B. Dori shares DNA sequence similarity with its closest relative …

https://doi.org/10.7554/eLife.06416.007
Figure 2—figure supplement 3
Dotplot of Corndog, Brujita, SG4, Yoshi, and MooMoo.

The dotplot of concatenated genome sequences illustrates the complex relationships between the singleton MooMoo and other phages. MooMoo shares DNA sequence similarity with SG4 (Subcluster F1) and …

https://doi.org/10.7554/eLife.06416.008
Figure 3
Network phylogeny of 627 mycobacteriophages based on gene content.

Genomes of 627 mycobacteriophages were compared according to shared gene content using the Phamerator (Cresawn et al., 2011) database Mykobacteriophage_627, and displayed using SplitsTree (Huson and …

https://doi.org/10.7554/eLife.06416.010
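
A gene-content comparison of this kind can be encoded as a presence/absence matrix with one row per genome and one column per phamily. The sketch below writes such a matrix as a NEXUS CHARACTERS block. It is an illustration under stated assumptions, not the authors' export procedure: the function name `to_nexus` is hypothetical, genome names are assumed to contain no whitespace, and the exact layout of the published source file may differ.

```python
from typing import Dict, Set


def to_nexus(phams_by_phage: Dict[str, Set[str]]) -> str:
    """Encode phamily presence/absence per genome as a NEXUS matrix.

    Each character (column) is one phamily: 1 = present, 0 = absent.
    """
    taxa = sorted(phams_by_phage)
    phams = sorted(set().union(*phams_by_phage.values()))
    lines = [
        "#NEXUS",
        "BEGIN TAXA;",
        f"  DIMENSIONS NTAX={len(taxa)};",
        "  TAXLABELS " + " ".join(taxa) + ";",
        "END;",
        "BEGIN CHARACTERS;",
        f"  DIMENSIONS NCHAR={len(phams)};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
        "  MATRIX",
    ]
    for taxon in taxa:
        row = "".join("1" if p in phams_by_phage[taxon] else "0" for p in phams)
        lines.append(f"  {taxon} {row}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"
```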
Figure 3—source data 1

Nexus file containing phamily assignments for 627 phage genomes.

https://doi.org/10.7554/eLife.06416.011
Figure 4 with 2 supplements
Proportions of orphams in mycobacteriophage genomes.

The proportions of genes that are orphams (i.e., single-gene phamilies with no homologues within the mycobacteriophage dataset) are shown for each phage. The order of the phages is as shown in Supple…

https://doi.org/10.7554/eLife.06416.012
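
The orpham proportion defined in the caption can be sketched as follows, assuming each genome is given as a list of phamily identifiers with one entry per gene. The function name `orpham_proportions` and the input layout are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def orpham_proportions(genes_by_phage: Dict[str, List[str]]) -> Dict[str, float]:
    """Return, per phage, the fraction of genes that are orphams.

    An orpham is a phamily containing exactly one gene in the whole dataset.
    """
    # Count how many genes in the dataset belong to each phamily.
    pham_sizes = Counter(p for phams in genes_by_phage.values() for p in phams)
    result: Dict[str, float] = {}
    for phage, phams in genes_by_phage.items():
        if not phams:
            result[phage] = 0.0
            continue
        orphams = sum(1 for p in phams if pham_sizes[p] == 1)
        result[phage] = orphams / len(phams)
    return result
```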
Figure 4—source data 1

Pham table containing phamily designations for 627 phage genomes.

https://doi.org/10.7554/eLife.06416.013
Figure 4—figure supplement 1
Shared gene content between Dori, MooMoo, and other mycobacteriophages.

(A) Average percentages of phamilies shared between Dori and other mycobacteriophages. (B) Average percentages of phamilies shared between MooMoo and other mycobacteriophages. Genomes on the x axis …

https://doi.org/10.7554/eLife.06416.014
Figure 4—figure supplement 2
Shared gene content between Gaia, Sparky, and other mycobacteriophages.

(A) Average percentages of phamilies shared between Gaia and other mycobacteriophages. (B) Average percentages of phamilies shared between Sparky and other mycobacteriophages. Genomes on the x axis …

https://doi.org/10.7554/eLife.06416.015
Figure 5
Heat map representation of shared gene content among 627 mycobacteriophages.

The percentages of pairwise shared genes were determined using a Phamerator (Cresawn et al., 2011) database (Mykobacteriophage_627) populated with 627 completely sequenced phage genomes. The 69,574 …

https://doi.org/10.7554/eLife.06416.016
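
A pairwise shared-content matrix of the kind shown in the heat map can be sketched as below. The normalisation is an assumption: the shared count is divided by each genome's phamily count and the two percentages are averaged. The function names are hypothetical.

```python
from typing import Dict, List, Set


def shared_percent(a: Set[str], b: Set[str]) -> float:
    """Average percentage of phamilies shared between two genomes.

    Assumption: the shared count is normalised by each genome's own
    phamily count, and the two resulting percentages are averaged.
    """
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return 100.0 * (shared / len(a) + shared / len(b)) / 2


def shared_matrix(
    phams_by_phage: Dict[str, Set[str]], order: List[str]
) -> List[List[float]]:
    """Square matrix of pairwise shared percentages, rows/columns in `order`."""
    return [
        [shared_percent(phams_by_phage[r], phams_by_phage[c]) for c in order]
        for r in order
    ]
```

With this normalisation the matrix is symmetric and every diagonal entry is 100, so the row and column order alone determines how cluster structure appears in the heat map.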
Figure 5—source data 1

Dataset showing percentages of pairwise shared phamilies.

https://doi.org/10.7554/eLife.06416.017
Figure 6 with 2 supplements
Cluster diversity and isolation.

(A) The CLuster Averaged Shared Phamilies (CLASP; blue), Cluster Associated Phamilies (CAP; red) and Cluster Cohesion Index (CCI; green) values are plotted for each mycobacteriophage cluster. (B) …

https://doi.org/10.7554/eLife.06416.018
Figure 6—source data 1

Datasets showing numbers of CLuster Averaged Shared Phamilies.

https://doi.org/10.7554/eLife.06416.019
Figure 6—figure supplement 1
Resampling CLASP values for cluster diversity and size.

CLuster Averaged Shared Phamilies (CLASP) values were calculated for Clusters A, B, C, E, F, and K by resampling random subsets of the genomes. The size of the subsets is shown on the x axis and …

https://doi.org/10.7554/eLife.06416.020
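
The resampling described in the caption can be sketched as follows: for each subset size, random subsets of a cluster's genomes are drawn and the CLASP value is averaged over the draws. The pairwise normalisation, the number of iterations, and the function names are assumptions, not details taken from the source.

```python
import random
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence, Set


def shared_percent(a: Set[str], b: Set[str]) -> float:
    """Average of the two per-genome percentages of shared phamilies (assumed)."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return 100.0 * (shared / len(a) + shared / len(b)) / 2


def clasp(genomes: Sequence[Set[str]]) -> float:
    """Mean pairwise shared percentage over all genome pairs."""
    pairs = list(combinations(genomes, 2))
    if not pairs:
        raise ValueError("CLASP needs at least two genomes")
    return mean(shared_percent(a, b) for a, b in pairs)


def resample_clasp(
    genomes: List[Set[str]], sizes: List[int], iterations: int = 100, seed: int = 0
) -> Dict[int, float]:
    """Mean CLASP of random subsets (without replacement) for each subset size."""
    rng = random.Random(seed)
    out: Dict[int, float] = {}
    for size in sizes:
        if size < 2 or size > len(genomes):
            raise ValueError(f"subset size {size} out of range")
        values = [clasp(rng.sample(genomes, size)) for _ in range(iterations)]
        out[size] = mean(values)
    return out
```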
Figure 6—figure supplement 2
Cluster diversity shown by Cluster-Associated Phamilies (CAP) and Cluster Phamily Variation (CPV) indices.

The CAP and CPV values are plotted for each cluster.

https://doi.org/10.7554/eLife.06416.021
Figure 7
Rarefaction analysis of mycobacteriophage genomes.

(A) The numbers of phamilies are reported for between 1 and 627 phage genomes sampled at random without replacement; the mean of 10,000 iterations is shown in red; gray lines indicate a confidence …

https://doi.org/10.7554/eLife.06416.023
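
A rarefaction curve of this kind can be sketched as below, assuming each genome is represented by its set of phamilies. The function name `rarefaction_curve` is hypothetical, the default iteration count is chosen only to keep the example fast, and the confidence band mentioned in the caption is not computed.

```python
import random
from typing import Dict, List, Set


def rarefaction_curve(
    phams_by_phage: Dict[str, Set[str]], iterations: int = 1000, seed: int = 0
) -> List[float]:
    """Mean number of distinct phamilies found in 1..N randomly sampled genomes.

    Each iteration shuffles the genome order; the first n genomes of a
    random permutation form a random sample of size n without replacement.
    """
    rng = random.Random(seed)
    names = sorted(phams_by_phage)
    totals = [0] * len(names)
    for _ in range(iterations):
        rng.shuffle(names)
        seen: Set[str] = set()
        for i, name in enumerate(names):
            seen |= phams_by_phage[name]
            totals[i] += len(seen)  # phamilies seen after sampling i + 1 genomes
    return [t / iterations for t in totals]
```

Accumulating along one shuffled order yields every sample size in a single pass, which is cheaper than drawing an independent sample for each size. The trade-off is that the points within one iteration are not independent of each other.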
Figure 7—source data 1

Datasets for determination of rarefaction curves.

https://doi.org/10.7554/eLife.06416.024

Tables

Table 1

Diversity and genetic isolation of mycobacteriophage genome clusters

https://doi.org/10.7554/eLife.06416.009
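| Cluster | # Subclusters | # Genomes | Average # genes* | Average length (bp) | Total phams† | Total genes | CLASP‡ | CAP§ | CCI# | CII¶ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | 11 | 232 | 90 ± 5.3 | 51,514 | 1085 | 20,880 | 38.3 | 12.4 | 0.08 | 80.2 |
| B | 5 | 109 | 100.4 ± 4.5 | 68,653 | 421 | 10,944 | 66.2 | 23.2 | 0.24 | 81.0 |
| C | 2 | 45 | 231 ± 5.9 | 155,504 | 486 | 10,395 | 89.3 | 29.4 | 0.48 | 84.6 |
| D | 2 | 10 | 89.3 ± 6.4 | 64,965 | 147 | 893 | 88.1 | 64.3 | 0.61 | 71.4 |
| E | 1 | 35 | 141.9 ± 3.4 | 75,526 | 236 | 4967 | 87.2 | 63.8 | 0.60 | 59.3 |
| F | 3 | 66 | 105.3 ± 5.3 | 57,416 | 658 | 6950 | 54.4 | 4.9 | 0.16 | 55.8 |
| G | 1 | 14 | 61.5 ± 1.2 | 41,845 | 72 | 861 | 96.0 | 91.1 | 0.85 | 55.6 |
| H | 2 | 5 | 98.4 ± 5.7 | 69,469 | 207 | 492 | 61.6 | 31.5 | 0.48 | 67.6 |
| I | 2 | 4 | 78 ± 3.7 | 49,954 | 147 | 312 | 58.9 | 35.0 | 0.53 | 23.8 |
| J | 1 | 16 | 239.8 ± 9.3 | 110,332 | 530 | 3776 | 70.8 | 40.1 | 0.45 | 58.5 |
| K | 5 | 32 | 95.7 ± 4.6 | 59,720 | 411 | 3069 | 51.8 | 20.0 | 0.23 | 73.5 |
| L | 3 | 13 | 127.9 ± 6.5 | 75,177 | 246 | 1663 | 78.2 | 50.8 | 0.52 | 72.4 |
| M | 2 | 3 | 141 ± 8.8 | 81,636 | 201 | 423 | 73.5 | 63.0 | 0.70 | 69.2 |
| N | 1 | 7 | 69.1 ± 2.2 | 42,888 | 152 | 484 | 64.1 | 45.6 | 0.45 | 40.8 |
| O | 1 | 5 | 124.2 ± 3.1 | 70,651 | 151 | 621 | 90.6 | 83.3 | 0.82 | 64.2 |
| P | 2 | 9 | 78.8 ± 2.1 | 47,668 | 159 | 709 | 76.1 | 42.3 | 0.50 | 34.0 |
| Q | 1 | 5 | 85.2 ± 3.7 | 53,755 | 90 | 426 | 96.6 | 90.4 | 0.95 | 73.3 |
| R | 1 | 4 | 101.5 ± 2.5 | 71,348 | 117 | 406 | 91.4 | 84.8 | 0.87 | 71.8 |
| S | 1 | 2 | 109 ± 2.0 | 65,172 | 117 | 218 | 91.7 | 91.7 | 0.93 | 70.9 |
| T | 1 | 3 | 66.7 ± 2.4 | 42,833 | 83 | 200 | 86.1 | 82.5 | 0.80 | 62.7 |
| Dori | 1 | 1 | 94 | 64,613 | 94 | 94 | N/A | N/A | N/A | 35.8 |
| DS6A | 1 | 1 | 97 | 60,588 | 96 | 97 | N/A | N/A | N/A | 58.3 |
| Gaia | 1 | 1 | 194 | 90,460 | 193 | 194 | N/A | N/A | N/A | 58.0 |
| MooMoo | 1 | 1 | 98 | 55,178 | 98 | 98 | N/A | N/A | N/A | 31.6 |
| Muddy | 1 | 1 | 71 | 48,228 | 70 | 71 | N/A | N/A | N/A | 71.4 |
| Patience | 1 | 1 | 109 | 70,506 | 109 | 109 | N/A | N/A | N/A | 57.8 |
| Sparky | 1 | 1 | 93 | 63,334 | 93 | 93 | N/A | N/A | N/A | 48.4 |
| Wildcat | 1 | 1 | 148 | 78,296 | 148 | 148 | N/A | N/A | N/A | 69.6 |

  1. * Average number of protein-coding genes per genome, with standard deviation.

  2. † Total phams is the sum of all phamilies (groups of homologous mycobacteriophage genes) in that cluster.

  3. ‡ The Cluster Averaged Shared Phamilies (CLASP) index is the average of the percentages of phamilies shared pairwise between genomes within a cluster.

  4. § The Cluster-Associated Phamilies (CAP) index is the percentage of the average number of phamilies per genome within a cluster whose phamilies are present in every cluster member.

  5. # The Cluster Cohesion Index (CCI) is generated by dividing the average number of genes per genome by the total number of phamilies (phams) in that cluster.

  6. ¶ The Cluster Isolation Index (CII) is the percentage of phams that are present only in that cluster, and not present in other mycobacteriophages.

  7. N/A: Not applicable.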
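
The four indices defined in the footnotes above can be sketched as a single function. This is an illustration under stated assumptions rather than the authors' implementation: each genome is assumed to be a list of phamily identifiers with one entry per gene and at least one gene, the pairwise percentage used for CLASP is assumed to be the average of the two per-genome percentages, and the name `cluster_indices` is hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Optional


def cluster_indices(
    cluster: Dict[str, List[str]], others: Dict[str, List[str]]
) -> Dict[str, Optional[float]]:
    """Compute CLASP, CAP, CCI and CII for one cluster.

    `cluster` maps each member phage to its genes' phamily IDs;
    `others` does the same for all phages outside the cluster.
    """
    pham_sets = {name: set(genes) for name, genes in cluster.items()}
    all_phams = set().union(*pham_sets.values())
    outside = set().union(*(set(genes) for genes in others.values()))

    # CII: percentage of the cluster's phams found in no other phage.
    cii = 100.0 * len(all_phams - outside) / len(all_phams)
    if len(cluster) < 2:
        # Single-genome groups: the within-cluster indices are not applicable.
        return {"CLASP": None, "CAP": None, "CCI": None, "CII": cii}

    # CLASP: mean pairwise shared percentage (normalisation is an assumption).
    def shared_percent(a: set, b: set) -> float:
        shared = len(a & b)
        return 100.0 * (shared / len(a) + shared / len(b)) / 2

    clasp = mean(
        shared_percent(a, b) for a, b in combinations(pham_sets.values(), 2)
    )

    # CAP: phams present in every member, relative to the mean phams per genome.
    core = set.intersection(*pham_sets.values())
    cap = 100.0 * len(core) / mean(len(s) for s in pham_sets.values())

    # CCI: average genes per genome divided by total phams in the cluster.
    cci = mean(len(genes) for genes in cluster.values()) / len(all_phams)

    return {"CLASP": clasp, "CAP": cap, "CCI": cci, "CII": cii}
```

As a check of the CCI definition against the table, Cluster A has an average of 90 genes per genome and 1085 phams in total, and 90 / 1085 rounds to the tabulated value of 0.08.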

Table 2

Genometrics and Cluster Cohesion Indexes of mycobacteriophages

https://doi.org/10.7554/eLife.06416.022
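| Cluster | Subcluster | # Genomes | Average # genes | Average length (bp) | # Phams | CLASP* | CAP† | CCI‡ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | | 232 | 90.0 | 51,514 | 1085 | 38.3 | 12.4 | 8.0 |
| A | A1 | 72 | 91.2 | 51,954 | 416 | 72.3 | 36.9 | 22.0 |
| A | A2 | 28 | 93.4 | 52,805 | 312 | 64.7 | 30.1 | 30.0 |
| A | A3 | 37 | 87.7 | 50,325 | 163 | 81.1 | 48.8 | 54.0 |
| A | A4 | 46 | 87.4 | 51,376 | 125 | 92.7 | 70.6 | 70.0 |
| A | A5 | 16 | 86.0 | 50,531 | 152 | 81.4 | 58.7 | 57.0 |
| A | A6 | 11 | 97.8 | 51,677 | 128 | 90.2 | 75.1 | 76.0 |
| A | A7 | 3 | 84.3 | 52,941 | 115 | 74.9 | 64.4 | 73.0 |
| A | A8 | 4 | 97.8 | 51,597 | 107 | 93.5 | 86.8 | 91.0 |
| A | A9 | 4 | 96.0 | 52,838 | 106 | 92.7 | 83.4 | 91.0 |
| A | A10 | 7 | 80.0 | 49,174 | 112 | 81.6 | 60.9 | 71.0 |
| A | A11 | 4 | 98.5 | 52,260 | 113 | 93.6 | 88.3 | 87.0 |
| B | | 108 | 100.4 | 68,653 | 421 | 66.2 | 23.2 | 24.0 |
| B | B1 | 77 | 101.8 | 68,532 | 144 | 93.2 | 72.9 | 71.0 |
| B | B2 | 8 | 89.9 | 67,267 | 101 | 94.9 | 84.6 | 89.0 |
| B | B3 | 12 | 102.8 | 68,698 | 121 | 96.3 | 84.7 | 85.0 |
| B | B4 | 8 | 96.1 | 70,619 | 166 | 79.9 | 45.8 | 58.0 |
| B | B5 | 3 | 96.3 | 70,033 | 108 | 91.7 | 87.2 | 89.0 |
| C | | 45 | 231.0 | 155,504 | 486 | 89.3 | 29.4 | 48.0 |
| C | C1 | 44 | 231.0 | 155,297 | 345 | 91.9 | 73.2 | 67.0 |
| C | C2 | 1 | 229.0 | 164,602 | 227 | N/A | N/A | N/A |
| D | | 10 | 89.3 | 64,965 | 147 | 88.1 | 64.3 | 61.0 |
| D | D1 | 9 | 87.3 | 64,697 | 100 | 94.9 | 88.8 | 87.0 |
| D | D2 | 1 | 107.0 | 67,383 | 107 | N/A | N/A | N/A |
| E | | 35 | 141.9 | 75,526 | 235 | 87.2 | 63.8 | 60.0 |
| F | | 66 | 105.3 | 57,416 | 658 | 54.4 | 4.9 | 16.0 |
| F | F1 | 60 | 104.8 | 57,486 | 573 | 59.6 | 20.6 | 18.0 |
| F | F2 | 5 | 110.8 | 55,996 | 207 | 65.7 | 49.0 | 54.0 |
| F | F3 | 1 | 107.0 | 60,285 | 105 | N/A | N/A | N/A |
| G | | 14 | 61.5 | 41,845 | 72 | 96.0 | 91.1 | 85.0 |
| H | | 5 | 98.4 | 69,469 | 207 | 61.6 | 31.5 | 48.0 |
| H | H1 | 4 | 95.8 | 69,137 | 131 | 81.9 | 67.9 | 73.0 |
| H | H2 | 1 | 109.0 | 70,797 | 110 | N/A | N/A | N/A |
| I | | 4 | 78.0 | 49,954 | 147 | 58.9 | 35.0 | 53.0 |
| I | I1 | 3 | 76.0 | 47,588 | 101 | 77.5 | 66.7 | 75.0 |
| I | I2 | 1 | 84.0 | 57,050 | 84 | N/A | N/A | N/A |
| J | | 16 | 239.8 | 110,332 | 530 | 70.8 | 40.1 | 45.0 |
| K | | 33 | 95.7 | 59,720 | 411 | 51.8 | 20.0 | 23.0 |
| K | K1 | 15 | 94.3 | 59,877 | 166 | 85.5 | 47.9 | 57.0 |
| K | K2 | 4 | 96.3 | 56,597 | 128 | 85.2 | 77.7 | 75.0 |
| K | K3 | 3 | 98.2 | 61,322 | 111 | 92.2 | 89.5 | 88.0 |
| K | K4 | 5 | 94.0 | 57,865 | 106 | 93.7 | 87.2 | 89.0 |
| K | K5 | 6 | 98.2 | 62,154 | 144 | 82.1 | 68.2 | 68.0 |
| L | | 13 | 127.9 | 75,177 | 246 | 78.2 | 50.8 | 52.0 |
| L | L1 | 3 | 123.7 | 74,050 | 135 | 92.6 | 88.8 | 92.0 |
| L | L2 | 9 | 129.3 | 75,456 | 170 | 90.1 | 72.2 | 76.0 |
| L | L3 | 1 | 128.0 | 76,050 | 126 | N/A | N/A | N/A |
| M | | 3 | 141.0 | 81,636 | 201 | 73.5 | 63.0 | 70.0 |
| M | M1 | 2 | 135.0 | 80,593 | 138 | 96.6 | 96.6 | 98.0 |
| M | M2 | 1 | 153.0 | 83,724 | 152 | N/A | N/A | N/A |
| N | | 7 | 69.1 | 42,888 | 152 | 64.1 | 45.6 | 45.0 |
| O | | 5 | 124.2 | 70,651 | 151 | 90.6 | 83.3 | 82.0 |
| P | | 9 | 78.8 | 47,668 | 159 | 76.1 | 42.3 | 50.0 |
| P | P1 | 8 | 78.4 | 47,313 | 126 | 82.9 | 52.9 | 62.0 |
| P | P2 | 1 | 82.0 | 50,513 | 82 | N/A | N/A | N/A |
| Q | | 5 | 85.2 | 53,755 | 90 | 96.6 | 90.4 | 95.0 |
| R | | 4 | 101.5 | 71,348 | 117 | 91.4 | 84.8 | 87.0 |
| S | | 2 | 109.0 | 65,172 | 117 | 91.7 | 91.7 | 93.0 |
| T | | 3 | 66.7 | 42,833 | 83 | 86.1 | 82.5 | 80.0 |

  1. * Cluster Averaged Shared Phamilies.

  2. † Cluster Associated Phamilies.

  3. ‡ Cluster Cohesion Index.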

Additional files

Supplementary file 1

List of 627 sequenced mycobacteriophages and cluster designations.

https://doi.org/10.7554/eLife.06416.025
Supplementary file 2

Full list of group author details.

https://doi.org/10.7554/eLife.06416.026
