Top left: For each split (i.e. branch) in the core tree, the color indicates what fraction of the phylogenies of 3 kb blocks support that bi-partition of the strains. Top right: Cumulative …
List of database accessions of the genomes used.
Maximum likelihood tree reconstructed from the core genome alignments of the SC1 strains (red font names), the K-12 lab strain (green font name), and 189 E. coli reference strains (black font …
Each 3 kb alignment block rejects the core tree topology as well as the topologies of the phylogenies reconstructed from all other blocks. Left panel: For each 3 kb block in the core alignment, we …
Fractions of positions that are clonally inherited (not affected by recombination) along each branch of the clonal phylogeny for the simulated datasets. Each panel shows the clonal phylogeny with …
(A-C) SNP densities (SNPs per kilobase) along the core genome for three pairs of strains at overall nucleotide divergences of (D6–F2), 0.002 (C10–D7), and 0.0048 (D6–H10). (D-F) Corresponding …
Statistics of the pairwise analysis on the simulated data. Each row of panels corresponds to simulations performed with a different recombination rate (indicated in each row) and shows, in blue, the …
The pairwise analysis accurately estimates both the clonally inherited fractions and the sizes of the recombined segments. Left panels: Comparison of the true fraction of the genome that was …
A segment of a multiple alignment of 6 strains containing three bi-allelic SNPs, X, Y, and Z. Assuming that each SNP corresponds to a single substitution in the evolutionary history of the position, …
Comparison of the observed frequencies of columns with 1, 2, 3, and 4 different nucleotides under the simple model described in the methods. Different colored dots correspond to different subsets of …
Left panel: Fraction of supporting versus clashing SNPs for each branch of the core tree. Right panel: Cumulative distribution of the fraction of supporting SNPs across all branches. The purple and …
Comparison of pairwise distances and number of SNPs on each branch as predicted by the core tree, with pairwise distances and SNP numbers observed in the data. Left panel: Scatter plot of the …
Cumulative distribution of the fraction of supporting SNPs across all branches of the core tree for the original alignment of E. coli strains (blue) as well as for alignments from which 5% (orange) …
Cumulative distributions of the fraction of supporting SNPs across all branches of the core tree for the alignments resulting from the simulations with (dark blue), (light blue), (light …
Supporting versus clashing SNPs for trees that were built bottom-up, while minimizing SNP clashes. Left panel: Illustration of the iterative bottom-up tree reconstruction. At each step, the pair of …
Quartets of roughly equidistant strains have no consensus phylogeny. Left: Using the distribution of pairwise distances (top panel) we select, for each pairwise distance D, quartets of strains whose …
(A) Linkage disequilibrium (squared correlation, see Materials and methods) as a function of the separation of a pair of columns in the core genome alignment. (B) Probability distribution of the …
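As an illustration of the squared-correlation measure mentioned in panel A, the following is a minimal sketch for two bi-allelic alignment columns. The exact definition used in the paper is given in its Materials and methods and is not reproduced here; the function name `r_squared` and the string encoding of columns are assumptions made for this example.

```python
def r_squared(col_x, col_y):
    """Squared correlation between two bi-allelic alignment columns.

    Each column is a sequence of nucleotides, one per strain, in the same
    strain order. This is the standard r^2 statistic; it is an assumption
    that it matches the definition used in the paper.
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must cover the same strains")
    alleles_x = sorted(set(col_x))
    alleles_y = sorted(set(col_y))
    if len(alleles_x) != 2 or len(alleles_y) != 2:
        raise ValueError("both columns must be bi-allelic")

    n = len(col_x)
    # Encode the first allele (alphabetically) of each column as 1, the other as 0.
    x = [1 if a == alleles_x[0] else 0 for a in col_x]
    y = [1 if a == alleles_y[0] else 0 for a in col_y]

    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(a * b for a, b in zip(x, y)) / n

    d = p_xy - p_x * p_y  # covariance of the two indicator variables
    return d * d / (p_x * (1 - p_x) * p_y * (1 - p_y))


# Two columns that split the strains identically are perfectly correlated.
perfect = r_squared("AAGG", "CCTT")
# Two columns whose alleles are statistically independent have r^2 = 0.
independent = r_squared("AGAG", "CCTT")
```

Averaging this quantity over all column pairs at a given separation would give a curve of the kind described in panel A.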
Pairwise SNP compatibility as a function of genomic distance. The plot shows the fraction of SNP pairs that are compatible with a common phylogeny as a function of the genomic distance between the …
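One common way to decide whether two bi-allelic SNPs are compatible with a common phylogeny, assuming each SNP arose from a single substitution, is the four-gamete test: the pair is compatible unless all four allele combinations occur among the strains. The sketch below uses that test; whether it is exactly the criterion behind this figure is an assumption, and the names `compatible` and `compatible_fraction_by_distance` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def compatible(col_x, col_y):
    """Four-gamete test for two bi-allelic columns (one nucleotide per strain)."""
    observed_pairs = set(zip(col_x, col_y))
    # With two alleles per column there are at most four combinations;
    # seeing all four cannot be explained by one substitution per column on one tree.
    return len(observed_pairs) < 4


def compatible_fraction_by_distance(snps):
    """Fraction of compatible SNP pairs for each genomic distance.

    `snps` is a list of (position, column) tuples.
    """
    totals = defaultdict(int)
    compatible_counts = defaultdict(int)
    for (pos_a, col_a), (pos_b, col_b) in combinations(snps, 2):
        distance = abs(pos_b - pos_a)
        totals[distance] += 1
        if compatible(col_a, col_b):
            compatible_counts[distance] += 1
    return {d: compatible_counts[d] / totals[d] for d in totals}


# Toy alignment of 6 strains with three SNP columns (made-up data).
toy_snps = [
    (10, "AAGGGG"),
    (15, "CCCCTT"),
    (20, "TCTCTC"),
]
fractions = compatible_fraction_by_distance(toy_snps)
```

For real alignments the distances would normally be binned before computing the fractions; the toy example keeps exact distances for clarity.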
Probability distributions of the length of tree-compatible segments along the alignments of the E. coli genomes (black line) and the alignments of the sequences from the simulations with (dark …
For strain numbers ranging from to , we collected random subsets of n strains and calculated the ratios of phylogeny changes to SNPs in the alignment. The figure shows box-whisker plots that …
Ratio of the minimal number of phylogeny changes C to substitutions M for random subsets of strains using the alignment for which 5% of potentially homoplasic positions have been removed (orange) …
Observed ratio of the minimal number of phylogeny changes C and SNPs M in the alignment (vertical axis) as a function of the ratio of recombination and mutation rate used in the simulation …
Histograms of the number of times each position in the genome was overwritten by recombination along the branches of the clonal phylogeny, for the simulations with recombination-to-mutation ratios …
Comparison of the true average number of times that positions were overwritten by recombination along the branches of the clonal phylogeny (horizontal axis) versus the estimated number of times …
(A) The cumulative distribution of pairwise divergences is shown as a different colored line for each species (see legend in panel B). Both axes are shown on logarithmic scales. The vertical lines …
List of database accessions of the genomes used.
The colors indicate what fraction of the time each split in the core tree occurred in trees built from random subsets of half of the genomic loci. The colors on the leaves indicate the annotated …
Differences between the core tree T and the tree reconstructed from the alignment from which all SNPs that fall on branches of the core tree have been removed. Each branch of the core tree T is …
(A) Frequencies of 2-SNPs of the type in which a SNP is shared between strain A1 and one other strain s. Each edge corresponds to a 2-SNP and the thickness of the edge is proportional to the …
Distributions of n-SNP frequencies (left panels) and exponents of the power-law fits (right panels) for the original E. coli core genome alignment (top row), the 5% homoplasy-corrected core genome …
Left panel: Entropy profiles (in bits) for six example strains, indicated in the legend. Right panel: Entropy profiles for all E. coli strains.
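To make the notion of an entropy profile concrete, here is a minimal sketch that computes, for one strain and each n, the Shannon entropy in bits of the frequency distribution over the n-SNP types containing that strain. This reading of the definition is an assumption based on the captions; the function name `entropy_profile` and the `frozenset` encoding of n-SNP types are hypothetical. The example counts are the 2-SNP occurrences involving strain A1 from the six-strain table of n-SNP occurrences in this document.

```python
from math import log2


def entropy_profile(nsnp_counts, strain):
    """Map n -> entropy (bits) of the n-SNP types that include `strain`.

    `nsnp_counts` maps a frozenset of strain names (an n-SNP type) to the
    number of SNPs of that type.
    """
    counts_by_n = {}
    for strains, count in nsnp_counts.items():
        if strain in strains and count > 0:
            counts_by_n.setdefault(len(strains), []).append(count)

    profile = {}
    for n, counts in sorted(counts_by_n.items()):
        total = sum(counts)
        profile[n] = -sum((c / total) * log2(c / total) for c in counts)
    return profile


# 2-SNP counts involving strain A1, taken from the occurrence table.
counts_a1 = {
    frozenset({"A1", "D8"}): 214,
    frozenset({"A1", "A2"}): 196,
    frozenset({"A1", "A11"}): 194,
}
profile_a1 = entropy_profile(counts_a1, "A1")
```

Because the three counts are close to equal, the entropy at n = 2 lies just below log2(3) ≈ 1.585 bits, the maximum possible for three types.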
Entropy profiles of the n-SNP distributions for each of the E. coli phylogroups. Each panel shows the entropy of the n-SNP distribution (vertical axis) as a function of n for each strain s …
Statistical significance of the difference in n-SNP statistics for all pairs of strains. Left panel: Cumulative distribution of the p-values of the Fisher exact test (Materials and methods) for the n…
Entropy profiles of the n-SNP distributions for the E. coli data (top left panel) and the data from the simulations (other panels), with the recombination rate indicated in the title of each panel. …
Error bars correspond to 95% posterior probability intervals. Right panel: Mean entropy of the entropy profiles, averaged over all strains s, as a function of the number n of strains sharing the …
Power-law fits of the n-SNP distributions for all six species. Each panel shows the reverse cumulative distributions of the frequencies of all observed 2-SNPs (blue dots), 3-SNPs (orange dots), …
Entropy profiles of all the strains for each of the six species. Each panel corresponds to one species (indicated at the top) and shows the entropy profiles of the distributions of n-SNPs in …
n-SNP distributions and entropy profiles for the human data. Left panel: Reverse cumulative distributions of the frequencies of all observed 2-SNPs (blue dots), 3-SNPs (orange dots), 4-SNPs (green …
Each point corresponds to a pair of strains.
Total number of occurrences of n-SNPs, that is SNPs shared by n strains (vertical axes) as a function of n (horizontal axes) for the E. coli data (top left panel) and all simulated data with …
Left panel: Number of n-SNP types (vertical axis) as a function of n (horizontal axis) for the E. coli data (black line) and for simulations with different recombination to mutation rates …
Each panel corresponds to the observed n-SNP distributions with the value of n indicated at the top of each panel. All axes are shown on logarithmic scales.
The bars show the fitted exponent plus and minus one standard-deviation of the posterior distribution.
Right panels: Corresponding histograms for the number of SNPs per kilobase (dots) together with fits of the mixture model. Note the vertical axis is on a logarithmic scale.
Left: Direct visualization of the positions of each of the 2-SNP patterns along the core genome alignment. Each dashed line corresponds to an SNP and SNPs are colored according to the 2-SNP type …
Note that the vertical axis corresponds to the number of segments with the corresponding number of consecutive SNPs. Middle: Histogram of the length of segments without phylogeny breaks. Right: …
Column set | fh | |
---|---|---|
All columns | 0.118 | 0.026 |
Synonymous positions | 0.287 | 0.063 |
Second positions in codons | 0.0258 | 0.006 |
Synom. pos. without outgroup | 0.149 | 0.033 |
Sec. pos. without outgroup | 0.0172 | 0.004 |
First, for each set of positions the table lists the total number of positions, and the number of positions at which 1, 2, 3 or 4 different nucleotides appear. Second, for the subset of positions …
Statistic | All columns | Synom. codon pos. | Sec. codon pos. |
---|---|---|---|
Total columns | 2,880,516 | 349,311 | 960,172 |
1-letter columns | 2,484,831 | 299,536 | 936,588 |
2-letter columns | 363,164 | 46,807 | 22,505 |
3-letter columns | 30,611 | 2838 | 1029 |
4-letter columns | 1910 | 130 | 50 |
Transitions | 275,134 | 36,866 | 13,420 |
Transversions | 88,030 | 9941 | 9085 |
A ↔ G | 138,728 | 21,767 | 6676 |
C ↔ T | 136,406 | 15,099 | 6744 |
G ↔ T | 23,679 | 2198 | 963 |
A ↔ C | 23,636 | 3294 | 3510 |
A ↔ T | 21,036 | 2416 | 2549 |
C ↔ G | 19,679 | 2033 | 2063 |
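The rows of this table are related by simple sums: the total is the sum of the 1- to 4-letter columns, the 2-letter columns split into transitions and transversions, and each of those splits into its substitution types. The following sketch checks these identities and computes the transition-to-transversion ratio for each column set. The numbers are copied from the table; the dictionary layout and function names are only illustrative.

```python
# Values copied from the table above, one entry per column set.
TABLE = {
    "All columns": {
        "total": 2_880_516, "letters": [2_484_831, 363_164, 30_611, 1_910],
        "transitions": 275_134, "transversions": 88_030,
        "AG": 138_728, "CT": 136_406,
        "GT": 23_679, "AC": 23_636, "AT": 21_036, "CG": 19_679,
    },
    "Synom. codon pos.": {
        "total": 349_311, "letters": [299_536, 46_807, 2_838, 130],
        "transitions": 36_866, "transversions": 9_941,
        "AG": 21_767, "CT": 15_099,
        "GT": 2_198, "AC": 3_294, "AT": 2_416, "CG": 2_033,
    },
    "Sec. codon pos.": {
        "total": 960_172, "letters": [936_588, 22_505, 1_029, 50],
        "transitions": 13_420, "transversions": 9_085,
        "AG": 6_676, "CT": 6_744,
        "GT": 963, "AC": 3_510, "AT": 2_549, "CG": 2_063,
    },
}


def is_consistent(stats):
    """Check the additive relations between the rows of one column set."""
    two_letter = stats["letters"][1]
    return (
        sum(stats["letters"]) == stats["total"]
        and stats["transitions"] + stats["transversions"] == two_letter
        and stats["AG"] + stats["CT"] == stats["transitions"]
        and stats["GT"] + stats["AC"] + stats["AT"] + stats["CG"]
        == stats["transversions"]
    )


def ts_tv_ratio(stats):
    """Transition-to-transversion ratio among 2-letter columns."""
    return stats["transitions"] / stats["transversions"]


consistency = {name: is_consistent(stats) for name, stats in TABLE.items()}
ratios = {name: ts_tv_ratio(stats) for name, stats in TABLE.items()}
```

All three column sets pass the consistency check. The resulting ratios show that transitions outnumber transversions more strongly at synonymous positions than at second codon positions.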
Statistic | Average nb. of SNPs | | | Average nb. of nucleotides | | |
---|---|---|---|---|---|---|
Perc. homoplasies removed | 0% | 5% | 10% | 0% | 5% | 10% |
E. coli | 7.59 | 8.79 | 10.75 | 123 | 145 | 179 |
32 | 2757 | 11606 | 387 | 30435 | 108413 | |
32 | 430 | 702 | 371 | 5020 | 8396 | |
28 | 57 | 85 | 365 | 733 | 1158 | |
14 | 16 | 18 | 211 | 256 | 312 | |
8 | 9 | 10 | 135 | 157 | 183 | |
5 | 5 | 5 | 88 | 100 | 114 | |
3 | 3 | 3 | 57 | 62 | 68 |
Starting from the full 5% homoplasy-corrected core genome alignment we extracted, for each phylogroup, the sub-alignment of all strains belonging to that phylogroup and determined the number of …
Phylogroup | No. of strains | SNP rate | Phyl. changes C | C/M | |
---|---|---|---|---|---|
A | 6 | 0.0024 | 178 | 0.027 | 0.65 |
B1 | 35 | 0.013 | 4540 | 0.130 | 16.5 |
B2 | 6 | 0.011 | 2664 | 0.088 | 9.7 |
D | 29 | 0.017 | 5426 | 0.114 | 19.7 |
E | 1 | - | - | - | - |
F | 3 | 0.007 | - | - | - |
O | 9 | 0.003 | 2 | 0.0002 | 0.007 |
For each species, the number of strains, the median genome size, the size of the core genome alignment, and the total number of informative SNPs (that is SNPs that occur in at least two strains) are …
Species | Strains | Genome size | Core size | Inf. SNPs |
---|---|---|---|---|
Escherichia coli | 92 | 4,929,299 | 2,756,541 (56%) | 247,822 |
Bacillus subtilis | 75 | 4,155,843 | 2,341,553 (56%) | 182,535 |
Helicobacter pylori | 83 | 1,655,288 | 850,827 (51%) | 114,993 |
Mycobacterium tuberculosis | 40 | 4,465,985 | 4,150,139 (93%) | 3502 |
Salmonella enterica | 155 | 4,810,980 | 2,846,634 (59%) | 192,117 |
Staphylococcus aureus | 95 | 2,881,899 | 2,002,833 (69%) | 73,756 |
For each species, the table shows the SNP rate (SNPs per alignment column) , the lower bound on the number of phylogeny changes C, and the lower bound on the ratio of phylogeny changes to …
Species | SNP rate | Phyl. changes C | C/M |
---|---|---|---|
Escherichia coli | 0.101 | 43,575 | 0.156 |
Bacillus subtilis | 0.113 | 40,811 | 0.155 |
Helicobacter pylori | 0.202 | 50,743 | 0.295 |
Mycobacterium tuberculosis | 0.002 | 755 | 0.078 |
Salmonella enterica | 0.085 | 31,598 | 0.131 |
Staphylococcus aureus | 0.053 | 13,215 | 0.124 |
Strain | Divergence |
---|---|
B2 | 0.00626 |
A7 | 0.00627 |
A11 | 0.00702 |
D8 | 0.00729 |
A2 | 0.00778 |
Segment length | Number of SNPs | SNP types |
---|---|---|
9698 | 109 | (A7,B2) (A1,A7,B2) (A2,D8) |
5672 | 100 | (A7,B2) (A1,A2) (A11,D8) |
2068 | 100 | (A2,D8) (A11,A7,B2) |
2255 | 95 | (A7,B2) (A11,A2) (A1,A11,A2) |
11726 | 95 | (A7,B2) (A2,A7,B2) (A1,D8) |
11790 | 93 | (A7,B2) (A2,D8) |
3614 | 86 | (A11,A2) |
4564 | 86 | (A11,A7,B2) (A1,A2) |
2890 | 75 | (A7,B2) (A1,A7,B2) |
2390 | 71 | (A1,A11) |
Strains | Number of occ. |
---|---|
A7 B2 | 1227 |
A11 D8 | 306 |
A2 D8 | 291 |
A11 A2 | 284 |
A1 D8 | 214 |
A1 A2 | 196 |
A1 A11 | 194 |
A1 A7 B2 | 389 |
A7 B2 D8 | 303 |
A11 A7 B2 | 265 |
A2 A7 B2 | 208 |
A11 A2 D8 | 179 |
A1 A11 D8 | 172 |
A1 A11 A2 | 161 |
A1 A2 D8 | 153 |
A1 A11 A7 B2 | 265 |
A1 A11 A2 D8 | 248 |
A11 A7 B2 D8 | 232 |
A1 A7 B2 D8 | 226 |
A1 A2 A7 B2 | 139 |
A2 A7 B2 D8 | 136 |
A11 A2 A7 B2 | 110 |