
Severe infections emerge from commensal bacteria by adaptive evolution

  1. Bernadette C Young  Is a corresponding author
  2. Chieh-Hsi Wu
  3. N Claire Gordon
  4. Kevin Cole
  5. James R Price
  6. Elian Liu
  7. Anna E Sheppard
  8. Sanuki Perera
  9. Jane Charlesworth
  10. Tanya Golubchik
  11. Zamin Iqbal
  12. Rory Bowden
  13. Ruth C Massey
  14. John Paul
  15. Derrick W Crook
  16. Timothy E Peto
  17. A Sarah Walker
  18. Martin J Llewelyn
  19. David H Wyllie
  20. Daniel J Wilson  Is a corresponding author
  1. University of Oxford, United Kingdom
  2. Oxford University Hospitals NHS Foundation Trust, United Kingdom
  3. Royal Sussex County Hospital, United Kingdom
  4. Brighton and Sussex Medical School, University of Sussex, United Kingdom
  5. NIHR Health Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, United Kingdom
  6. University of Bristol, United Kingdom
  7. Public Health England, United Kingdom
  8. National Institute for Health Research, Oxford Biomedical Research Centre, United Kingdom
  9. Jenner Institute, United Kingdom
  10. Oxford Martin School, University of Oxford, United Kingdom
Research Article
Cite this article as: eLife 2017;6:e30637 doi: 10.7554/eLife.30637
3 figures, 3 tables, 22 data sets and 6 additional files

Figures

Figure 1 with 1 supplement
Infection-causing S. aureus form closely related but distinct populations descended from nose-colonizing bacteria in the majority of infections.

Bacteria sampled from the nose and infection site of 105 patients formed one of three population structures, illustrated with example haplotrees: (A) Unrelated populations differentiated by many variants. (B) Highly related populations separated by few variants. (C) Highly related populations with one genotype in common. Reconstructing the ancestral genotype in each patient helped identify the ancestral population: (D) Nose-colonizing bacteria ancestral. (E) Ambiguous ancestral population. (F) Infection site bacteria ancestral. (G) Phylogeny illustrating the working hypothesis that variants differentiating highly related nose-colonizing and infection-causing bacteria would be enriched for variants that promote, or are promoted by, infection. In A–F, haplotree nodes represent observed genotypes sampled from the nose (white) or infection site (grey), with area proportional to genotype frequency, or unobserved intermediate genotypes (black). Edges represent mutations. Patient identifiers and sample sizes (n) are given. In A–G, edge color indicates that mutations occurring on those branches correspond to B-class variants between nose-colonizing and infection-causing bacteria (blue), C-class variants among nose-colonizing bacteria (gold) or D-class variants among infection-causing bacteria (red). Black dashed edges indicate ancestral lineages.

https://doi.org/10.7554/eLife.30637.003
Figure 1—figure supplement 1
Distribution of the number of variants identified within 105 severely infected patients, by class.

Three classes of variants were identified: those representing genuine differences between nose-colonizing and infection populations (B-class), variants specific to the nose-colonizing population (C-class) and variants specific to the disease-causing infection population (D-class). The number of variants is shown on a piecewise-linear axis, with horizontal positioning permuted to assist visualization. Where nose-colonizing and infecting bacteria possessed different multilocus sequence types, the number of variants between those populations is colored red. When the number of B-class variants was 66 or fewer, nose-colonizing and infecting bacteria were considered related, since a similar range of (C-class) diversity was observed within the nose-colonizing populations of bacteria with the same multilocus sequence type. When the number of B-class variants was 1104 or more, nose-colonizing and infecting bacteria were considered unrelated.

https://doi.org/10.7554/eLife.30637.004
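
The relatedness rule stated in the caption of Figure 1—figure supplement 1 can be written as a short Python sketch. The function name and the "unclassified" label for counts between the two thresholds are assumptions made for illustration; the caption only defines the two outer ranges.

```python
# Thresholds quoted in the caption of Figure 1—figure supplement 1.
RELATED_MAX = 66       # 66 or fewer B-class variants: considered related
UNRELATED_MIN = 1104   # 1104 or more B-class variants: considered unrelated


def classify_relatedness(n_b_class_variants: int) -> str:
    """Classify a patient's nose and infection-site populations.

    Hypothetical helper: the name and the 'unclassified' label are
    illustrative and do not come from the article.
    """
    if n_b_class_variants < 0:
        raise ValueError("variant count cannot be negative")
    if n_b_class_variants <= RELATED_MAX:
        return "related"
    if n_b_class_variants >= UNRELATED_MIN:
        return "unrelated"
    # The caption does not say how intermediate counts would be treated.
    return "unclassified"


if __name__ == "__main__":
    for count in (0, 66, 500, 1104):
        print(count, classify_relatedness(count))
```
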
Figure 2 with 3 supplements
Genes, ontologies and pathways enriched for protein-altering substitutions between nose-colonizing and infection-causing bacteria within infected patients.

(A) Significance of enrichment of 2650 individual genes. (B) Significance of enrichment of 552 gene sets defined by BioCyc gene ontologies. (C) Significance of enrichment of 248 gene sets defined by SAMMD expression pathways. Genes, pathways and ontologies that approach or exceed a Bonferroni-corrected significance threshold of α = 0.05, weighted for the number of tests per category, (red lines) are named.

https://doi.org/10.7554/eLife.30637.006
Figure 2—figure supplement 1
Genes, ontologies and pathways enriched for protein-altering transient variants within nose-colonizing and infection-causing bacteria.

(A) Significance of enrichment of 2650 individual genes. SAR1461 encodes Pbp2, penicillin-binding protein 2. (B) Significance of enrichment of 552 gene sets defined by BioCyc gene ontologies. (C) Significance of enrichment of 248 gene sets defined by SAMMD expression pathways. C-class variants among nose-colonizing bacteria are colored gold, D-class variants among infection-causing bacteria are colored red. Genes, pathways and ontologies that approach or exceed a Bonferroni-corrected significance threshold of α = 0.05, weighted for the number of tests per category, (red lines) are named.

https://doi.org/10.7554/eLife.30637.007
Figure 2—figure supplement 2
Gene set enrichment analysis of B-class mutants occurring in the nose or the infection site.

Each point indicates the –log10 p-values of two tests for enrichment of protein-altering variants found among mutants in nose-colonizing bacteria vs infection-causing bacteria. The shape of each point represents the type of enrichment tested (squares: within 2650 genes in MRSA252, triangles: 552 BioCyc gene ontologies, circles: 248 SAMMD expression pathways). A line of 1:1 correspondence is plotted in red. A –log10 p-value above 5.2, 4.5 or 4.2 was considered genome-wide significant for loci, gene ontologies or expression pathways, respectively.

https://doi.org/10.7554/eLife.30637.008
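
The thresholds of 5.2, 4.5 and 4.2 are consistent with one possible reading of "weighted for the number of tests per category": α = 0.05 is split equally across the three categories and then divided by the number of tests in each category. The Python sketch below works through that arithmetic. The equal three-way split is an inference from the quoted numbers, not a procedure stated in the captions, and the function name is hypothetical.

```python
import math

ALPHA = 0.05
# Numbers of tests per category, as given in the Figure 2 captions.
TESTS_PER_CATEGORY = {
    "loci": 2650,
    "gene ontologies": 552,
    "expression pathways": 248,
}


def neg_log10_threshold(n_tests: int, n_categories: int = 3,
                        alpha: float = ALPHA) -> float:
    """-log10 of a Bonferroni threshold with alpha shared equally
    across categories (an assumed weighting, see the lead-in)."""
    return -math.log10(alpha / (n_categories * n_tests))


thresholds = {name: neg_log10_threshold(n)
              for name, n in TESTS_PER_CATEGORY.items()}

if __name__ == "__main__":
    for name, value in thresholds.items():
        print(f"{name}: {value:.1f}")
```
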
Figure 2—figure supplement 3
Genes, ontologies and pathways enriched for protein-altering variants among longitudinally sampled asymptomatic nasal carriers.

(A) Significance of enrichment of 2650 individual genes. (B) Significance of enrichment of 552 gene sets defined by BioCyc gene ontologies. (C) Significance of enrichment of 248 gene sets defined by SAMMD expression pathways. Genes, pathways and ontologies that approach or exceed a Bonferroni-corrected significance threshold of α = 0.05, weighted for the number of tests per category, (red lines) are named.

https://doi.org/10.7554/eLife.30637.009
Figure 3 with 2 supplements
All genes contributing to the pathways and ontologies most significantly enriched for protein-altering substitutions between nose-colonizing and infection-causing bacteria.

The pathogenesis ontology, in which significant enrichments were observed in infection-causing but not nose-colonizing bacteria, is shown for comparison. Every gene with at least one substitution between nose-colonizing and infection-causing bacteria and which was up- (red) or down-regulated (blue) in one of the pathways or a member of one of the ontologies (blue) is shown. To the left, the number of altering (yellow/orange) and truncating (pink/red) B-class variants is shown, broken down by the population in which the mutant allele was found: nose (BC; yellow/pink) or infection site (BD; orange/red).

https://doi.org/10.7554/eLife.30637.011
Figure 3—figure supplement 1
Genes enriched for substitutions between nose-colonizing and infection-causing bacteria within patients are not the most rapidly evolving at the species level.

An estimate of the dN/dS ratio between unrelated bacteria is shown for each gene, color-coded by the number of protein-altering substitutions between nose-colonizing and infection-causing bacteria within patients. There was a negative Spearman rank correlation between dN/dS ratio and substitutions within patients (ρ = –0.04, p=0.02).

https://doi.org/10.7554/eLife.30637.012
Figure 3—figure supplement 2
Gene set enrichment analysis is robust to species-level differences in dN/dS between genes.

For every locus, expression pathway and gene ontology, we estimated dN/dS between unrelated S. aureus. There was no relationship between dN/dS and enrichment of protein-altering substitutions between nose-colonizing and infection-causing bacteria in (A) loci, (B) ontologies or (C) pathways (non-significant correlations, p>0.05). When we incorporated variability in dN/dS between genes in the gene set enrichment analyses, the results were robust for (D) loci, (E) ontologies and (F) pathways, showing only small differences in significance (–log10 p-value) between the analyses that correct for locus length only (horizontal axes) and those that correct for locus length and dN/dS (vertical axes).

https://doi.org/10.7554/eLife.30637.013

Tables

Table 1
Distribution of infection types and relatedness of nose-colonizing and infecting S. aureus among 105 patients revealed by genomic comparison.
https://doi.org/10.7554/eLife.30637.002
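Relation of nose-colonizing to infecting bacteria, by infection site

| Infection site | Unrelated (≥1104 variants) | Closely related (≤66 variants), zero shared genotypes | Closely related (≤66 variants), one shared genotype |
|---|---|---|---|
| Bloodstream | 4 | 43 | 8 |
| Soft tissue | 4 | 23 | 10 |
| Bone and joint | 2 | 8 | 3 |
| Total | 10 | 74 | 21 |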
Table 2
Cross-classification of variants within patients by phylogenetic position and predicted functional effect, and comparison to asymptomatic nose carriers.

Neutrality indices (McDonald and Kreitman, 1991; Rand and Kann, 1996) were defined as the odds ratio of mutation counts relative to synonymous variants in patients versus asymptomatic nose carriers (Reference Panel I). Those significant at p<0.05 and p<0.005 are emboldened and underlined respectively.

https://doi.org/10.7554/eLife.30637.005
Number of variants (neutrality index)

| Phylogenetic position | Synonymous | Non-synonymous | Protein truncating | Non-coding | Total |
|---|---|---|---|---|---|
| Patients with severe infections (n = 105) | | | | | |
| Between nose-colonization and infection site (B-class) | 93 | 265 (1.1) | 39 (3.1) | 140 (1.2) | 537 |
| Within nose-colonization (C-class) | 93 | 325 (1.3) | 59 (4.7) | 145 (1.3) | 622 |
| Within infection site (D-class) | 26 | 82 (1.2) | 15 (4.3) | 40 (1.3) | 163 |
| Total | 212 | 672 (1.2) | 113 (3.9) | 325 (1.3) | 1322 |
| Asymptomatic carriers (Golubchik et al., 2013) (Reference panel I, for comparison, n = 13) | | | | | |
| Within nose-colonization (C-class) | 37 | 97 | 5 | 45 | 184 |
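
The neutrality indices in Table 2 can be recomputed from the counts using the definition in the caption: the odds ratio of a variant class relative to synonymous variants, in patients versus asymptomatic carriers. The following Python sketch does this arithmetic. The function and variable names are illustrative, and the significance tests mentioned in the caption are not reproduced.

```python
def neutrality_index(n_class: int, n_syn: int,
                     ref_class: int, ref_syn: int) -> float:
    """Odds ratio of (class : synonymous) in patients versus the
    reference panel of asymptomatic carriers."""
    return (n_class / n_syn) / (ref_class / ref_syn)


# Counts from Table 2: (synonymous, non-synonymous, truncating, non-coding).
PATIENTS = {
    "B-class": (93, 265, 39, 140),
    "C-class": (93, 325, 59, 145),
    "D-class": (26, 82, 15, 40),
    "Total": (212, 672, 113, 325),
}
CARRIERS = (37, 97, 5, 45)  # Reference Panel I, C-class

indices = {}
for position, (syn, nonsyn, trunc, noncoding) in PATIENTS.items():
    ref_syn, ref_nonsyn, ref_trunc, ref_noncoding = CARRIERS
    indices[position] = (
        round(neutrality_index(nonsyn, syn, ref_nonsyn, ref_syn), 1),
        round(neutrality_index(trunc, syn, ref_trunc, ref_syn), 1),
        round(neutrality_index(noncoding, syn, ref_noncoding, ref_syn), 1),
    )

if __name__ == "__main__":
    for position, values in indices.items():
        print(position, values)
```

Rounded to one decimal place, these values agree with the indices given in parentheses in the table.
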
Table 3
Genes, gene ontologies and expression pathways exhibiting the most significant enrichments or depletions of protein-altering B-class variants separating nose and infection site bacteria.

Enrichments below one represent depletions. The total number of variants and genes available for analysis differed by database. A -log10 p-value above 5.2, 4.5 or 4.2 was considered genome-wide significant for loci, gene ontologies or expression pathways respectively (in bold).

https://doi.org/10.7554/eLife.30637.010
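Locus

| Gene group | No. protein-altering B-class variants | Cumulative length of genes (kb) | Enrichment | Significance (-log10 p value) |
|---|---|---|---|---|
| agrA | 5 | 0.7 | 58.27 | 7.53 |
| clfB | 5 | 2.6 | 15.87 | 4.70 |
| Total | 289 | 2363.8 | | |

BioCyc Gene Ontology (Caspi et al., 2016)

| Gene group | No. protein-altering B-class variants | Cumulative length of genes (kb) | Enrichment | Significance (-log10 p value) |
|---|---|---|---|---|
| Cell wall | 18 | 30.9 | 5.02 | 7.03 |
| Cell adhesion | 13 | 17.2 | 6.44 | 6.47 |
| Pathogenesis | 31 | 112.5 | 2.41 | 4.44 |
| Total | 288 | 2359.3 | | |

SAMMD Expression Pathway

| Gene group | Variants, down-regulated | Variants, up-regulated | Length (kb), down-regulated | Length (kb), up-regulated | Enrichment, down-regulated | Enrichment, up-regulated | Significance (-log10 p value) |
|---|---|---|---|---|---|---|---|
| Ovispirin-1 (Pietiäinen et al., 2009) | 40 | 7 | 121.2 | 142.9 | 2.65 | 0.39 | 7.80 |
| Temporin L (Pietiäinen et al., 2009) | 42 | 14 | 125.1 | 156.1 | 2.78 | 0.74 | 6.86 |
| rsp (Lei et al., 2011) | 27 | 1 | 61.1 | 13.7 | 3.61 | 0.60 | 6.35 |
| agrA (RN27) (Dunman et al., 2001) | 9 | 30 | 41.0 | 85.0 | 1.83 | 2.94 | 5.57 |
| VISA-vs-VSSA (Mu50 vs N315) (Cui et al., 2005) | 0 | 17 | 0 | 34.4 | 0 | 3.95 | 5.27 |
| VISA-vs-VSSA (Mu50 vs Mu50-P) (Cui et al., 2005) | 0 | 17 | 0 | 36.7 | 0 | 3.70 | 4.90 |
| VISA-vs-VSSA (isolate pair 2) (Howden et al., 2008) | 14 | 3 | 26.9 | 59.7 | 4.06 | 0.39 | 4.71 |
| sarA (RN27) (Dunman et al., 2001) | 6 | 23 | 49.9 | 57.7 | 0.97 | 3.22 | 4.59 |
| agrA (UAMS-1 OD 1.0) (Cassat et al., 2006) | 0 | 5 | 0 | 2.7 | 0 | 14.57 | 4.52 |
| Pine-Oil Disinfectant-Reduced-Susceptibility (Lamichhane-Khadka et al., 2008) | 17 | 5 | 36.4 | 23.6 | 3.76 | 1.70 | 4.44 |
| Total | 275 | | 2093.5 | | | | |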
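
The enrichment and significance columns for single loci can be approximated from the counts and lengths in Table 3. The Python sketch below assumes a two-rate Poisson model (one mutation rate per kb for the gene of interest, another for all remaining genes), compared with a single shared rate by a likelihood ratio test on one degree of freedom. This matches the article's description of Poisson regression likelihood ratio tests only in outline: the function name is hypothetical, and because the table lengths are rounded the output is close to, but not identical with, the published values.

```python
import math


def poisson_enrichment(k_gene: int, len_gene_kb: float,
                       k_total: int, len_total_kb: float):
    """Return (rate ratio, -log10 p) for one gene versus all other genes.

    Illustrative sketch under the assumptions stated in the lead-in;
    not the authors' code.
    """
    k_rest = k_total - k_gene
    len_rest = len_total_kb - len_gene_kb

    # Enrichment: variants per kb in the gene relative to the rest.
    ratio = (k_gene / len_gene_kb) / (k_rest / len_rest)

    # Null model: one shared rate across all genes.
    rate0 = k_total / len_total_kb

    def term(k: int, length: float) -> float:
        # k * log(observed / expected), with 0 * log(0) taken as 0.
        return k * math.log(k / (length * rate0)) if k > 0 else 0.0

    # Likelihood ratio statistic; guard against tiny negative rounding.
    lrt = max(2.0 * (term(k_gene, len_gene_kb) + term(k_rest, len_rest)), 0.0)

    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lrt / 2.0))
    return ratio, -math.log10(p_value)


if __name__ == "__main__":
    # agrA and clfB rows, with the Locus totals from Table 3.
    for gene, k, length in (("agrA", 5, 0.7), ("clfB", 5, 2.6)):
        ratio, neg_log10_p = poisson_enrichment(k, length, 289, 2363.8)
        print(f"{gene}: enrichment ~{ratio:.1f}, -log10 p ~{neg_log10_p:.2f}")
```
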

Data availability

The following data sets were generated
  1. 1
The following previously published data sets were used
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. Reference Panel IV MRSA252. Matthew Holden (2004). Publicly available at NCBI Nucleotide (accession no. BX571856.1).
  7. Reference Panel IV MSSA476. Matthew Holden (2004). Publicly available at NCBI Nucleotide (accession no. BX571857.1).
  8. Reference Panel IV COL. SR Gill (2005). Publicly available at NCBI Nucleotide (accession no. CP000046.1).
  9. Reference Panel IV NCTC 8325. AF Gillaspy (2006). Publicly available at NCBI Nucleotide (accession no. CP000253.1).
  10. Reference Panel IV Mu50. M Kuroda (2001). Publicly available at NCBI Nucleotide (accession no. BA000017.4).
  11. Reference Panel IV N315. M Kuroda (2001). Publicly available at NCBI Nucleotide (accession no. BA000018.3).
  12. Reference Panel IV USA300_FPR3757. BA Diep (2006). Publicly available at NCBI Nucleotide (accession no. CP000255.1).
  13. Reference Panel IV JH1. A Copeland (2007). Publicly available at NCBI Nucleotide (accession no. CP000736.1).
  14. Reference Panel IV Newman. T Baba (2008). Publicly available at NCBI Nucleotide (accession no. AP009351.1).
  15. Reference Panel IV TW20. Matthew Holden (2010). Publicly available at NCBI Nucleotide (accession no. FN433596.1).
  16. Reference Panel IV S0385. MJ Schijffelen (2010). Publicly available at NCBI Nucleotide (accession no. AM990992.1).
  17. Reference Panel IV JKD6159. K Chua (2010). Publicly available at NCBI Nucleotide (accession no. CP002114.2).
  18. Reference Panel IV RF122. Herron-Olson (2007). Publicly available at NCBI Nucleotide (accession no. AJ938182.1).
  19. Reference Panel IV ED133. CM Guinane (2010). Publicly available at NCBI Nucleotide (accession no. CP001996.1).
  20. Reference Panel IV ED98. BV Lowder (2009). Publicly available at NCBI Nucleotide (accession no. CP001781.1).
  21. Reference Panel IV EMRSA15. Matthew Holden (2013). Publicly available at NCBI Nucleotide (accession no. HE681097.1).

Additional files

Supplementary file 1

List of all cultures included in the study, the site of infection (and any known source if bloodstream), number of isolates sequenced from each site, ST or CC by in silico MLST, number of variants found at each site and the mean pair-wise difference comparing isolates.

https://doi.org/10.7554/eLife.30637.014
Supplementary file 2

List of all variants found within patients with S. aureus infections, location on shared reference (MRSA252), or position and reference genome name and accession number if variant could not be localized on MRSA252.

Each variant is described by the alleles found, its location in gene, the predicted effect on gene product and the location of the variant on the phylogenetic tree.

https://doi.org/10.7554/eLife.30637.015
Supplementary file 3

Neutrality indices show signals of adaptation among the genes, gene ontologies and expression pathways most significantly enriched for protein-altering B-class variants.

Neutrality indices (NIs; McDonald and Kreitman, 1991; Rand and Kann, 1996) were calculated as the odds ratio of the number of protein-altering to synonymous variants among B-class versus C/D-class variants. These tests are less powerful than the Poisson regression likelihood ratio tests used to detect gene or gene set enrichment of protein-altering B-class variants (Table 3); we present them to demonstrate that the direction of enrichment was consistent with adaptation (NI > 1). To mitigate the reduced power, we calculated the expected numbers of protein-altering B-class variants from the numbers of protein-altering C/D-class variants, synonymous B-class variants and synonymous C/D-class variants by pooling them across all genes. This was justified by the absence of evidence for within-patient recombination and lack of enrichment signals among synonymous variants and C/D-class protein-altering variants. A one-tailed Poisson test in R (R Core Team, 2015) was used to test NI > 1 (significant NIs at p<0.05 in bold).

https://doi.org/10.7554/eLife.30637.016
Supplementary file 4

List of all variants found within long term asymptomatic carriers, location on shared reference (MRSA252), or position and reference genome name and accession number if variant was not localized on MRSA252.

Each variant is described by the alleles found, its location in gene and the predicted effect on gene product.

https://doi.org/10.7554/eLife.30637.017
Supplementary file 5

For all ontologies showing enrichment in within-patient BD-class variants, we identified the genes with variants contributing to the signal.

We counted the number of protein-altering variants in these genes within patients, and compared this to the number in long-term asymptomatic carriers. p-Values were calculated using Fisher’s exact test. *Variant totals are different for SAMMD pathways (rsp, agrA, sarA) and BioCyc ontologies (cell wall, cell adhesion, pathogenesis) because pathway information is available for a different number of loci in each database.

https://doi.org/10.7554/eLife.30637.018
Transparent reporting form
https://doi.org/10.7554/eLife.30637.019
