(A) Simulated phylogenetic tree illustrating within-host evolution of S. aureus colonisation and infection. This model assumes two genetic bottlenecks (dotted lines); upon transmission and upon …
The tree is annotated (starting from the inner circle) with the most prevalent sequence types (ST), presence/absence of the mecA gene, compartment of isolation (colonising or invasive), and year of …
The dashed line represents the mutation threshold used to remove genetically unrelated strains within the same episode.
The scatter plots display the linear relationship between sampling time after the internal reference and number of mutations. Only episodes with at least two strains collected at at least 1 day …
(A) Distribution of the nine major sequence types (ST) among 2590 strains. (B) Number of independent insertion sequence (IS) insertions by ST group and type of transposase.
(A) Significance of the enrichment for protein-altering mutations. The dashed line depicts the Bonferroni-corrected significance threshold, and red circles and blue circles represent genes with p …
The maximum-likelihood phylogenetic tree was inferred from the core genome alignment of 2590 isolates. The variants are annotated based on SnpEff (*: stop codon; fs: frameshift; ext*?: stop lost).
Only the 20 most significant genes with positive selection (dN/dS for missense mutations >1) are shown.
On the x-axis are shown proportions of predicted deleterious mutations (protein-truncating substitutions with PROVEAN score <–2.5, insertion sequences [IS] insertions), the y-axis shows …
Top 20 genes with the most significant mutation enrichment across the entire dataset. (A) Significance of the enrichment for protein-altering mutations. The dashed line depicts the …
Top 20 genes with the most significant mutation enrichment across the entire dataset. (A) Significance of the enrichment for protein-altering mutations. The dashed line depicts the …
(A) Significance of the enrichment for protein-altering mutations. The dashed line depicts the Bonferroni-corrected significance threshold, and red circles and blue circles represent operons with p …
The horizontal dashed line depicts the Bonferroni-corrected significance threshold and dotted line shows the suggestive significance threshold. Labels indicate genes with significance of enrichment …
The horizontal line depicts the Bonferroni-corrected significance threshold. Genes are coloured in red if the p value is below the Bonferroni-corrected threshold and in blue otherwise. Operons are …
(A) Gene ontologies (minimum set size 10 for a total of 110 categories) ordered by normalised enrichment score (NES). Ontologies with negative enrichment were excluded. Dark blue bars indicate a …
The width and colour of the edges represent the strength of the co-occurrence of mutated genes on the same strain (thin and blue, two independent co-occurrences; thick and orange, three independent …
Adaptation was inferred by computing the Jaccard index of shared mutated genes between independent episodes, followed by network analysis of infection episodes pairs. The node centrality measure was …
Nodes indicate independent episodes, coloured based on the clinical syndrome, edges show connections based on shared mutated genes (the width of the connection is proportional to the Jaccard index).
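The Jaccard index used for the episode network can be illustrated with a minimal sketch. The episode identifiers and gene sets below are hypothetical toy values, not data from the study, and the function names are illustrative only. The node centrality step is left out because the measure is not specified in this legend.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def episode_edges(mutated_genes):
    """Return {(episode1, episode2): jaccard} for episode pairs sharing at least one mutated gene."""
    edges = {}
    for e1, e2 in combinations(sorted(mutated_genes), 2):
        weight = jaccard(mutated_genes[e1], mutated_genes[e2])
        if weight > 0:
            edges[(e1, e2)] = weight
    return edges


# Hypothetical episodes and their sets of mutated genes (assumption, for illustration).
toy_episodes = {
    "ep1": {"agrA", "agrC"},
    "ep2": {"agrA", "mprF"},
    "ep3": {"rpoB"},
}

edges = episode_edges(toy_episodes)
for pair, weight in edges.items():
    print(pair, round(weight, 3))
```

In a drawing of such a network, the edge weight would map to the connection width described in the legend.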
Characteristic | Strains (n=2590) | Episodes (n=396) |
---|---|---|
Sequence type | ||
30 | 342 (13.2%) | 43 (10.9%) |
22 | 277 (10.7%) | 44 (11.1%) |
5 | 271 (10.5%) | 42 (10.6%) |
45 | 198 (7.6%) | 38 (9.6%) |
15 | 156 (6.0%) | 14 (3.5%) |
1 | 133 (5.1%) | 14 (3.5%) |
93 | 110 (4.2%) | 29 (7.3%) |
8 | 107 (4.1%) | 18 (4.5%) |
239 | 100 (3.9%) | 29 (7.3%) |
Other | 896 (34.6%) | 125 (31.6%) |
mecA positive | 1001 (38.6%) | 207 (52.3%) |
Infection syndrome | ||
Skin infection | 204 (7.9%) | 32 (8.1%) |
Osteoarticular infection | 77 (3.0%) | 17 (4.3%) |
Bacteraemia without focus | 588 (22.7%) | 152 (38.4%) |
Bacteraemia with focus | 331 (12.8%) | 85 (21.5%) |
Endocarditis | 197 (7.6%) | 44 (11.1%) |
No invasive strains | | 66 (16.7%) |
Colonisation syndrome | ||
Nasal carriage | 974 (37.6%) | 166 (42%) |
Cystic fibrosis | 57 (2.2%) | 9 (2%) |
Atopic dermatitis | 162 (6.3%) | 9 (2%) |
No colonising strains | | 212 (54%) |
List of within-host studies included in the analysis.
Classification of variant | Type C>C: number of variants | Type C>I: number of variants (neutrality index) | Type I>I: number of variants (neutrality index) |
---|---|---|---|
Synonymous | 381 | 130 | 155 |
Non-synonymous | 978 | 300 (0.9) | 503 (1.3)* |
Intergenic | 544 | 197 (1.1) | 549 (2.5)** |
Truncating | 197 | 58 (0.9) | 190 (2.4)** |
Insertion sequence insertion | 17 | 6 (1.0) | 137 (19.8)** |
Large deletion | 76 | 17 (0.6)* | 122 (3.9)** |
Values are counts of independent mutations. The neutrality index is shown in brackets in italic.
Significance testing with Fisher’s exact test: * p<0.05; ** p<0.005.
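The bracketed neutrality indices can be approximated with a short sketch. It assumes the index is the ratio of a variant class to synonymous variants within a comparison type, divided by the same ratio among type C>C variants. That formula is inferred from the counts in the table rather than stated here. It reproduces most of the bracketed values after rounding, so it should be read as an illustration and not as the authors' exact procedure.

```python
def neutrality_index(n_class, n_syn, n_class_cc, n_syn_cc):
    """(class / synonymous) in one comparison type, relative to the same ratio in type C>C.

    Assumed formula, inferred from the table counts.
    """
    return (n_class / n_syn) / (n_class_cc / n_syn_cc)


# Counts of independent mutations copied from the table above.
SYNONYMOUS = {"C>C": 381, "C>I": 130, "I>I": 155}
COUNTS = {
    "Non-synonymous": {"C>C": 978, "C>I": 300, "I>I": 503},
    "Intergenic": {"C>C": 544, "C>I": 197, "I>I": 549},
    "Truncating": {"C>C": 197, "C>I": 58, "I>I": 190},
    "Insertion sequence insertion": {"C>C": 17, "C>I": 6, "I>I": 137},
}

indices = {
    (variant_class, comparison): neutrality_index(
        by_type[comparison], SYNONYMOUS[comparison], by_type["C>C"], SYNONYMOUS["C>C"]
    )
    for variant_class, by_type in COUNTS.items()
    for comparison in ("C>I", "I>I")
}

for key, value in indices.items():
    print(key, round(value, 1))
```

Under this assumption, a value above 1 means the variant class is over-represented relative to synonymous variants when compared with type C>C variants.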
The genes shown reached genome-wide significance in the entire dataset or in either colonising-colonising (type C>C), colonising-invasive (type C>I), or invasive-invasive (type I>I) variants.
Gene | p value (whole dataset) | Description | N independent mutations (type C>C) | N independent mutations (type C>I) | N independent mutations (type I>I) | Significance |
---|---|---|---|---|---|---|
agrA* | 7.04 × 10⁻²⁸ | Accessory gene regulator protein A | 5** | 9** | 8** | Part of the agr quorum sensing system, which is the master regulator of virulence factor expression in S. aureus. Recurrent mutations associated with invasive disease. |
agrC** | 2.84 × 10⁻¹⁰ | Accessory gene regulator protein C | 4 | 2 | 6** | Histidine kinase, receptor for extracellular autoactivating peptide. Phosphorylates agrA. |
stp1** | 1.13 × 10⁻⁷ | Protein phosphatase 2 C domain-containing protein | 3 | 2 | 3 | Associated with vancomycin resistance. |
mprF** | 4.55 × 10⁻⁶ | Oxacillin resistance-related FmtC protein | 2 | 0 | 9** | Main determinant of daptomycin resistance. Association with persistence and immune evasion. |
rpoB | 7.24 × 10⁻³ | DNA-directed RNA polymerase subunit beta | 1 | 1 | 7** | Association with rifampicin resistance, but selection in the absence of rifampicin exposure can occur (R503H). Co-resistance to vancomycin, daptomycin, and oxacillin. Association with persistence. |
Significant enrichment (above the Bonferroni-corrected cut-off, see methods).
The genes shown reached the suggestive significance threshold in the entire dataset or in either type C>C, type C>I, or type I>I variants.
Gene | p value (whole dataset) | Description | N independent mutations (type C>C) | N independent mutations (type C>I) | N independent mutations (type I>I) | Significance |
---|---|---|---|---|---|---|
sucA* | 6.82 × 10⁻⁵ | 2-oxoglutarate dehydrogenase E1 component | 6 | 2 | 2 | Encodes a subunit of the α-ketoglutarate dehydrogenase of the tricarboxylic acid cycle. |
saeR* | 1.83 × 10⁻⁴ | DNA-binding response regulator SaeR | 2 | 1 | 2 | Regulator component of the saeRS two-component system. Virulence regulation. |
accB | 4.27 × 10⁻⁴ | Biotin carboxyl carrier protein of acetyl-CoA carboxylase | 3* | 1 | 0 | Part of the fatty acid synthesis pathway of S. aureus. |
SAUSA300_1856 | 6.41 × 10⁻⁴ | Hypothetical protein | 4* | 0 | 0 | Intracellular cysteine peptidase. Putative chaperone in S. aureus. |
xpaC | 1.38 × 10⁻³ | Hypothetical protein | 4* | 0 | 0 | Predicted 5-bromo-4-chloroindolyl phosphate hydrolysis protein, no data on S. aureus. |
rpsJ | 1.58 × 10⁻³ | 30S ribosomal protein S10 | 3* | 0 | 0 | Mutations at residues 53–60 are associated with tigecycline resistance, at no apparent fitness cost. |
SAUSA300_2399 | 1.68 × 10⁻³ | ABC transporter ATP-binding protein | 4* | 0 | 0 | Downregulated in the presence of fusidic acid. |
walR | 2.10 × 10⁻³ | DNA-binding response regulator | 1 | 0 | 3* | Part of the walKR two-component system. Associated with vancomycin resistance. |
yjbH | 3.55 × 10⁻³ | DsbA-family protein | 1 | 0 | 3* | Negative regulator of spx (directs its ClpXP-dependent degradation). Association with antibiotic resistance, virulence regulation, and oxidative stress resistance. |
purR | 3.86 × 10⁻³ | Pur operon repressor | 0 | 1 | 3* | purR mutants: increased biofilm formation and virulence in an animal model; higher capacity to invade epithelial cells. |
era | 5.34 × 10⁻³ | GTP-binding protein Era | 0 | 1 | 3* | Involved in ribosome assembly and stringent response. |
pbp2 | 7.75 × 10⁻³ | Penicillin-binding protein 2 | 6* | 0 | 0 | Role in methicillin resistance (PBP2a synergism). Increased expression after oxacillin exposure. |
fakA | 9.90 × 10⁻³ | Hypothetical protein | 5* | 0 | 0 | Fatty acid kinase. Deletion mutant displayed increased virulence in a murine model of skin infection. |
sgtB | 2.65 × 10⁻² | Glycosyltransferase | 0 | 0 | 3* | sgtB mutations arose in adaptive laboratory evolution experiments upon vancomycin exposure. |
Enrichment of suggestive significance (above the suggestive significance cut-off, adjusted for false discovery, see methods).
List of colonisation/infection episodes included with publication data (first author, year, PubMed ID), number of strains, sites of collection, clinical characteristics, and classification of colonisation and infection episodes.
List of strains included with site and date of collection, sequence type, presence of the mecA gene, information on whether the strain was designated as internal reference or baseline index strain, mash distance to the internal reference, number of variants called (as compared to the internal reference), and sequencing metrics.
List of variants identified annotated with gene, gene sequence, FPR3757 homologue, and FPR3757 operon.
Point mutations, insertion sequence insertions, large deletions, and copy number variants are presented separately.
Gene enrichment analysis for all mutated genes with an FPR3757 homologue with number of mutations, gene length, mutation enrichment, and p value based on a Poisson regression to model the number of variants per gene.
Results are presented separately for the complete dataset and for colonising-colonising (type C>C), colonising-invasive (type C>I), and invasive-invasive (type I>I) variants.
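The idea behind a length-aware enrichment p value can be sketched with a plain Poisson tail test. This is a simplified stand-in for the Poisson regression named above, not the regression itself. The expected count assumes mutations fall uniformly along the genome, and all numbers (gene length, totals, number of genes tested) are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing the upper tail directly (lam > 0)."""
    if k <= 0:
        return 1.0
    # Probability mass at k, computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        if term <= total * 1e-16:
            break
    return min(total, 1.0)


def gene_enrichment(n_mut, gene_len, total_mut, total_len):
    """Return (fold enrichment, p value) under a uniform per-base mutation rate (assumption)."""
    expected = total_mut * gene_len / total_len
    return n_mut / expected, poisson_upper_tail(n_mut, expected)


# Hypothetical inputs: 10 mutations in a 1000 bp gene, 2000 mutations over 2 Mb in total.
fold, p_value = gene_enrichment(n_mut=10, gene_len=1000, total_mut=2000, total_len=2_000_000)

# Bonferroni-corrected threshold for a hypothetical 2500 tested genes.
bonferroni = 0.05 / 2500
print(fold, p_value, p_value < bonferroni)
```

A regression would additionally allow covariates, which this tail test does not.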
Operon enrichment analysis for all FPR3757 operons (i.e. mutated genes that could be assigned to an FPR3757 operon) with number of mutations, operon length, mutation enrichment, and p value based on a Poisson regression to model the number of variants per operon.
Results are presented separately for the complete dataset and for colonising-colonising (type C>C), colonising-invasive (type C>I), and invasive-invasive (type I>I) variants.
Gene set enrichment analysis for mutations in genes aggregated in gene ontology (GO) categories with enrichment score, normalised enrichment score (NES), and unadjusted and false-discovery rate (FDR) adjusted p value.
Results are presented separately for the complete dataset and for colonising-colonising (type C>C), colonising-invasive (type C>I), and invasive-invasive (type I>I) variants.