Niche-specific genome degradation and convergent evolution shaping Staphylococcus aureus adaptation during severe infections

  1. Stefano G Giulieri
  2. Romain Guérillot
  3. Sebastian Duchene
  4. Abderrahman Hachani
  5. Diane Daniel
  6. Torsten Seemann
  7. Joshua S Davis
  8. Steven YC Tong
  9. Bernadette C Young
  10. Daniel J Wilson
  11. Timothy P Stinear  Is a corresponding author
  12. Benjamin P Howden  Is a corresponding author
  1. Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia
  2. Department of Infectious Diseases, Austin Health, Australia
  3. Victorian Infectious Diseases Service, Royal Melbourne Hospital, Australia
  4. Microbiological Diagnostic Unit Public Health Laboratory, The University of Melbourne at the Doherty Institute for Infection and Immunity, Australia
  5. Department of Infectious Diseases, John Hunter Hospital, Australia
  6. Menzies School of Health Research, Charles Darwin University, Australia
  7. Victorian Infectious Disease Service, Royal Melbourne Hospital, and University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Australia
  8. Nuffield Department of medicine, United Kingdom
  9. Big Data Institute, Nuffield Department of Population Health, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, University of Oxford, United Kingdom

Abstract

During severe infections, Staphylococcus aureus moves from its colonising sites to blood and tissues and is exposed to new selective pressures, thus, potentially driving adaptive evolution. Previous studies have shown the key role of the agr locus in S. aureus pathoadaptation; however, a more comprehensive characterisation of genetic signatures of bacterial adaptation may enable prediction of clinical outcomes and reveal new targets for treatment and prevention of these infections. Here, we measured adaptation using within-host evolution analysis of 2590 S. aureus genomes from 396 independent episodes of infection. By capturing a comprehensive repertoire of single nucleotide and structural genome variations, we found evidence of a distinctive evolutionary pattern within the infecting populations compared to colonising bacteria. These invasive strains had up to 20-fold enrichments for genome degradation signatures and displayed significantly convergent mutations in a distinctive set of genes, linked to antibiotic response and pathogenesis. In addition to agr-mediated adaptation, we identified non-canonical, genome-wide significant loci including sucA-sucB and stp1. The prevalence of adaptive changes increased with infection extent, emphasising the clinical significance of these signatures. These findings provide a high-resolution picture of the molecular changes when S. aureus transitions from colonisation to severe infection and may inform correlation of infection outcomes with adaptation signatures.

Editor's evaluation

This study offers a comprehensive examination of Staphylococcus aureus evolution during infection. It provides a useful analysis of select genetic signatures during bacterial adaptation. A combination of multiple layers of genome annotation and point mutation variant detection compellingly supports the correlation of infection outcomes with adaptation signatures in S. aureus.

https://doi.org/10.7554/eLife.77195.sa0

eLife digest

The bacterium Staphylococcus aureus lives harmlessly on our skin and noses. However, occasionally, it gets into our blood and internal organs, such as our bones and joints, where it causes severe, long-lasting infections that are difficult to treat.

Over time, S. aureus acquire characteristics that help them to adapt to different locations, such as transitioning from the nose to the blood, and avoid being killed by antibiotics. Previous studies have identified changes, or ‘mutations’, in genes that are likely to play an important role in this evolutionary process. One of these genes, called accessory gene regulator (or agr for short), has been shown to control the mechanisms S. aureus use to infect cells and disseminate in the body. However, it is unclear if there are changes in other genes that also help S. aureus adapt to life inside the human body.

To help resolve this mystery, Giulieri et al. collected 2,500 samples of S. aureus from almost 400 people. This included bacteria harmlessly living on the skin or in the nose, as well as strains that caused an infection. Gene sequencing revealed a small number of genes, referred to as ‘adaptive genes’, that often acquire mutations during infection. Of these, agr was the most commonly altered. However, mutations in less well-known genes were also identified: some of these genes are related to resistance to antibiotics, while others are involved in chemical processes that help the bacteria to process nutrients.

Most mutations were caused by random errors being introduced in to the bacteria’s genetic code which stopped genes from working. However, in some cases, genes were turned off by small fragments of DNA moving around and inserting themselves into different parts of the genome.

This study highlights a group of genes that help S. aureus to thrive inside the body and cause severe and prolonged infections. If these results can be confirmed, it may help to guide which antibiotics are used to treat different infections. Furthermore, understanding which genes are important for infection could lead to new strategies for eliminating this dangerous bacterium.

Introduction

While Staphylococcus aureus is one of the most important human pathogens (Tong et al., 2015), its common interaction with the human host is colonisation, usually of the anterior nares (Wertheim et al., 2005). Comparatively, severe, life-threatening infections such as bacteraemia or osteomyelitis occur very rarely. This suggests that at the macro-evolutionary level, S. aureus is primarily adapted to its natural ecological niche (the nasal cavity) and to specific selective pressures arising in this environment, such as competition with the resident microbiota (Krismer et al., 2017). By contrast, during invasive infection, a new fitness trade-off needs to be achieved to adjust to environmental challenges that include innate and acquired immune responses (Proctor, 2019), high-dose antibiotics (Kuehl et al., 2020), and nutrient starvation (Hood and Skaar, 2012). These trade-offs could occur across three potentially distinctive dynamics of micro-evolution during colonisation and infection (within the colonising population, from colonising to invasive, and within the invasive population), leading to nose-adapted, early infection-adapted, and late infection-adapted strains. Identifying infection-adapted strains might assist precision medicine strategies for infection prevention and management and refine the understanding of S. aureus pathogenesis versatility, as mutational footprints of selection mirror functions that are critically important for bacterial survival during invasion.

Emerging genomic approaches for analysis of within-host evolution are among the most powerful means to study bacterial host adaptation (Marvig et al., 2015; Kennemann et al., 2011; Lieberman et al., 2016). Studies have shown the remarkable diversity and evolution of colonising populations of Streptococcus pneumoniae (Chaguza et al., 2020) and S. aureus (Paterson et al., 2015). In S. aureus, Enterococcus faecalis and Enterococcus faecium it has been shown that transition from colonisation to invasion favours strains with specific adaptive signatures (Young et al., 2017; Van Tyne et al., 2019; Chilambi et al., 2020), while evidence of niche adaptation was noted in a within-host study of bacterial meningitis due to S. pneumoniae (Lees et al., 2017b). Furthermore, phenotypic and genomic adaptation (often in response to antibiotic pressure) has been investigated during selected episodes of persistent invasive infections due to S. aureus (Giulieri et al., 2020; Howden et al., 2006; Giulieri et al., 2018), Pseudomonas aeruginosa (Wheatley et al., 2021), Salmonella enterica (Klemm et al., 2016), and Mycobacterium tuberculosis (Vargas et al., 2021). To increase power, bacterial within-host evolution studies have leveraged on large collections of paired samples coupled with statistical models of genome-wide mutation rates (Marvig et al., 2015; Lees et al., 2017b; Gatt and Margalit, 2021) and extended the analysis to include chromosomal structural variation (Giulieri et al., 2018; Lees et al., 2017a; Gao et al., 2015) as well as intergenic mutations (Lees et al., 2017a; Khademi et al., 2019).

Convergent evolution among separated (independent) episode of colonisation or infection is a key indication of adaptation in evolution analyses. However, with the notable exception of one study of cystic fibrosis (Long et al., 2020), the convergence has generally been weak in within-host studies of S. aureus infections, with no convergence at all (Giulieri et al., 2018) or significant enrichment limited to the S. aureus master regulator agr (Young et al., 2017). We hypothesised that in addition to the small sample size, the extended range of bacterial functions potentially under selective pressure (each function being potentially targeted by diverse pathoadaptive mutations) has hampered the identification of important adaptation mechanisms. To overcome the limitations of studies to date, we have pooled all publicly available S. aureus within-host evolution studies, and complemented this with a new dataset from a recent S. aureus clinical trial (Tong et al., 2020), in a single large-scale analysis. Rather than focussing on point mutations and small insertions/deletions alone, we leveraged multiple layers of genome annotation (encompassing coding regions, operons, intergenic regions, and functional categories) and included chromosome structural variants to compile a comprehensive catalogue of bacterial genetic variation arising during host infection. This strategy enabled the detection of convergent adaptation patterns at an unprecedented resolution. We also outline distinctive signatures of adaptation during colonisation, upon transition from colonisation to infection and during invasive infection.

Results

The S. aureus within-host evolution analysis framework

We compiled a collection of 2251 S. aureus genomes from 267 independent episodes of colonisation and/or infection, reported in 24 genomic studies (Young et al., 2017; Giulieri et al., 2018; Gao et al., 2015; Young et al., 2012; Wuthrich et al., 2019; Trouillet-Assant et al., 2016; Tan et al., 2019; Suligoy et al., 2018; Rouard et al., 2018; Rishishwar et al., 2016; Petrovic Fabijan et al., 2020; Miller et al., 2020; Loss et al., 2019; Liu et al., 2020; Langhanki et al., 2018; Kuroda et al., 2019; Ji et al., 2020; Howden et al., 2011; Harkins et al., 2018; Golubchik et al., 2013; Burd et al., 2014; Benoit et al., 2018; Azarian et al., 2019; Altman et al., 2018; Table 1; Table 1—source data 1). We supplemented this dataset of publicly available sequences with unpublished sequences from 603 serial invasive isolates collected within the CAMERA-2 trial (Tong et al., 2020).

Table 1
Microbiological and clinical characteristics of the colonisation and infection episodes included in the within-host evolution analysis.
Strains(n=2590)Episodes(n=396)
Sequence type
30342 (13.2%)43 (10.9%)
22277 (10.7%)44 (11.1%)
5271 (10.5%)42 (10.6%)
45198 (7.6%)38 (9.6%)
15156 (6.0%)4 (3.5%)
1133 (5.1%)14 (3.5%)
93110 (4.2%)29 (7.3%)
8107 (4.1%)18 (4.5%)
239100 (3.9%)29 (7.3%)
Other896 (34.6%)125 (31.6%)
mecA positive1001 (38.6%)207 (52.3%)
Infection syndrome
Skin infection204 (7.9%)32 (8.1%)
Osteoarticular infection77 (3.0%)17 (4.3%)
Bacteraemia without focus588 (22.7%)152 (38.4%)
Bacteraemia with focus331 (12.8%)85 (21.5%)
Endocarditis197 (7.6%)44 (11.1%)
No invasive strains66 (16.7%)
Colonisation syndrome
Nasal carriage974 (37.6%)166 (42%)
Cystic fibrosis57 (2.2%)9 (2%)
Atopic dermatitis162 (6.3%)9 (2%)
No colonising strains212 (54%)
Table 1—source data 1

List of within-host studies included in the analysis.

https://cdn.elifesciences.org/articles/77195/elife-77195-table1-data1-v2.zip

Using genetic distance and sequence type (ST) to define within-host lineages, we estimated that coinfection was present in 4/336 (1%) of invasive episodes and co-colonisation 11/167 (7%). We removed genetically unrelated strains within the same episode and included 2590 genomes (1397 invasive and 1193 colonising) from 396 episodes in our within-host evolution analysis (Figure 1, Table 1, Supplementary files 1 and 2). The most prevalent lineages in the collection were ST 30 (342 strains, 13%), ST 22 (277 strains, 11%), and ST 5 (271 strains, 11%); 1001 strains (39%) were mecA positive. The collection was representative of the global S. aureus diversity, with an even distribution of colonising and invasive strains across the major clades (Figure 2A). The most frequent infection syndrome was bacteraemia without focus (152 episodes, 38.4%), while nasal carriage (166 episodes, 42%) was the most prevalent colonisation condition (Table 1).

Overview of the S. aureus within-host evolution analysis framework.

(A) Simulated phylogenetic tree illustrating within-host evolution of S. aureus colonisation and infection. This model assumes two genetic bottlenecks (dotted lines); upon transmission and upon transition from colonisation to invasive infection. (B) Sites and timing of within-host samples and number of genomes per sample define five prototypes of within-host evolution studies, each with colonising-colonising (C>C), colonising-invasive (C>I), or invasive-invasive (I>I) comparisons in different combinations: from top to the bottom: multiple colonising samples and one invasive samples; one colonising and one invasive sample; multiple colonising samples; multiple invasive samples; multiple colonising and invasive samples. (C) Approach to capture signals of adaptation across multiple independent episodes of colonisation/infection through detection of multiple genetic mechanisms of adaptation from short reads data and multi-layered functional annotation of the genetic variants using multiple databases including characterisation of intergenic regions (promoters), operon prediction, and gene ontology (GO). Statistical framework for the gene, operon, and gene set enrichment anlaysis (GSEA). Counts of independent mutations with likely impact on the protein sequence (non-synonymous substitutions, frameshifts, stop codon mutations, and insertion sequences [IS] insertions) were computed for each genes with a FPR3757 homologue. Gene counts (with the addition of intergenic mutations in promoter regions) were aggregated in operons and GOs. Gene and operon counts were used to fit Poisson regression models to infer mutation enrichment and significance of the enrichment. GOs counts and gene enrichment significance were used to run a gene-set-enrichment analysis. To illustrate the approach, the example of the gene walR is provided in italic.

Figure 2 with 5 supplements see all
(A) Maximum-likelihood phylogenetic tree of 2590 S. aureus sequences included in the study.

The tree is annotated (starting from the inner circle) with the most prevalent sequence types (ST), presence/absence of the mecA gene, compartment of isolation (colonising or invasive), and year of publication. (B) Summary of 396 independent episodes of S. aureus colonisation or infection categorised according to whether they allowed comparing colonising-colonising (C>C), colonising-invasive (C>I), or invasive-invasive (I>I) strains, or a combination of them. (C) Evidence of a distinctive pattern of adaptation in late infection-adapted strains (type I>I variants). For each type of comparison (type C>C, colonising-colonising; type C>I, colonising-invasive; type I>I, invasive-invasive), the cumulative curves display the accrued number of intergenic mutations, truncating mutations, insertion sequences (IS) insertions, and large deletions as a function of the total number of mutations. Genetic events were counted once per episode, regardless of the number of strains with the mutation. The sequence of mutations events in the cumulative curves is random.

Our within-host evolution analysis strategy identified 4556 genetic variants (median 3 per episode, range 0–237) (Supplementary file 3). Importantly, by investigating both point mutations and structural variation, we were able to uncover 214 large deletions (>500 bp), 160 new insertion sequences (IS) insertions, and 609 copy number variants, underscoring the role of large chromosome structural variation in within-host evolution. To increase the evolutionary convergence signal by aggregating mutations in functionally consistent categories, we annotated all genetic variants using multiple datasets, including coding sequences, regulatory intergenic regions, operons and gene ontologies (Figure 1C).

Distinctive evolutionary patterns define nose-adapted, early infection-adapted, and late infection-adapted strains

Based on the working hypothesis that S. aureus host adaptation patterns differ according to whether the strains are nose-colonising, collected at an early stage of infection (i.e. within the first 3 days) or at a late infection stage (i.e. associated with persistence beyond the first 3 days or recurrence), we assessed whether it was possible to define (i) general paradigms of genetic variation and (ii) specific convergence signatures. Thus, we classified within-host acquired variants into three groups according to their most likely location in the within-host phylogeny: (i) between colonising strains (colonising-colonising [type C>C]); (ii) between colonising and early infection adapted strains (colonising-invasive [type C>I]); and (iii) between invasive strains (invasive-invasive [type I>I]). Overall, the 396 infection episodes included in the analysis allowed us to independently assess 166 type C>C, 118 type C>I, and 312 type I>I within-host variants. In 95 cases, there were sufficient samples to assess all three types within the same episode (Figure 2B). Across colonisation/infection stages, sampling frequency did not seem to affect the number of variants identified with the exception of early adapted invasive strains (Figure 2—figure supplement 2).

We first sought to explore whether the rate of mutations over time differs between colonising and invasive populations. To estimate within-host mutation rates, we fitted linear regressions using data from a subset of 109 episodes that had at least two colonising or invasive isolates collected at two different timepoints (total 701 strains). The regressions suggested that mutation rates were higher in the invasive population; however, this analysis was limited by the heterogeneity of sampling strategies and by evidence of heteroskedasticity and non-linear distribution (Figure 2—figure supplements 3 and 4).

We have previously shown that invasive strains from persistent or relapsing infections exhibit a high proportion of protein-truncating mutations (Giulieri et al., 2018). A similar enrichment of protein-truncating variants was identified within invasive strains as compared to strains from asymptomatically colonised individuals (Young et al., 2017; Young et al., 2012). We reasoned that if this indicates genome degradation during infection, infecting strains might also be enriched for other loss of function (LOF) mutations caused by structural variants, such as movement of IS (Hawkey et al., 2020) and large deletions, leading to complete or partial gene loss (Toft and Andersson, 2010). In addition, we hypothesised that mutations and IS insertions in intergenic regions might contribute to altering gene expression or activity by interfering with the expression of key genes or operons (McEvoy et al., 2013).

Therefore, we calculated the prevalence of intergenic mutations, protein-truncating mutations, IS insertions, and large deletions among all variants and compared it between type C>C, type C>I, and type I>I variants. Strikingly, the distribution of mutations according to the predicted effect differed substantially in I>I pairs when compared to mutations identified between nose-colonising and invasive strains and within colonising strains (Figure 2). This can be expressed using the neutrality index (NI), which tests deviation from neutral evolution and is comparable to an odds ratio (Stoletzki and Eyre-Walker, 2011). Relative to type C>C variants, variants emerging within the infecting strains were enriched for intergenic mutations (NI 2.5; p=1.8 × 10–16) and protein-truncating mutations (NI 2.4; p=4.8 × 10–10) (Table 2). In contrast, no significant enrichment was observed among type C>I variants.

Table 2
Modified McDonald-Kreitman table displaying counts of variants (point mutations and structural variants) and the neutrality index for colonising-invasive (type C>I) and invasive-invasive (type I>I) variants (both compared to colonising-colonising [type C>C] variants).
Classification of variantNumber of variants (Neutrality index)
Type C>CType C>IType I>I
Synonymous381130155
Non-synonymous978300 (0.9)503 (1.3)*
Intergenic544197 (1.1)549 (2.5)**
Truncating19758 (0.9)190 (2.4)**
Insertion sequences insertion176 (1.0)137 (19.8)**
Large deletion7617 (0.6)*122 (3.9)**
  1. Values are counts of independent mutations. The neutrality index is shown in brackets in italic.

  2. Significance testing Fisher’s Exact Test: p<0.05; ** p<0.005.

While large deletions were significantly more enriched in type I>I variants (NI 4.0, p=1.1 × 10–15), the strongest evidence for enrichment (NI 19.9, p=1.6 × 10–42) was found for IS insertions. We and others have previously shown that new insertions of IS256 may provide an efficient mechanism of genomic plasticity in invasive S. aureus strains (Giulieri et al., 2018; Kuroda et al., 2019; McEvoy et al., 2013). Here, we expand this observation in a larger dataset and show that this mechanism is not limited to IS256 (Figure 2—figure supplement 5). As shown in Figure 2C, two invasive strains exhibited a burst of >10 new IS insertions (IS3 and IS256, respectively). It has been shown that IS activation occurs under stress conditions, such as antibiotic exposure and oxidative stress (Schreiber et al., 2013), which is consistent with the selection environment encountered by invasive strains. However, these bursts occurred only in 2/1068 adapted invasive strains.

Overall, these data support a model, where late infection-adapted strains show an enrichment for variants that are predicted to exert a stronger functional impact, either by producing a truncated protein or by potentially interfering with intergenic regulatory regions, through point mutations or IS insertions. This strong genome degradation signature appears to be specific to type I>I variants and was absent in type C>I variants, suggesting that the bottleneck effect upon blood or tissue invasion does not explain it. To assess whether this general enrichment of non-silent evolution represented a signature of positive selection or derived from within-host gene obsolescence occurring during invasive infection, we further investigated signals of gene, operon, and pathway specific enrichment across independent episodes of infection.

Gene enrichment analysis identifies significant hotspots of adaptation

To identify signatures of adaptation, we first counted how many times each coding sequence was mutated independently across distinct colonisation/infection episodes (Figure 3, Table 3, Table 4, Supplementary file 3). We considered all protein-modifying mutations either predicted to cause a gain or LOF to the locus: non-synonymous substitutions, truncations, IS insertions, or deletions. To ensure consistency across the dataset, we restricted our analysis to 1736 (74%) genes with a homologue in reference strain FPR3757 (excluding plasmid genes and phage genes). Mutations were considered independent if they arose in distinct colonisation/infection episodes. To assess whether the convergent signals were a reliable indication of adaptation, we applied a gene enrichment analysis for protein-altering mutations which computes a length-corrected gene-level enrichment of protein-modifying mutations. The significance of the enrichment for each gene was estimated by comparing gene-specific Poisson models of mutation counts with the null hypothesis, which indicates neutrality and assumes a constant mutation rate across all genes (Young et al., 2017).

Figure 3 with 7 supplements see all
Top 20 genes with the most significant mutation enrichment across the entire dataset.

(A) Significance of the enrichment for protein-altering mutations. The dashed line depicts the Bonferroni-corrected significance threshold, and red circles and blue circles represent genes with p values below and above the Bonferroni threshold, respectively. (B) Bar plots of independent mutations separated in three panels according to the type of variant (type C>C: colonising-colonising; type C>I: colonising-invasive; type I>I: invasive-invasive) and coloured according to the class of mutation. (C) Gene maps with type and positions of mutations.

Table 3
Genome-wide significant gene signatures of within-host evolution.

The genes shown reached genome-wide significance in the entire dataset or in either colonising-colonising (type C>C), colonising-invasive (type C>I), or invasive-invasive (type I>I) variants.

Genep value(whole dataset)DescriptionN independent mutationsSignificance
Type C>CType C>IType I>I
agrA*7.04 × 10–28Accessory gene regulator protein A5**9**8**Part of the agr quorum sensing system, which is the master regulator of virulence factors expression in S. aureus. Recurrent mutations associated with invasive disease.
agrC**2.84 × 10–10Accessory gene regulator protein C426**Histidine kinase, receptor for extracellular autoactivating peptide. Phosphorylates agrA.
stp1**1.13 × 10–7Protein phosphatase 2 C domain-containing protein323Associated with vancomycin resistance.
mprF**4.55 × 10–6Oxacillin resistance-related FmtC protein209**Main determinant of daptomycin resistance. Association with persistence and immune evasion.
rpoB7.24 × 10–3DNA-directed RNA polymerase subunit beta117**Association with rifampicin resistance, but selection in the absence of rifampicin exposure can happen (R503H). Co-resistance to vancomycin, daptomycin, and oxacillin. Association with persistence.
  1. *

    Significant enrichment (above the Bonferroni-corrected cut-off, see methods).

Table 4
Gene signatures of within-host evolution with suggestive significant enrichment.

The genes shown reached the suggestive significance threshold in the entire dataset or in either type C>C, type C>I, or type I>I variants.

Genep value(whole dataset)DescriptionN independent mutationsSignificance
Type C>CType C>IType I>I
sucA*6.82 × 10–52-oxoglutarate dehydrogenase E1 component622Encodes a subunit of the α-ketoglutarate dehydrogenase of the tricarboxylic acid cycle.
saeR*1.83 × 10–4DNA-binding response regulator SaeR212Regulator component of the saeRS two-component system. Virulence regulation.
accB4.27 × 10–4Biotin carboxyl carrier protein of acetyl-CoA carboxylase3*10Part of the fatty acid synthesis pathway of S. aureus.
SAUSA300_18566.41 × 10–4Hypothetical protein4*00Intracellular cysteine peptidase. Putative chaperone in S. aureus.
xpaC1.38 × 10–3Hypothetical protein4*00Predicted 5-bromo-4-chloroindolyl phosphate hydrolysis protein, no data on S. aureus.
rpsJ1.58 × 10–330S ribosomal protein S103*00Mutations at residues 53–60 are associated with tigecycline resistance, at no apparent fitness cost.
SAUSA300_23991.68 × 10–3ABC transporter ATP-binding protein4*00Downregulated in the presence of fusidic acid
walR2.10 × 10–3DNA-binding response regulator103*Part of walKR two-component response regulator. Associated with vancomycin resistance.
yjbH3.55 × 10–3Dsba-family protein103*Negative regulator of spx (directs its ClpXP-dependent degradation). Association with antibiotic resistance, virulence regulation, and oxidative stress resistance.
purR3.86 × 10–3Pur operon repressor013*purR mutants: increased biofilm formation and virulence in animal model; higher capacity to invave epithelial cells.
era5.34 × 10–3GTP-binding protein Era013*Involved in ribosome assembly and stringent response.
pbp27.75 × 10–3Penicillin-binding protein 26*00Role in methicillin resistance (PBP2a synergism). Increased expression after oxacillin exposure.
fakA9.90 × 10–3Hypothetical protein5*00Fatty acid kinase. Deletion mutant displayed increased virulence in a murine model of skin infection.
sgtB2.65 × 10–2Glycosyltransferase003*sgtB mutations in adaptive laboratory evolution experiments upon vancomycin exposure.
  1. *

    suggestive significant enrichment (above the suggestive significance cut-off, adjusted for false-discovery, see methods).

When applying a Bonferroni-corrected significance threshold (4.6 × 10–5), mutations in agrA were highly significantly enriched across the entire dataset (45-fold enrichment, p=7.0 × 10–28). Other significantly enriched genes were agrC (13-fold enrichment, p=2.8 × 10–10), stp1 (14-fold enrichment, p=1.1 × 10–7), and mprF (sixfold enrichment, p=4.6 × 10–6). The gene sucA reached near-significance (fivefold enrichment, p=6.8 × 10–5). Mutations in genes most significantly targeted by convergent evolution were evenly distributed across the S. aureus phylogeny, indicating that these adaptative mechanisms were not specific to selected lineages (Figure 3—figure supplement 1). Using dN/dS analysis, we confirmed signatures of positive selection in the most significantly enriched genes, although only agrA reached statistical significance (Figure 3—figure supplement 2 and Supplementary file 4).

We found that several genes with the most significant enrichment (agrA, agrC, stp1, and sucA) were recurrently mutated across all three within-host evolutionary scenarios, implying a global role in S. aureus adaptation during colonisation and invasion. This suggests partial adaptation of S. aureus strains upon invasion. It has been previously shown that adaptive mutations, particularly within the quorum sensing accessory gene regulator (agr), are enriched in invasive S. aureus strains, suggesting that adapted strains are more prone to be involved in invasive disease (Young et al., 2017; Young et al., 2012; Altman et al., 2018; Smyth et al., 2012). While the key role of agr was consistent with previous evidence from clinical and experimental studies, the high number of recurrent sucA mutations was surprising. This metabolic gene encodes for the α-ketoglutarate dehydrogenase of the tricarboxylic acid cycle, and recent work has revealed the functional basis of its potential role in adaptation. Its inactivation was found to lead to a persister phenotypye (Wang et al., 2018), and sucA was a hotspot of metabolic adaptation to antibiotics in a recent in vitro evolution study (Lopatkin et al., 2021).

Adaptive mutations can cause both loss or gain of function of the gene affected; however, LOF is thought to be most frequent consequence of adaptation (Behe, 2010). We inferred LOF when variants lead to protein truncation (including intragenic IS insertions) or non-synonymous substitutions were expected to be deleterious based on the degree of divergence in conserved sites of the protein Choi et al., 2012. The majority (63%) of aggregated variants in adaptive loci were predicted to lead to LOF (Figure 3—figure supplement 3), either because of accumulation of truncating mutations (agrA and agrC) or exclusively because of deleterious substitutions (walR and vraG). Because only LOF is inferred from the sequence, gain of function variants might be miss-classified as neutral (or even deleterious). We considered this hypothesis for genes with low frequency of truncations and expected adaptive advantage of gain of function mutations based on the literature. For example, mprF mutations associated with vancomycin or daptomycin resistance have been shown to be associated with increased enzymatic activity of the protein leading to decreased negative charge of the membrane (Ernst and Peschel, 2019). Among within-host acquired mprF mutations in our dataset, we found that four of them were previously described and associated with antibiotic resistance through a gain of function mechanism (Ernst and Peschel, 2019). A distinctive sign of these variants was convergence at position or mutation level, which has been described as a potential hallmark of gain of function mutations (Gerasimavicius et al., 2021). Further supporting the evidence for gain of function, mprF was duplicated in one of the episodes, as reported previously (Gao et al., 2015), while no other convergent gene was affected by within-host copy number variants. We also assessed accB because no variant in this genes was classified as deleterious; however, accB mutations have been previously described as LOF in strains that are auxotroph for fatty acids (Morvan et al., 2016). Thus, based on in silico prediction and previous data, we hypothesise that variants in convergent loci were mostly expected to be LOF with the exception of mprF.

To confirm that our gene enrichment analysis (focused on point mutations and IS insertions and limited to genes with FPR3757 homologues) captured the large part of adaptation, we analysed variation due to large deletions and copy number variation, which were not included in the gene enrichment analysis. We observed multiple independent deletions and amplifications mainly in phage genes (Figure 3—figure supplement 4 and Figure 3—figure supplement 5). We also repeated the gene enrichment analysis with all mutated genes (with and without FPR3757 homologues) and found very similar results, with only two hypothetical proteins with no FPR3757 homologue among the genes with most significant enrichment (Figure 3—figure supplement 6).

Combining multiple mechanisms of adaptation and multi-layered annotation increases the signal of convergent evolution

To increase our ability to capture signatures of adaptation from convergent evolution, we extended our analysis beyond coding sequences, to integrate the genetic variation signals issued from intergenic mutations and IS insertions in intergenic regions. This multi-layered annotation of mutated regions was shown to increase the amount of information gained from in vitro adaptive evolution experiments (Phaneuf et al., 2020). Such methodology allows for an advanced classification of intergenic mutations based on regulatory sequences including promoters and transcription units based on data acquired from RNAseq experiments (Mäder et al., 2016; Prados et al., 2016).

Using this approach, we were able to assign 150/1237 (11%) of all intergenic mutations and IS insertions to a predicted regulatory region. We found that the agr, sucAB, and walKR operons had the strongest convergent evolution signal with 28, 13, and 13 independent mutations (Supplementary file 5). Mutations within these loci were significantly (agr, 12-fold enrichment, p=1.1e-21; sucAB, fourfold enrichment, p=4.5e-5) or near-significantly enriched (walKR, threefold enrichment, p=1.7e-4) (Figure 4). Interestingly, promoter mutations represented 2/13 (15%) of the walKR operon mutations, indicating that potentially impactful intergenic variants may be missed when considering only coding regions. Furthermore, new IS insertions were found within type I>I variants: three insertions into agrC (predicted to inactivate the gene, as shown previously in staphylococci [Both et al., 2021; Suligoy et al., 2020]) and an insertion 159 bp upstream of walR, in a region encompassing its cognate promoter. Together with the strong enrichment for IS insertions within type I>I variants, the location of these insertions in recurrently mutated operons suggests that IS insertions contribute to the adaptive evolution of S. aureus during invasive infection.

Top 20 operons with the most significant mutation enrichment across all dataset.

(A) Significance of the enrichment for protein-altering mutations. The dashed line depicts the Bonferroni-corrected significance threshold, and red circles and blue circles represent operons with p values below and above the Bonferroni threshold, respectively. (B) Bar plots of independent mutations separated in three panels according to the type of variant (type C>C: colonising-colonising; type C>I: colonising-invasive; type I>I: invasive-invasive) and coloured according to the class of mutation. Mutations were considered independent if they occurred in separate episodes of either colonisation or invasive infection. (C) Operon maps with positions of the mutations (relative to the start of the first gene of the operon). Operons are labelled with the names of the genes included, and longer labels were shorted for clarity (see Supplementary file 5 for details).

Adaptation within the invasive population is distinctive and strongly driven by antibiotics

The excess of non-silent evolution (and potentially function-altering) within invasive strains suggested that strong, specific selection pressure occurs within the invasive populations (type I>I variants). We therefore assessed genes that appeared to be specifically mutated or inactivated during infection. We performed our gene- and operon-enrichment analysis for each type of within-host variants separately (i.e. within the colonising population, between colonising and invasive strains, and within the invasive population) (Figure 5). We found that agrA mutations were highly enriched in any group of variants, and particularly prevalent between colonising and invasive strains (type C>I variants), consistent with a previous study that is included in this analysis (Young et al., 2017). Among type I>I variants (between invasive strains), a significant enrichment was observed in mprF (18-fold enrichment, p=2.8 × 10–9), agrC (24-fold enrichment, p=2.1 × 10–7), and rpoB (10-fold enrichment, p=8.8 × 10–6). Other genes that were strongly enriched in type I>I variants (below the Bonferroni-corrected threshold, but above the suggestive significance threshold, Figure 5) included walR (22-fold enrichment, p=3.5 × 10–4), stp1 (20-fold enrichment, p=4.2 × 10–4), yjbH (19-fold enrichment, p=5.4 × 10–4), sgtB (19-fold enrichment, p=5.5 × 10–4), and purR (18-fold enrichment, p=5.8 × 10–4).

Figure 5 with 2 supplements see all
Modified volcano plot displaying enrichment (x-axis) and significance of enrichment (y-axis) within colonising-colonising (type C>C), colonising-invasive (type C>I), and invasive-invasive (type I>I) variants.

The horizontal dashed line depicts the Bonferroni-corrected significance threshold and dotted line shows the suggestive significance threshold. Labels indicate genes with significance of enrichment below the suggestive threshold. Genes are coloured in red if the p value is below the Bonferroni-corrected threshold and in blue otherwise.

The enrichment for mutations in mprF, rpoB, stp1, sgtB, and in the walKR/yycH operon (11-fold enrichment, p=9 × 10–9, see Figure 5—figure supplement 1 for the operon enrichment analysis) highlights the role of antibiotic pressure in shaping adaptation within the invasive population, since these loci are hotspots of adaptation to key anti-staphylococcal antibiotics that are often used in invasive infections (rifampicin, daptomycin, and vancomycin). For example, the essential two-component regulator walKR/yycFG (and its associated genes walH/yycH) have been shown to have a key role in vancomycin resistance in one of the within-host evolution studies included in this analysis (Howden et al., 2011), while mutations in both stp1 and sgtB have been observed in vancomycin-adapted strains (Machado et al., 2021).

Notably, the most significant gene signatures in invasive strains might have been selected in response to other selective pressures, including the host immune response during infection. For example, rpoB mutations have been associated with pleiotropic effects, including co-resistance to vancomycin, daptomycin, and oxacillin and immune evasion, suggesting a potential role in adaptation beyond the response to the selective pressure from rifampicin (Guérillot et al., 2018). This hypothesis is supported by the presence of mutations (such as the rpoB R503H substitution and N405 inframe deletion) outside the rifampicin-resistance determining region.

Pleiotropic phenotypes are also likely to underlie the enrichment of yjbH with invasive strains, which was mutated four times (of which three were truncations), yet only one mutation was found in colonising strains or early infection-adapted strains. This gene has a cysteine-rich domain that is homologous to dsbA in Escherichia coli. One of its roles in S. aureus is to facilitate the ClpXP-dependent degradation of the transcriptional regulator Spx (Austin et al., 2019). Inactivation of yjbH has been associated with oxacillin (Göhring et al., 2011) and vancomyin (Renzoni et al., 2011) resistance, impaired growth (Engman et al., 2012), and reduced virulence in animal models (Paudel et al., 2021), indicating that yjbH mutations may influence both host-pathogen interaction and antibiotic resistance. Finally, purR, a purine biosynthesis repressor, has been recently characterised beyond its metabolic function: interestingly, it was shown to be a virulence regulator (Sause et al., 2019), where purR mutants displayed higher bacterial counts following mice infection, increased biofilm formation (Goncheva et al., 2019), and higher capacity to invade epithelial cells (Goncheva et al., 2020).

We performed a gene set enrichment analysis (GSEA), using gene ontology and antibiotic resistance gene annotations (Feldgarden et al., 2019). The GSEA, stratified by variant type, showed significant enrichment only in type I>I variants, further underscoring the higher level of adaptation in this group (Figure 5—figure supplement 2 and Supplementary file 6) and confirmed the broad functional implications of the most enriched genes and operons with the invasive populations, since among the ontologies that were significantly enriched within the invasive population, we found the categories ‘DNA binding’ (normalised enrichment score [NES]=1.6, false discovery rate [FDR]-adjusted p=9 × 10–4), ‘pathogenesis’ (NES = 1.7, adjusted p=4 × 10–3), and ‘antibiotic response’ (NES = 1.8, adjusted p=7 × 10–3).

Taken together, these findings point to six key genetic loci that appear to have an important role in S. aureus adaptation during invasive infections. These loci are associated with either antibiotic resistance (mprF, rpoB, stp1, sgtB, and walKR), pathogenesis (agrAC and purR), or both (yjbH).

A mutation’s co-occurrence network defines loci under within-host co-evolutionary pressure

Epistasis, defined as the interaction of multiple mutations on a given phenotype (Levin-Reisman et al., 2019), plays a role in adaptive evolution in bacteria, particularly in antibiotic resistance (Skwark et al., 2017; Wadsworth et al., 2018; Yokoyama et al., 2018). Whether epistatic interactions could promote S. aureus adaptation during infection remains unknown. Identifying these interactions would enable identification of combinations of mutations underlying bacterial adaption during infection and refine the prediction of infection outcomes. Here, we assessed co-occurrence of mutations and mutated genes across independent episodes of colonisation/infection. While co-occurrence may simply result from co-selection (e.g. simultaneous exposure to two different antibiotics), it may also indicate putative epistatic interactions that could be explored in terms of potential impact on adaptive phenotypes (Phillips, 2008).

First, we explored co-occurrence of mutations and found only one case where the same mutations co-occurred in more than one independent episode. The two mutations were an inframe deletion within hypothetical protein SAUSA300_2068 and a A60D substitution of the gene ywlC. These genes are closely located in FPR3757. While the co-occurrence could be explained by recombination, recombination is expected to be rare amongst within-host S. aureus populations in general (Golubchik et al., 2013) and even rarer within invasive strains. YwlC is a threonylcarbamoyl-AMP synthase in E. coli, thus it is possible that SAUSA300_2068 is also a ribosomal protein. Ribosomal proteins can display regulatory activity (Aseev and Boni, 2011) and could plausibly be targets of adaptation to both antibiotics and the host/intracellular survival. This specific case of convergent co-occurrence of mutations was detected within type I>I variants.

When assessing interactions at gene level (i.e. co-occurrence of the same altered protein sequences across independent episodes), we found the strongest interaction between the agrA and agrC genes (Figure 6). While this is consistent with the high convergence of mutations in the agr locus, this suggests that strains acquire multiple mutations within the locus, possibly further impacting agr activity. Interestingly, no convergent co-occurrence signature compatible with possible epistasis was observed within the walKR locus, the other operon with a high number of independent mutations; which could be due to the essentiality of walKR in S. aureus (Monk et al., 2019). Collectively, agr locus mutations interacted with 17 other mutated genes, the strongest interaction being with stp1. Since stp1 (a serine/threonine phosphatase) has been previously associated with virulence regulation (Cameron et al., 2012), this interaction potentially indicates another mechanism by which adapted strains fine-tune the gene expression profile that is already altered by agr mutations.

Network of mutations co-occurrence.

The width and colour of the edges represent the strength of the co-occurrence of mutated genes on the same strain (thin and blue, two independent co-occurrences; thick and orange, three independent co-occurrences).

Another moderately strong interaction was observed between rpoB and parC, which were co-mutated in three independent episodes. Given the association of parC mutations with fluoroquinolone resistance (Trong et al., 2005), this interaction is likely to be an example of co-selection due to co-exposure to fluoroquinolones and rifampicin.

Clinical correlates of adaptive signatures within colonising and invasive populations

Genetic signatures of bacterial adaptation have been associated with infection extent, for example, enabling the prediction of extraintestinal infection with S. enterica (Wheeler et al., 2018). We have previously shown that adaptive mutations are enriched in invasive infections; however, it is unclear whether bacterial adaptation is more likely to be associated with distinctive clinical syndromes. To identify clinical correlates of adaptive signatures, we classified colonisation and infection episodes based on the sites of collection and on clinical data obtained from the publications (Table 1 and Figure 7—figure supplement 1). We then used the Jaccard index and network analysis to compute node centrality as a global measure of adaptation for each independent episode. The Jaccard index can be used as a simple marker of the proportion of shared mutated genes between pairs of colonisation or infection episodes (Bailey et al., 2017). Node centrality allows to similtuaneously take into account the strength of similarity between independent episodes (Jaccard index) and the number of pairs with shared mutated genes (number of connections). Hence, a node centrality of 0, indicates that the episode does not share any mutated genes with other episodes and appears as isolates node on the adaptation network (Figure 7—figure supplement 2). Here, we limited the analysis to the 20 most significantly enriched genes with each type of variant.

Our network analysis showed that adaptation was present in only a minority of episodes within each type of variant (Figure 7—figure supplement 2). With a definition of adaptation based on a centrality value of more than 0, we found that the proportion of adaptive episodes was 43, 20, and 22% with type C>C, C>I, and I>I variants, respectively. In addition, certain clinical syndromes were more strongly associated with adaptation. Within the colonising population (type C>C variants), almost 80% of cystic fibrosis episodes were adaptive, as opposed to one third of episodes of skin colonisation in atopic dermatitis (Figure 7AB). This is consistent with within-host evolution studies showing strong convergent evolution signals among bacterial populations colonising individuals with cystic fibrosis, not only in case of S. aureus colonisation (Long et al., 2020) but also P. aeruginosa (Marvig et al., 2015) and Mycobacterium abscessus (Bryant et al., 2021); however, one study found adaptive evolution signals in atopic dermatitis (Key et al., 2021). We also observed that adaptation among infection episodes correlated with infection extent. Episodes of infective endocarditis episodes displayed higher adaptation metrics (46% with centrality >0) than bacteraemia with additional infection foci (28%) and bacteraemia without focus (17%) (Figure 7D–E).

Figure 7 with 2 supplements see all
Clinical correlates of adaptive signatures within colonising (colonising-colonising [type C>C,] panels A–C) and invasive (invasive-invasive [type I>I], panels D–F) bacterial populations.

Adaptation was inferred by computing the Jaccard index of shared mutated genes between independent episodes, followed by network analysis of infection episodes pairs. The node centrality measure was used as an indicator of adaptation. To avoid overinflation of mutated genes, the calculation was limited to the 20 most significantly enriched genes within each group of mutations. (A, D) Density of centrality values across colonisation (panel A) and infection categories (panel D). (B, E) Number and proportion of adaptive episodes. An adaptive episode was defined by a centrality >0. (C, F) Distribution of mutations in the 20 most significantly enriched genes across categories of colonisation (panel C) and infection (panel F).

To explore the syndrome-specificity of adaptation signatures, we mapped mutations in the most significantly enriched adaptive genes to clinical syndromes of colonisation and infection (Figure 7 panels C and F). As expected, syndromes with high prevalence of adaptation had higher numbers of episodes with adaptive mutations; however, some genes appeared to be preferentially mutated. For example, rpsJ, stp1, and SAUSA300_1230 were over-represented in cystic fibrosis, while no clear pattern of mutations was discernible for nasal carriage episodes. Within infection syndromes, mprF and purR mutations were more prevalent in endocarditis, and yjbH mutations were only found in severe infections (bacteraemias with additional foci and endocarditis). Some genes appeared to be distinctive for low adaptation groups (atopic dermatitis and skin infections); however, the low number of adaptative mutations prevented an accurate assessment of these profiles.

Discussion

Within-host evolution of bacterial pathogens such as S. aureus is thought to be governed by a combination of positive selection for variants that confers an advantage within the host and random fixation of mutations (genetic drift) (Klemm et al., 2016; Didelot et al., 2016). Sudden changes in the effective population size (bottlenecks) can cause genetic drift, for example, when a single or few bacterial cells invade the bloodstream or when a secondary infection foci is established in tissues and organs. Consistent with this view, animal studies have shown that after infecting the blood with a polyclonal population, bacteraemic infection is established stochastically by a single clone (McVicker et al., 2014; Guerillot et al., 2018), although estimating the bottleneck size at invasion from clinical sequences has been more challenging (Abel et al., 2015). On the other hand, several lines of evidence support the role of positive selection and adaptive evolution during S. aureus infection. First, adaptive phenotypic features appear to be acquired during infection. The most obvious adaptative phenotype is secondary resistance to antistaphylococcal antibiotics such as rifampicin, vancomycin (Howden et al., 2006), daptomycin (Peleg et al., 2012), and oxacillin (Giulieri et al., 2020). Crucially, these resistance phenotypes can be associated with pleiotropic, pathoadaptative phenotypes such as small colony variant and immune evasion (Guérillot et al., 2018; Jiang et al., 2019; Guérillot et al., 2019). Furthermore, phenotypic adaptation (e.g. loss of toxicity) has been observed upon transition from colonisation to infection (Laabei et al., 2015), supporting the concept that invasive infection is linked to pathoadapted strains. At the molecular level, an excess of protein-truncating mutations in invasive strains (Giulieri et al., 2018) and in late colonising strains leading to infection (Young et al., 2012) have been noted. While this observation alone could be explained by relaxed constraint resulting from reduced population size (Didelot et al., 2016), it has been suggested that loss of gene function might be a common adaptation mechanism of within-host evolution (Gatt and Margalit, 2021), as supported by evidence of gene- or pathway-specific enrichment of mutations across independent infection episodes (Young et al., 2017).

Despite support for adaptive evolution from previous studies, it has been difficult to identify specific molecular signatures of adaptation during infection, due to the limited power of previous within-host studies of bacteraemia and other serious S. aureus infections that were often limited to a restricted number of episodes. To increase our ability to identify signatures of adaptation and find significantly enriched loci, we analysed multiple sources of genetic variation (point mutations, large deletions, IS insertions, and copy number variants) in a large collection of independent episodes of S. aureus colonisation and infection from 25 studies. We predicted that the main advantage of our approach would be to increase the ability of detecting convergence of genetic variants arising during invasive infections as opposed to those detected during the colonisation and upon transition from colonisation to infection. To test this hypothesis, we classified within-host variants based on their likely position in the within-host phylogeny (Figure 1D).

Bacterial adaptation is promoted by genomic plasticity; however, within-host evolution is characterised by low genetic variation (Giulieri et al., 2018). Based on our previous genomic studies of S. aureus bacteraemia, we reasoned that chromosome structural variants may provide an additional mechanism to increase genetic variation during infection. Here, we found that new insertions of IS are strongly enriched during invasive infection. However, despite this 20-fold enrichment, IS insertions remained a rare source of variation even within invasive strains and appeared to have a selective contribution to adaptation (i.e. limited to specific loci such as agrC). Similarly, large deletions and copy number variants appeared to play a less prominent role in adaptation, although we did not include them in our enrichment analysis.

Together with the enrichment for LOF mutations, which is another feature of evolution within the invasive population in our analysis, these IS movements suggest that a pattern of reductive evolution (genome degradation through loss of genes or accumulation of LOF mutations) emerges during within-host evolution of invasive S. aureus. This genome degradation might be related to less effective purifying selection (loss of deleterious alleles) in the invasive population due to a decrease in effective population size and a shorter evolutionary timescale (Didelot et al., 2016). However, our data indicate that these changes converge to specific genes, operons, and pathways, suggesting an adaptative benefit. Reductive evolution has been described in several ‘commensal-to-pathogen’ settings (Toft and Andersson, 2010). Although niche adaptation through reductive evolution has been described in extracellular pathogens (Stinear et al., 2007), a smaller (reduced) genome is a hallmark of obligate intracellular endosymbiotic bacteria (Batut et al., 2014). Since it appears that invasive S. aureus is able to reside intracellularly (promoting dissemination through mobile phagocytes during S. aureus sepsis [Thwaites and Gant, 2011; Surewaard et al., 2016], and immune evasion), it is plausible that this pattern of reductive evolution reflects an adaptation of invasive S. aureus to an invasive and intracellular lifestyle, as it has been shown for other facultative intracellular pathogens such as non-typhoidal Salmonella (Klemm et al., 2016). However, it is possible that these signatures of reductive evolution might be only temporary, as genome degradation might be present only in a minority of strains or be reversible; moreover, LOF mutations are expected to be more likely than gain of function mutations.

Beside reductive evolution, another distinctive feature of within-host evolution during invasive infection was intergenic mutations (both point mutations and IS insertions). In a within-host evolution study of S. pneumoniae colonisation, it was shown that intergenic sites were under convergent evolution (Chaguza et al., 2020). Mutations in promoter sequences of some core genes can play an important role in antibiotic resistance as it was repeatedly shown for pbp4 and resistance to beta-lactam antibiotics (Basuino et al., 2018). The role of intergenic mutations in within-host evolution was shown in a study of P. aeruginosa infection, where convergent evolution targeted several intergenic regulatory regions including upstream of antibiotic resistance genes (Khademi et al., 2019).

Previous work on within-host evolution by our group (included in this analysis) has established that agrA mutations are significantly enriched upon transition from colonisation to infection (Young et al., 2017). In addition, we have shown through genomics and targeted mutagenesis that mutations in key genes such as walKR (Howden et al., 2011) and rpoB (Gao et al., 2013) play a key pathoadaptive role in selected cases of persistent S. aureus infections. In this study, we increased our ability to discover potential targets of adaptation by analysing several mechanisms of genetic variation and applying several layers of annotation. As compared to previous work on S. aureus, this approach provides a higher-resolution picture of within-host evolution and adaptation. Importantly, this analysis remains robust after removing more than 1000 sequences from the largest within-host study included (Figure 3—figure supplement 7). We increase here the generalisability of our findings. We expand the list of genes targeted by convergent evolution and show that there are distinctive adaptation pathways in colonising and invasive populations. We confirm that the dominantly mutated loci belonged to the agr locus, in particular agrA and agrC. This finding is consistent with a large body of literature that predated the genomic era (Novick and Geisinger, 2008) that supports the role of the agr locus as the master regulator of gene expression in S. aureus. Agr-mediated adaptation was so important that we found a highly significant enrichment of agr mutations arros all type variants, including within colonising strains (type C>C variants). Shopsin et al. showed that ~10% of healthy S. aureus carriers held an agr-defective strain and that prior hospitalisation was significantly associated with agr-defective status, suggesting prior adaptive pressures (Shopsin et al., 2008).

Consistent with the distinctive general patterns of evolution displayed during invasive infection, some genes and loci were specifically mutated within invasive strains. Some of these genes were emerging targets of S. aureus pathogenesis in vivo, such as purR and yjbH that were not singled out in previous within-host evolution investigations. Others were known determinants of antibiotic resistance, including mprF, rpoB, and the walKR operon. This underscores the crucial role of antibiotic exposure in shaping adaptive evolution during invasive infection. A recent study of within-host evolution during cystic fibrosis found that resistance genes were hotspots of convergent evolution in this population, which is frequently treated with antibiotics and shows features of phenotypic adaptation (Long et al., 2020).

While most mutations in adaptive loci were substitutions within the coding sequence, about 40% of walKR operon mutations were located outside of the coding regions of walR and walk, emphasising the need to study intergenic mutations and mutations throughout an operon to capture adaptive signatures. This observation highlights the importance of expanding the analysis of intergenic mutations for the detection of adaptive mutations, in particular those linked to antibiotic resistance.

If within-host evolution represents adaptative evolution, it is possible that adaptation involved an accumulation of mutations and possibly epistatic mechanisms. Our data show that some mutations are specific for invasive strains; these mutations may reflect late adaptation, occurring after evolution during colonisation in the nose and upon transition from colonisation to infection, and thus occurring after adaptive mutations were acquired during earlier stages. This evolutionary pattern of stepwise adaptation (or adaptive continuum) encompassing the entire within-host evolutionary arch has been well described for cancer (Abbosh et al., 2017) and has been also investigated in a study of transition from colonisation to infection (Young et al., 2012). One way to capture this is mutation co-occurrence analysis. Here, we show mutation co-occurrence within the agr operon but also co-occurrence of the same mutations in two uncharacterised proteins in S. aureus. Our network of mutation co-occurrence linked to agrAC mutations might suggest a potential pathway of stepwise adaptation following initial mutations acquisition in the agr locus, an hypothesis that has been explored in one of studies included in our analysis (Altman et al., 2018).

While combining multiple studies allowed us to increase statistical power in order to detect genome-wide convergent signals of adaptation, this approach has some limitations. The quality of the publicly available sequences and metadata can be heterogeneous, despite performing quality control assessment, for example, due to different read coverage across studies. Lack of consistent metadata might have impacted the clinical categorisations used here, including the distinction between colonising and invasive strains (however, 83% of the strains could be classified unambiguously based on the site of collection). In addition, detection of structural variants from short reads is not as accurate as from long reads; for example, chromosomal inversions can be missed if their inversion site span is larger than the insert site of the paired-end reads (Guérillot et al., 2019). Furthermore, the majority of the episodes included had low sampling density. More contemporary strategies leverage a multiplicity of samples and deep sequencing strategies to capture within-host diversity, allowing to detect minority variants that could be relevant for adaptation and to obtain a more accurate understanding of evolutionary dynamics within the host, including estimating bottlenecks (Hall et al., 2019) and tracking the fall and rise of within-host lineages (Bryant et al., 2021). Finally, while several adaptive loci identified here have been previously assessed experimentally, the functional impact of adaptive variants in less characterised genes and intergenic mutations warrants further exploration using targeted mutagenesis (Monk et al., 2015).

Ultimately, the goal of detecting adaptive signals is to identify new mechanisms of pathogenesis or resistance to therapeutic targets and to inform prediction of clinical outcomes. So far, studies have failed to show consistent associations between specific clinical outcomes and genetic features of infecting (or colonising) S. aureus strains. This might be related to the dominance of host/environmental factors, but it could also be linked to the limited power of studies performed so far. By contrast, within-host evolution studies are an elegant approach to identify signatures of adaptation that might be candidate markers of important clinical outcomes, such as infection risk in case of colonisation or treatment failure in case of infection. Here, we show that adaptation signatures are at least partially specific to colonisation, infection, and upon transition from colonisation to infection and that adaptive changes are more frequent in distinctive infection episodes (complicated bacteraemia and endocarditis). These findings suggest that adaptive signatures might be indicative of important clinical outcomes. In the future, precision medicine in infectious diseases could follow the lead of cancer genomics, where within-host evolution studies have tracked the evolution of cancer clones and enabled the detection of high risk mutations early.

Materials and methods

Literature search

Request a detailed protocol

We conducted a search of articles indexed in PubMed before the 11 August 2020 using the keyword ‘aureus’ in combination with either ‘genomics’ or ‘whole genome sequencing’ and with either ‘within-host evolution’, ‘in vivo evolution’, ‘adaptation’, or ‘bacteraemia’. The records retrieved through this search were combined with additional citations identified through other sources. After removing duplicates, this resulted in 815 citations that were screened based on following inclusion criteria: (i) whole-genome sequencing of human S. aureus isolates; (ii) >1 S. aureus isolates sequenced per individual; (iii) sequences (reads or assemblies) publicly available; and (iv) minimum sequences metadata available (either with the manuscript or linked to the sequences): patient ID, date of collection (or collection interval in reference to a baseline isolate), and source of the sample. After excluding studies not satisfying the inclusion criteria (730 based on the title, 46 based on the abstract, and 15 after reviewing the full text), we kept 24 within-host evolution studies.

Extraction of sequence metadata

Request a detailed protocol

For each of the included studies, the following variables were extracted either from reads, metadata (when available) or from the publication/supplementary data: identifier linking the sequences to a patient or an episode of infection, date of collection (when available) or collection interval in reference to a baseline isolate, and site of collection of the isolate. Isolates were broadly categorised in ‘colonising’ and ‘invasive’ based on the site of collection, when the information was unambiguous (e.g. ‘nose’ for ‘colonising’ or ‘blood’ for ‘invasive’). When the information on the body site was not sufficient (e.g. ‘skin’ or ‘lung’), the categorisation was based on further details provided in the publication. When available, phenotypic metadata and antibiotic treatments were also extracted from the publication. We used clinical and site data to classify colonisation episodes in ‘nasal carriage’, ‘atopic dermatitis’, and ‘cystic fibrosis’ and infection episodes in ‘skin infection’ (skin infection site surgical site infection without other foci), ‘osteoarticular infection’ (bone/join infection without other foci), ‘bacteraemia without focus’ (bloodstream infection, no other foci, and expect for vascular catheter or skin), ‘bacteraemia with focus’ (bloodstream infection with other focus involving the lung, nervous system, bone and joints, or internal organs), and ‘endocarditis’ (based on diagnosis reported in the publication or in the clinical metadata).

Sequence processing

Request a detailed protocol

Sequences (reads and assemblies) and metadata were downloaded from the European Nucleotide Archive and the National Center for Biotechnology Information (NCBI), respectively using the BioProject accession or the genome accession. Quality assessment of the reads was performed by calculating mean read depth and the fraction of S. aureus reads using Kraken 2, v2.0.9-beta (Wood et al., 2019) and by extracting metrics from reads assemblies constructed using Shovill, v1.1.0 (https://github.com/tseemann/shovill, Seemann, 2022c) and annotated using Prokka, v1.14.6 Seemann, 2014 . ST was inferred from the assembly using Mlst, v2.19.0 (https://github.com/tseemann/mlst, Seemann, 2022b), and resistance genes were detected using Abricate, v1.0.1 (https://github.com/tseemann/abricate, Seemann, 2022a). Reads were discarded if the mean coverage depth was below 35, the majority of reads were not S. aureus, or the size of the assembly was below 2.6 megabases. Assemblies downloaded from the NCBI repository were discarded if the genome size was below 2.6 megabases.

Sequences from the CAMERA2 trial

Request a detailed protocol

We collected S. aureus strains from bacteraemia episodes included in the CAMERA2 trial (Combination Antibiotics for Methicillin Resistant S. aureus), where at least two strains per episode were available. The CAMERA2 trial was performed between 2015 and 2018 in Australia, New Zealand, Singapore, and Israel and randomised participants with methicillin-resistant S. aureus bacteraemia to either monotherapy with vancomycin or daptomycin or combination therapy with vancomycin or daptomycin plus an antistaphylococcal beta-lactam (flucloxacillin, cloxacillin, or cefazolin) (Tong et al., 2020). Strains were isolated from –80C glycerol onto horse-blood agar. Species were confirmed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Bacterial whole-genome sequencing was performed from single colonies on the Illumnina NextSeq platform. Reads were checked for quality, assembled, and annotated as described above.

Global phylogeny

Request a detailed protocol

To generate a global alignment of all sequences, reads and shredded assemblies were mapped to reference genome USA300 FPR3757 (assembly accession: GCF_000013465.1) (using Snippy, v4.6.0) (https://github.com/tseemann/snippy; Seemann, 2022d). The core genome alignment was obtained using Snippy; sites with >10% gaps were removed using Goalign (Lemoine and Gascuel, 2021) and constant sites were removed using SNP-sites (Page et al., 2016), for a final length of 186,825 bp. A maximum-likelihood phylogenetic tree of 2590 sequences (those kept in the analysis after excluding genetically unrelated strains, see below) was inferred using IQ-TREE, v2.0.3 Minh et al., 2020.

Variant calling

Request a detailed protocol

We have previously shown that the accuracy of variant calling in within-host evolution analyses is improved when mapping reads to an internal draft assembly as opposed to a closely related closed genome (Giulieri et al., 2018). Here, we applied the same approach, where we selected the internal reference among the sequences from the same patient or episode. When available, the oldest colonising strain was selected. When only sequences from invasive strains were available, the oldest strain (baseline strain) was selected. When multiple sequences were available per sample (e.g. multiple colonies sequenced per plate) or at the same date, the reference was randomly selected among them. Snippy with default parameters (minimum reads coverage 10, minimum read mapping quality 60, and minimum base quality 13) was used for variant calling. To further improve the accuracy of the calls, we masked variants called from reference reads and those at positions where reference reads had a coverage below 10 (using the BEDTools suite [Quinlan and Hall, 2010]).

Filtering of genetically unrelated sequences

Request a detailed protocol

The threshold for removing genetically unrelated sequences was set empirically at 100 episode-specific variants based on the upper Tukey’s fence of the distribution of the number of variants in same-episode isolates belonging to the same ST (Figure 2—figure supplement 1).

Estimation of within-host mutation rates

Request a detailed protocol

To estimate within-host mutation rates within colonising and invasive populations, a linear regression was fitted to model the relationship between sampling time (in years after the first isolate) and number of mutations relative to the internal reference. Only episodes with at least two strains collected at least one day apart were included in this analysis. The mutation rate μ was computed as follows μ = β/g, where β is the regression parameter and g is the mean genome size of the internal references (2.79 Mb). Regression diagnostics were performed using the R package performance (Lüdecke et al., 2021).

Detection of chromosome structural variants

Request a detailed protocol

Using BWA-MEM (Li, 2013), reads and shredded contigs were aligned to the closest available complete genome (either internal to the dataset or selected from the NCBI repository based on the mash distance). To detect large deletions ( ≥ 500 bp), reads coverage was computed using BEDTools, as described in Giulieri et al., 2018. To detect new IS insertions, split reads were extracted, filtered, and annotated as described in Giulieri et al., 2018. We used the R package CNOGpro (Brynildsrud et al., 2015) to detect 1000 bp windows with an estimated copy number above one as compared to the internal reference. The package calculates the reads coverage per sliding windows of the chromosome, performs a G+C bias normalisation, and infers copy number state using a Hidden Markow Model. We ran the package with default parameters, with the exception of the length of the sliding window that was set at 1000 bp. For each class of structural variant and within each episode, we used BEDTools to mask regions where the variant was already present in internal reference.

Prediction of functional impact of variants

Request a detailed protocol

Functional impact of variants was extracted from the Snippy output, which uses SnpEff to infer the functional effect of the detected mutations (Cingolani et al., 2012). SnpEff categories for coding regions were aggregated in ‘truncating’ (frameshift, stop codons, and start codons), ‘non-synonymous substitutions’, and ‘synonymous substitutions’. Non-synonymous substitutions were further investigated using PROVEAN, v1.1.5 (58) using the non-redundant protein database provides on the PROVEAN repository (ftp://ftp.jcvi.org/pub/data/provean/nr_Aug_2011/). Substitutions were classified as ‘deleterious’ if the PROVEAN score was –2.5 or less and ‘neutral’ otherwise.

Internal variant annotation

Request a detailed protocol

To ensure a consistent annotation of mutated genes across independent episodes, we clustered amino-acid sequences using CD-HIT, v4.8.1 with an identity threshold of 0.9. The BEDTools suite was used to annotate mutated intergenic regions with upstream and downstream coding regions and the distance separating the mutation from the start or the end of the gene. For the operon analysis, intergenic mutations were classified according to their location within a presumed promoter based on blasting the sequence of unique promoters (as determined in [Prados et al., 2016]) on the draft assembly of the internal reference. Phage genes were annotated using blastp and the PhageWeb database (http://computationalbiology.ufpa.br/phageweb/).

Variant annotation using reference strain FPR3757

Request a detailed protocol

To compare mutated genes across separated episodes, we used blastp to identify homologues of each CD-HIT cluster of mutated genes in USA300 FPR3757. Genes in FPR3757 were further annotated using the database provided in the AureoWiki repository (Fuchs et al., 2018), and operon annotations of FPR3757 were retrieved from Microbes Online (Dehal et al., 2010). In addition, we used the text mining tool PaperBLAST to search for publications containing data on homologues of uncharacterised FPR3757 proteins (Price et al., 2017). Only protein-altering variants in genes with FPR3757 homologues (excluding plasmid genes and phage genes) were kept for the analysis of convergence at gene and operon level and the gene enrichment analysis.

Classification of variants

Request a detailed protocol

Mutational and structural variants were classified in to type C>C (within colonising strains), type C>I (between colonising and invasive strains), and type I>I (within invasive strains) as follows: all variants arising in colonising strains were classified as type C>C, while variants among invasive strains were classified as type C>I if they were found in a baseline invasive strain (defined as the oldest invasive strain; when multiple sequences were available at same time, the baseline invasive strain was selected randomly), and as type I>I if they were found between invasive strains but not on the baseline invasive strain. This approach is based on the assumption that co-infection or superinfection is rare, as we have shown previously for bacteraemia (Giulieri et al., 2018).

Calculation of the Neutrality Index (NI)

Request a detailed protocol

A modified McDonald-Kreitman table was compiled a described in Stoletzki and Eyre-Walker, 2011, where a ratio was calculated between non-synonymous, protein-truncating, IS insertions, intergenic and deletion variants, and synonymous variants. The NI was obtained by dividing the ratio calculated above for type C>I and type I>I by the ratio for type C>C variants that were used as reference group. Significance was tested by Fisher’s Exact test.

dN/dS analysis

Request a detailed protocol

We used the R package dNdScv (Martincorena et al., 2017) to obtain dN/dS ratios for non-synonymous mutations, indels, and missense mutations (stop codons) for all FPR3757 genes, based on variants called when mapping all reads on FPR3757 and after subtracting variants from the internal reference reads and variants in positive where internal reference reads had a low coverage. Since this analysis could be hampered by potential false-positive variants resulting from mapping reads on a single reference (Giulieri et al., 2018), we also used our curated list of within-host mutations obtained from episode-specific variant calling to calculate crude dN/dS ratios by dividing the number of protein modifying mutations by the number non-synonymous mutations and computed p values by Fisher exact test as in Long et al., 2020.

Gene and operon enrichment analysis

Request a detailed protocol

We calculated the enrichment of protein-altering mutations across all coding regions of FPR3757 (excluding plasmid genes and phage genes) using the approach described in Young et al., 2017. The variant enrichment per gene i was calculated as follows: (Ni/Li)/(Σnl), where Ni is the number of variants per gene i, Li is the length of gene i, Σn is the total number of variants, and Σl is the total length of the genes. We used Poisson regression to model the number of variants per gene j under the null hypothesis (no enrichment), as defined by the equation λ0Lj, where λ0 is the expected number of variants in any gene and Lj is the gene length. Under the alternative hypothesis (enrichment of variant in gene i), the estimated number of variants is λiLi for gene i, and λ1Lj for any other gene j. The model parameters λ0, λ1, and λI were obtained using maximum likelihood and tested for significance using the likelihood ratio test. The genome-wide significance cut-off was calculated using the Bonferroni correction (0.05 divided by the number of unique genes or operons) and the suggestive significance cut-off (1 divided by the number of unique genes or operons), as implemented for bacterial genome-wide associated studies in Lees et al., 2017a.

Gene set enrichment analysis

Request a detailed protocol

We used the PANNZER platform (Törönen et al., 2018), to retrieve a gene ontology annotation of FPR3757 based on the GO terms. We modified the ‘antibiotic response’ category by adding a curated list of antibiotic resistance genes downloaded from the NCBI Anti-Microbial Resistance (AMR) gene reference database (Feldgarden et al., 2019). The GSEA was performed as implemented in the R package clusterProfile (Yu et al., 2012). Genes with a FPR3757 homologue were ranked according to the significance of the enrichment of protein-modifying mutations (gene enrichment analysis, see above), and the GSEA was carried out with a minimum gene set size of 10 and using the FDR method for adjustment for multiple testing.

Mutation co-occurrence analysis

Request a detailed protocol

To detect co-occurrence of mutations and mutated genes across independent episodes, we constructed a co-occurrence matrix using the R package co-occur (Griffith et al., 2016). A co-occurrence of mutations or mutated genes in at least two independent episodes was interpreted as convergent and as a sign of potential epistatic interaction. The network of co-occurrence of mutated genes was visualised using the R package ggraph (https://cran.r-project.org/web/packages/ggraph/index.html).

Network analysis of adaptation signatures

Request a detailed protocol

The pairwise calculation of the Jaccard index between set of mutated genes was performed in R. The calculations were performed both with the entire set of mutated FPR3757 genes and with the 20 most significantly enriched genes in each group of variants. A network of shared mutated genes between independent episodes was constructed using ggraph, where edges represented episode connections based on the Jaccard index. We used the R package tidygraph to extract the node centrality (function centrality_degree) as a summary measure of the degree of adaptation of the episodes. The network graph and analysis were performed for each group of variants separately.

Data availability

All data generated or analysed during this study are included in the manuscript and supporting file 1-6. The code to call, filter and annotated within-host variants and to perform the enrichment analysis is available on github at https://github.com/stefanogg/staph_adaptation_paper, (copy archived at swh:1:rev:6ec5132855405e1c759fedadb4c70e295c1c1974).

The following data sets were generated
    1. Giulieri SG
    2. Daniel D
    3. Seemann T
    4. Davis JS
    5. Tong SYC
    6. Stinear TP
    7. Howden BP
    (2022) European Nucleotide Archive
    ID PRJEB50796. Within-host evolution analysis of S. aureus bacteraemias included in the CAMERA-2 trial.
The following previously published data sets were used
    1. Gao W
    2. Monk IR
    3. Tobias NJ
    4. Gladman SL
    5. Seemann T
    6. Stinear TP
    7. Howden BP
    (2015) NCBI BioProject
    ID PRJEB9193. Large tandem chromosome duplications facilitate niche adaptation during persistent infection with drug-resistant Staphylococcus aureus.
    1. Howden BP
    (2011) NCBI Sequence Read Archive
    ID SRA027352. Evolution of Multidrug Resistance during Staphylococcus aureus Infection Involves Mutation of the Essential Two Component Regulator WalKR.
    1. Howden BP
    (2011) NCBI BioProject
    ID PRJNA29567. Evolution of Multidrug Resistance during Staphylococcus aureus Infection Involves Mutation of the Essential Two Component Regulator WalKR.
    1. Howden BP
    2. McEvoy CR
    3. Allen DL
    4. Chua K
    5. Gao W
    6. Harrison PF
    7. Bell J
    8. Coombs G
    9. Bennett-Wood V
    10. Porter JL
    11. Robins-Browne R
    12. Davies JK
    13. Seemann T
    14. Stinear TP
    (2011) NCBI BioProject
    ID PRJNA29569. Evolution of Multidrug Resistance during Staphylococcus aureus Infection Involves Mutation of the Essential Two Component Regulator WalKR.
    1. Burd EM
    2. Alam MT
    3. Passalacqua KD
    4. Kalokhe AS
    5. Eaton ME
    6. Satola SW
    7. Kraft CS
    8. Read TD
    (2014) NCBI BioProject
    ID PRJNA248678. Development of oxacillin resistance in a patient with recurrent Staphylococcus aureus bacteremia.
    1. Rishishwar L
    2. Kraft CS
    3. Jordan IK
    (2016) NCBI BioProject
    ID PRJNA259799. Population Genomics of Reduced Vancomycin Susceptibility in Staphylococcus aureus.
    1. Trouillet-Assant S
    2. Lelièvre L
    3. Martins-Simões P
    4. Gonzaga L
    5. Tasse J
    6. Valour F
    7. Rasigade JP
    8. Vandenesch F
    9. Muniz Guedes RL
    10. Ribeiro de Vasconcelos AT
    11. Caillon J
    12. Lustig S
    13. Ferry T
    14. Jacqueline C
    15. Loss de Morais G
    16. Laurent F
    (2016) NCBI BioProject
    ID PRJNA298748. Adaptive processes of Staphylococcus aureus isolates during the progression from acute to chronic bone and joint infections in patients.
    1. Rouard C
    2. Garnier F
    3. Leraut J
    4. Lepainteur M
    5. Rahajamananav L
    6. Languepin J
    7. Ploy MC
    8. Bourgeois-Nicolaos N
    9. Doucet-Populaire F
    (2018) NCBI BioProject
    ID PRJNA434495. Emergence and Within-Host Genetic Evolution of Methicillin-Resistant Staphylococcus aureus Resistant to Linezolid in a Cystic Fibrosis Patient.
    1. Langhanki L
    2. Berger P
    3. Treffon J
    4. Catania F
    5. Kahl BC
    6. Mellmann A
    (2018) NCBI BioProject
    ID PRJEB22600. In vivo competition and horizontal gene transfer among distinct Staphylococcus aureus lineages as major drivers for adaptational changes during long-term persistence in humans.
    1. Altman DR
    (2018) NCBI BioProject
    ID PRJNA393749. Genome Plasticity of agr-Defective Staphylococcus aureus during Clinical Infection.
    1. Giulieri SG
    (2018) NCBI BioProject
    ID PRJEB27932. Genomic exploration of sequential clinical isolates reveals a distinctive molecular signature of persistent Staphylococcus aureus bacteraemia.
    1. Benoit JB
    2. Frank DN
    3. Bessesen MT
    (2018) NCBI BioProject
    ID PRJNA414752. Genomic evolution of Staphylococcus aureus isolates colonizing the nares and progressing to bacteremia.
    1. Suligoy CM
    2. Lattar SM
    3. Noto Llana M
    4. González CD
    5. Alvarez LP
    6. Robinson DA
    7. Gómez MI
    8. Buzzola FR
    9. Sordelli DO
    (2018) NCBI BioProject
    ID PRJNA414566. Mutation of Agr Is Associated with the Adaptation of Staphylococcus aureus to the Host during Chronic Osteomyelitis.
    1. Harkins CP
    2. Pettigrew KA
    3. Oravcová K
    4. Gardner J
    5. Hearn RMR
    6. Rice D
    7. Mather AE
    8. Parkhill J
    9. Brown SJ
    10. Proby CM
    11. Holden MTG
    (2018) NCBI BioProject
    ID PRJEB20148. The Microevolution and Epidemiology of Staphylococcus aureus Colonization during Atopic Eczema Disease Flare.
    1. Tan X
    (2019) NCBI BioProject
    ID PRJNA446073. Chronic Staphylococcus aureus Lung Infection Correlates With Proteogenomic and Metabolic Adaptations Leading to an Increased Intracellular Persistence.
    1. Loss G
    2. Simões PM
    3. Valour F
    4. Cortês MF
    5. Gonzaga L
    6. Bergot M
    7. Trouillet-Assant S
    8. Josse J
    9. Diot A
    10. Ricci E
    11. Vasconcelos AT
    12. Laurent F
    (2019) NCBI BioProject
    ID PRJNA497214. Staphylococcus aureus Small Colony Variants (SCVs): News From a Chronic Prosthetic Joint Infection.
    1. Kuroda M
    2. Sekizuka T
    3. Matsui H
    4. Ohsuga J
    5. Ohshima T
    6. Hanaki H
    (2019) NCBI BioProject
    ID PRJDB8056. IS256-Mediated Overexpression of the WalKR Two-Component System Regulon Contributes to Reduced Vancomycin Susceptibility in a Staphylococcus aureus Clinical Isolate.
    1. Azarian T
    2. Ridgway JP
    3. Yin Z
    4. David MZ
    (2019) NCBI BioProject
    ID PRJNA527261. Long-Term Intrahost Evolution of Methicillin Resistant Staphylococcus aureus Among Cystic Fibrosis Patients With Respiratory Carriage.
    1. Wüthrich D
    2. Cuénod A
    3. Hinic V
    4. Morgenstern M
    5. Khanna N
    6. Egli A
    7. Kuehl R
    (2019) NCBI BioProject
    ID PRJNA488707. Genomic characterization of inpatient evolution of MRSA resistant to daptomycin, vancomycin and ceftaroline.
    1. Ji S
    2. Jiang S
    3. Wei X
    4. Sun L
    5. Wang H
    6. Zhao F
    7. Chen Y
    8. Yu Y
    (2020) NCBI BioProject
    ID PRJNA577181. In-Host Evolution of Daptomycin Resistance and Heteroresistance in Methicillin-Resistant Staphylococcus aureus Strains From Three Endocarditis Patients.
    1. Miller CR
    2. Dey S
    3. Smolenski PD
    4. Kulkarni PS
    5. Monk JM
    6. Szubin R
    7. Sakoulas G
    8. Berti AD
    (2020) NCBI BioProject
    ID PRJNA544229. Distinct Subpopulations of Intravalvular Methicillin-Resistant Staphylococcus aureus with Variable Susceptibility to Daptomycin in Tricuspid Valve Endocarditis.
    1. Petrovic Fabijan A
    2. Lin RCY
    3. Ho J
    4. Maddocks S
    5. Ben Zakour NL
    6. Iredell JR
    (2020) NCBI BioProject
    ID PRJNA541589. Safety of bacteriophage therapy in severe Staphylococcus aureus infection.

References

Decision letter

  1. Bavesh D Kana
    Senior and Reviewing Editor; University of the Witwatersrand, South Africa
  2. Daria Van Tyne
    Reviewer; University of Pittsburgh, United States
  3. Meiqin Zheng
    Reviewer

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Niche-specific genome degradation and convergent evolution shaping Staphylococcus aureus adaptation during severe infections" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Bavesh Kana as the Senior Editor. The following individuals involved in the review of your submission have agreed to reveal their identity: Daria Van Tyne (Reviewer #1); Meiqin Zheng (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1. One weakness of the study is a lack of a classification of the variants detected in convergent loci. For example, which genes do the authors think are acquiring gain-of-function versus loss-of-function mutations? One other weakness is a lack of functional studies exploring some of the more novel signals detected (such as hypothetical proteins with "no data on S. aureus"). Can the authors comment or provide data on this?

2. Line 130-131: Was there any association between sampling frequency and variant accumulation in patients between early vs. late adapted populations? Can the available data be used to estimate mutation rates or bottleneck sizes in moving from colonization->early infection->late infection?

Other comments that must be addressed

1. Line 78: Reference 13 is a study in E. faecalis, not E. faecium. Chilambi et al., 2020 PNAS examine colonization vs. infection in VREfm.

2. Lines 143-145: Please define "early stage," "late infection stage," "persistence," and "treatment failure" here. These terms are not defined in the methods or anywhere else in the paper.

3. Line 153 should reference Figure 2B.

4. Line 183 should reference Figure 2 —figure supplement 2.

5. Lines 207-208: How many genes had a homolog in the reference strain? What percentage of the genome was this?

6. Was the definition of "colonization" vs. "infection" consistent across all studies compiled here? If not this should be acknowledged and discussed.

7. Figure 1: Panels B and D are somewhat confusing. Consider revising panel D, or moving it to the supplement.

8. Figure 3 and others like it: the triangles corresponding to "2" vs. "3" independent acquisitions do not look noticeably different on the figures.

9. Figure 3 —figure supplement 1: what does "ext*?" mean?

10. Line 416: what is the reference base of the definition about adaptation based on a centrality value of more than 0?

11. Line 712-714 and 722-723: These two sentences could be integrated into one.

12. In general, the article should be written with greater readability for the generalist audience of eLife.

13. Please check the whole article again and modify some mistakes such as Line 240, line 259, line 289, line 784, Figure 3 —figure supplement 5, and so on.

https://doi.org/10.7554/eLife.77195.sa1

Author response

Essential revisions:

1. One weakness of the study is a lack of a classification of the variants detected in convergent loci. For example, which genes do the authors think are acquiring gain-of-function versus loss-of-function mutations?

We agree with the reviewer that our classification system doesn’t account for the predicted functional impact of amino acid substitutions. This is especially important for convergent loci where most or all aggregated mutations are substitutions such as mprF and sucA. To address this shortcoming, we used PROVEAN to predict the impact of amino acid substitution on the protein function (lines 915-923). We used a δ score of -2.5 to differentiate between deleterious and neutral substitutions. The results are presented in Figure 3 – supplement 3 (new) and in the manuscript (lines 295-322). The output of PROVEAN is also provided with Supplementary file 3.

Based on in silico predictions (protein-truncations, low PROVEAN scores) we hypothesise that most convergent loci carry loss-of-function mutations. However, gain-of-function mutations are hard to predict from the sequence. We reviewed the literature on genes with low prevalence of truncations and higher PROVEAN scores and only confirmed gain-of-function mutations for mprF.

One other weakness is a lack of functional studies exploring some of the more novel signals detected (such as hypothetical proteins with "no data on S. aureus"). Can the authors comment or provide data on this?

Following the reviewer’s suggestion, we have performed a review of hypothetical proteins: we searched for homologs and functional domains and used PaperBlast to identify relevant papers (lines 947-948). We have revised tables 3/4 and added some relevant references that can provide more background information on the novel adaptive loci. We also agree with the reviewer that functional studies of candidate adaptive loci of unknown function will be important. Detection of previously described pathoadaptive loci such as agr, stp1, saeR, vraR and walR (Figure 3), underscores the validity of our approach. However, determining roles for genes in transition from colonising to invasive (and other pathoadaptive phenotypes) is non-trivial and of course represent substantial, stand-alone studies. We are planning to conduct these investigations and we hope that our work will encourage others to also dive into such studies (see also discussion, lines 750-754).

2. Line 130-131: Was there any association between sampling frequency and variant accumulation in patients between early vs. late adapted populations?

We assessed the correlation between number of samples and mean mutation counts per episode and stage of infection (colonisation, early, late). No correlation was found, except for early infections, where there was a weak association between number of sequences and mean mutation counts (R2 = 0.042, p = 0.001). This analysis is presented in Figure 2 —figure supplement 2 and mentioned in the manuscript (lines 172-175).

Can the available data be used to estimate mutation rates or bottleneck sizes in moving from colonization->early infection->late infection?

We investigated mutation counts and rates within colonising and invasive populations by fitting a linear regression of mutation counts in response to the collection time. The regression suggests that mutation rates are higher in the invasive population, however this analysis should be taken with caution given that we are using multiple studies and sampling strategies and that the models displayed heteroskedasticity and non-linear distribution (results, lines 177-184 and methods 886-893). As hypothesised by the reviewer, this is a consequence of the heterogenous sampling approaches across the studies included in the analysis. Because of these limitations, the regression is presented in the supplementary data (Figure 2 —figure supplement 3 and 4).

We agree with the reviewers’ that there is an interest in inferring bottleneck size at invasion from clinical data. Data from animal experiments suggest that there is a narrow bottleneck at the transition from colonisation to infection and upon organ seeding during bacteraemia. These analyses have used molecular markers and experimental design to infer the size of the bottleneck in animal models of bacteraemia and are therefore difficult to replicate with clinical sequences. Alternatively, the bottleneck size can be calculated using population genomics approaches, provided sufficient diversity and sampling depth is present, as it is the case in deep sequencing studies of viral infections. A few studies have also used deep sequencing approaches to calculate the size of the transmission bottleneck in bacterial infections. In a study by Hall et al., (eLife 2019;8:e46402), the size of the transmission bottleneck was assessed using a Bayesian phylogenetic method in six colonisation pairs that fulfilled strict criteria in terms of statistical support and sampling depth (95% posterior trees and > 5 tips [i.e. individual sequences] supporting the direction of transmission). Our dataset does not provide sufficient sampling coverage to apply one of the above approaches. The rate of coinfections with genetically distant lineages can be used to obtain a first rough estimate. Using this approach, we found evidence of coinfection for 4/336 (1%) episodes with at least two invasive sequences obtained at < 3 days interval. By contrast evidence of co-colonisation was found in 11/167 (7%) episodes with at least two colonising sequences obtained at < 3 days interval. This seems to confirm the prevailing opinion that there is a narrow bottleneck at invasion, however more subtle coinfections are missed with this approach and as mentioned above the estimate is heavily dependent on the density of sampling and sequencing depth. We have added the coinfection estimates to the Results section of the manuscript (lines 127-129) and mentioned the issues around the estimation of the bottleneck size in the discussion (lines 567-569 and lines 745-750).

Other comments that must be addressed

1. Line 78: Reference 13 is a study in E. faecalis, not E. faecium. Chilambi et al., 2020 PNAS examine colonization vs. infection in VREfm.

We have corrected the mistake and added the reference on within-host evolution of VRE (lines 75-77).

2. Lines 143-145: Please define "early stage," "late infection stage," "persistence," and "treatment failure" here. These terms are not defined in the methods or anywhere else in the paper.

We have added a definition for these terms (lines 161-162). In addition, we replace the term “treatment failure” with recurrence. Persistence and recurrence are the commonly used to define microbiological failure in Staphylococcus aureus bacteraemia, the main syndrome analysed in our study. When revising the manuscript, we have elected not to use the term “treatment failure” as it is not well defined and may include outcomes that are not necessarily linked to bacterial adaptation.

3. Line 153 should reference Figure 2B.

We have modified the figure reference (line 171).

4. Line 183 should reference Figure 2 —figure supplement 2.

We have carefully reviewed figure references for IS insertions: 1) Figure 2 —figure supplement 5 (Figure 2 —figure supplement 2 in the older version) is referenced at line 220, to illustrate the diversity of IS families involved. 2) Figure 2C is referenced in the following sentence, at lines 220-221, to specify that the burst of IS insertions is visible when considering the accumulation of IS insertions within invasive strains (type I>I) in panel C. 2) We hope that this clarifies how the two IS insertions figures are references in the text.

5. Lines 207-208: How many genes had a homolog in the reference strain? What percentage of the genome was this?

A total of 1,736 genes had a FPR3757 homolog (74%). We have added this information to the result section (lines 254-255).

6. Was the definition of "colonization" vs. "infection" consistent across all studies compiled here? If not this should be acknowledged and discussed.

The categorisation of “colonising” or “invasive” was based the following 2-steps assessment of the metadata (lines 799-803): 1) We first checked if the collection site was unambiguously associated with one of the two categories (e.g. “nose”: colonising; “blood” or “bone”: invasive); 2) For sites that could be associated with both colonisation or infection (e.g. “lung” or “skin”) we considered the available details provided in the publication including the categorisation provided by the authors and the clinical histories, if available. While 2,151 sequences (83%) were unambiguous and could be assessed at the first step, we acknowledge that this second categorisation relies on the authors’ assessment. This is now mentioned in the discussion (lines 739-742).

7. Figure 1: Panels B and D are somewhat confusing. Consider revising panel D, or moving it to the supplement.

Following the reviewers’ suggestions we have modified figure 1: panel C is now a “workflow” figure showing the within-host evolution analysis process. We have replaced panel D with a simplified version of the statistical model we used.

8. Figure 3 and others like it: the triangles corresponding to "2" vs. "3" independent acquisitions do not look noticeably different on the figures.

We have modified the figures to improve the readability of panel C (Figure 3, Figure 3 —figure supplements 6 and 7, Figure 4).

9. Figure 3 —figure supplement 1: what does "ext*?" mean?

This is the SnpEff annotation for a stop_lost&splice_region_variant. We have now added an explanation to the legend of Figure 3 —figure supplement 1 (lines 1137-1138).

10. Line 416: what is the reference base of the definition about adaptation based on a centrality value of more than 0?

A centrality value of 0 means that the episode (node) doesn’t have any connection to other episodes. Since connections are based on the Jaccard index, this means that a node with a centrality value of 0 doesn’t share any mutated genes with other episodes. In other words, within-host mutations for these nodes are only found in non-convergent loci (lines 520-522).

11. Line 712-714 and 722-723: These two sentences could be integrated into one.

We have modified this paragraph as suggested (lines 911-913).

12. In general, the article should be written with greater readability for the generalist audience of eLife.

We agree with the reviewer that readability is important, but this is a very broad criticism and therefore somewhat difficult to address. We have re-read the manuscript carefully with an eye to accessibility to a broad scientific audience. In this vein, we preface each section of the results with a rationale for a particular experiment, we define key terms and try to summarise the particular method(s) being deployed. Examples are at lines 177-184 (brief explanation of the regression analysis of mutation rates), lines 186-195 (rationale for the investigation of genome degradation signatures) and lines 208-209 (definition of the neutrality index); lines 295-299 (rationale and approach to infer functional consequences of substitutions); lines 452-464 (rationale and approach to the epistasis analysis); lines 503-523 (rationale and approach to the adaptation network analysis). We have made every effort to explain our implementation of leading-edge statistical and computational techniques as clearly as possible (examples lines 257-265).

13. Please check the whole article again and modify some mistakes such as Line 240, line 259, line 289, line 784, Figure 3 —figure supplement 5, and so on.

We have revised the manuscript and corrected mistakes.

https://doi.org/10.7554/eLife.77195.sa2

Article and author information

Author details

  1. Stefano G Giulieri

    1. Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia
    2. Department of Infectious Diseases, Austin Health, Heidelberg, Australia
    3. Victorian Infectious Diseases Service, Royal Melbourne Hospital, Melbourne, Australia
    Contribution
    Conceptualization, Data curation, Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  2. Romain Guérillot

    Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia
    Contribution
    Conceptualization, Formal analysis, Methodology, Software, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Sebastian Duchene

    Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia
    Contribution
    Resources, Software, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Abderrahman Hachani

    Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8032-2154
  5. Diane Daniel

    1. Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia
    2. Microbiological Diagnostic Unit Public Health Laboratory, The University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Australia
    Contribution
    Data curation, Investigation, Project administration
    Competing interests
    No competing interests declared
  6. Torsten Seemann

    Microbiological Diagnostic Unit Public Health Laboratory, The University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Australia
    Contribution
    Methodology, Software, Supervision
    Competing interests
    No competing interests declared
  7. Joshua S Davis

    1. Department of Infectious Diseases, John Hunter Hospital, Newcastle, New South Wales, Australia
    2. Menzies School of Health Research, Charles Darwin University, Casuarina, Northern Territory, Australia
    Contribution
    Investigation, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  8. Steven YC Tong

    1. Menzies School of Health Research, Charles Darwin University, Casuarina, Northern Territory, Australia
    2. Victorian Infectious Disease Service, Royal Melbourne Hospital, and University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
    Contribution
    Conceptualization, Investigation, Resources
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-1368-8356
  9. Bernadette C Young

    Nuffield Department of medicine, Oxford, United Kingdom
    Contribution
    Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6071-6770
  10. Daniel J Wilson

    Big Data Institute, Nuffield Department of Population Health, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, University of Oxford, Oxford, United Kingdom
    Contribution
    Methodology, Software, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0940-3311
  11. Timothy P Stinear

    Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia
    Contribution
    Conceptualization, Methodology, Supervision, Writing – review and editing
    For correspondence
    tstinear@unimelb.edu.au
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0150-123X
  12. Benjamin P Howden

    1. Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia
    2. Department of Infectious Diseases, Austin Health, Heidelberg, Australia
    3. Microbiological Diagnostic Unit Public Health Laboratory, The University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Australia
    Contribution
    Conceptualization, Funding acquisition, Supervision, Writing – review and editing
    For correspondence
    bhowden@unimelb.edu.au
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0237-1473

Funding

National Health and Medical Research Council (GNT1105525)

  • Timothy P Stinear

National Health and Medical Research Council (GNT1196103)

  • Benjamin P Howden

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors declare that they have no competing interests. All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. This work was supported by a Research Fellowship to BPH and TPS from the National Health and Medical Research Council, Australia. SGG was supported by a PhD scholarship of the University of Melbourne. We acknowledge the CAMERA2 Study Group for sharing sequences and clinical metadata of trial participants with multiple sequential bacteraemia strains: Nick Anagostou, David Andresen, Sophia Archuleta, Narin Bak, Alan Cass, Mark Chatfield, Alan Cheng, Jane Davies, Joshua Davis, Yael Dishon, Ravindra Dotel, Patricia Ferguson, Hong Foo, Vance Fowler, Niladri Ghosh, Timothy Gray, Stephen Guy, Natasha Holmes, Benjamin Howden, Sandra Johnson, Shirin Kalimuddin, David Lye, Stephen McBride, Genevieve McKew, Niamh Meagher, Jane Nelson, Matthew O’Sullivan, David Paterson, Mical Paul, David Price, Anna Ralph, Matthew Roberts, Owen Robinson, Ben Rogers, Naomi Runnegar, Simon Smith, Archana Sud, Steven Tong, Adrian Tramontana, Sebastian Van Hal, Genevieve Walls, Morgyn Warner, Dafna Yahav, and Barnaby Young.

Ethics

Ethics approval was obtained at each partecipating site to the CAMERA2 trial and written informed onsent was obtained from each participant or surrogate decision maker.

Senior and Reviewing Editor

  1. Bavesh D Kana, University of the Witwatersrand, South Africa

Reviewers

  1. Daria Van Tyne, University of Pittsburgh, United States
  2. Meiqin Zheng

Publication history

  1. Received: January 19, 2022
  2. Preprint posted: February 12, 2022 (view preprint)
  3. Accepted: June 8, 2022
  4. Accepted Manuscript published: June 14, 2022 (version 1)
  5. Version of Record published: July 8, 2022 (version 2)

Copyright

© 2022, Giulieri et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,038
    Page views
  • 378
    Downloads
  • 1
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Stefano G Giulieri
  2. Romain Guérillot
  3. Sebastian Duchene
  4. Abderrahman Hachani
  5. Diane Daniel
  6. Torsten Seemann
  7. Joshua S Davis
  8. Steven YC Tong
  9. Bernadette C Young
  10. Daniel J Wilson
  11. Timothy P Stinear
  12. Benjamin P Howden
(2022)
Niche-specific genome degradation and convergent evolution shaping Staphylococcus aureus adaptation during severe infections
eLife 11:e77195.
https://doi.org/10.7554/eLife.77195
  1. Further reading

Further reading

    1. Genetics and Genomics
    2. Plant Biology
    Simon Snoeck, Bradley W Abramson ... Adam D Steinbrenner
    Research Article Updated

    As a first step in innate immunity, pattern recognition receptors (PRRs) recognize the distinct pathogen and herbivore-associated molecular patterns and mediate activation of immune responses, but specific steps in the evolution of new PRR sensing functions are not well understood. We employed comparative genomic and functional analyses to define evolutionary events leading to the sensing of the herbivore-associated peptide inceptin (In11) by the PRR inceptin receptor (INR) in legume plant species. Existing and de novo genome assemblies revealed that the presence of a functional INR gene corresponded with ability to respond to In11 across ~53 million years (my) of evolution. In11 recognition is unique to the clade of Phaseoloid legumes, and only a single clade of INR homologs from Phaseoloids was functional in a heterologous model. The syntenic loci of several non-Phaseoloid outgroup species nonetheless contain non-functional INR-like homologs, suggesting that an ancestral gene insertion event and diversification preceded the evolution of a specific INR receptor function ~28 my ago. Chimeric and ancestrally reconstructed receptors indicated that 16 amino acid differences in the C1 leucine-rich repeat domain and C2 intervening motif mediate gain of In11 recognition. Thus, high PRR diversity was likely followed by a small number of mutations to expand innate immune recognition to a novel peptide elicitor. Analysis of INR evolution provides a model for functional diversification of other germline-encoded PRRs.

    1. Chromosomes and Gene Expression
    2. Genetics and Genomics
    Evgeniya N Andreyeva, Alexander V Emelyanov ... Dmitry V Fyodorov
    Research Article

    Asynchronous replication of chromosome domains during S phase is essential for eukaryotic genome function, but the mechanisms establishing which domains replicate early versus late in different cell types remain incompletely understood. Intercalary heterochromatin domains replicate very late in both diploid chromosomes of dividing cells and in endoreplicating polytene chromosomes where they are also underrelicated. Drosophila SNF2-related factor SUUR imparts locus-specific underreplication of polytene chromosomes. SUUR negatively regulates DNA replication fork progression; however, its mechanism of action remains obscure. Here we developed a novel method termed MS-Enabled Rapid protein Complex Identification (MERCI) to isolate a stable stoichiometric native complex SUMM4 that comprises SUUR and a chromatin boundary protein Mod(Mdg4)-67.2. Mod(Mdg4) stimulates SUUR ATPase activity and is required for a normal spatiotemporal distribution of SUUR in vivo. SUUR and Mod(Mdg4)-67.2 together mediate the activities of gypsy insulator that prevent certain enhancer-promoter interactions and establish euchromatin-heterochromatin barriers in the genome. Furthermore, SuUR or mod(mdg4) mutations reverse underreplication of intercalary heterochromatin. Thus, SUMM4 can impart late replication of intercalary heterochromatin by attenuating the progression of replication forks through euchromatin/heterochromatin boundaries. Our findings implicate a SNF2 family ATP-dependent motor protein SUUR in the insulator function, reveal that DNA replication can be delayed by a chromatin barrier and uncover a critical role for architectural proteins in replication control. They suggest a mechanism for the establishment of late replication that does not depend on an asynchronous firing of late replication origins.