Introduction

Pseudomonas aeruginosa is a versatile, opportunistic human pathogen causing a wide range of infections of both community and hospital origin13. In the clinic, patients with cystic fibrosis (CF), severe burns or neutropenia are of highest risk and cases are associated with high morbidity and mortality3. Furthermore, the management of these infections is increasingly difficult due to the global emergence of multidrug resistant (MDR) strains, especially carbapenem-resistant P. aeruginosa (CRPA), which now account for 16-30% of cases in the US4. In response, global and national agencies have categorized CRPA as a “Critical” pathogen posing a serious threat on public health4.

While high-risk MDR lineages, exemplified by the most globally prevalent sequence type (ST) 235, are enriched in isolates carrying acquired carbapenemases and/or extended-spectrum beta lactamases (ESBLs)5,6, the hallmark of P. aeruginosa lies in its remarkable propensity for developing resistances through chromosomal mutations7. Within the hundreds of intrinsic genes comprising this mutational resistome, several harbor well-characterized modifications that are the most common cause of resistance for nearly all classes of drugs7. These include gain-of-function mutations in gyrase subunits GyrAB (fluoroquinolones), penicillin-binding protein PBP3 (cephalosporins), various Mex efflux systems (multiple antimicrobials) as well as inactivation of the porin OprD (carbapenems) or the sensor component of the PhoPQ system, the latter associated with resistance to last-line polymyxin antibiotics810.

Being capable of both acute and chronic infection, P. aeruginosa possesses a large arsenal of virulence factors such as a type 4 pili (T4P), O-antigens and lipopolysaccharides (LPS), extracellular enzymes, exotoxins (e.g. ExoS/U), a flagellum and five types of secretion systems (TSSs)11. These are generally part of complex regulatory pathways in which two-component systems (TCSs) often play a major role11,12. The ability of P. aeruginosa to form biofilm is a key aspect of its persistence in patients with chronic infections as well as in environmental niches13. In the hospital, biofilms are often found in water-related environments and eradication is particularly difficult due to increased disinfectant resistance1,14. This in turn poses a challenge for infection prevention and control (IPAC) as illustrated by the increasing reports of P. aeruginosa outbreaks and hospital-acquired infections (HAI) linked to contaminated sinks, drains or taps1518.

Here, routine genome-based surveillance of MDR clinical isolates across the US Military Health System (MHS) identified an epidemic cluster of P. aeruginosa in a single hospital in 2020. Retrospective analysis of isolates from the preceding ten years, together with Bayesian phylogenetics, revealed a protracted outbreak clone with cases from all hospital floors and an origin dated to the late 1990s. Dissection of this unique dataset sheds new light on the evolution, persistence, and routes of transmission of P. aeruginosa in the clinic.

Results

P. aeruginosa ST-621 is endemic in a large US tertiary hospital

The Multidrug-Resistant Organism Repository and Surveillance Network (MRSN) performed whole genome sequencing (WGS) of 5,129 P. aeruginosa isolates received between 2011 and 2020 from 71 facilities around the globe19. While a total of 547 STs were represented, high-risk epidemic lineages ST-235/exoU+ (10%) and ST-244/exoS+ (5%) were the most prevalent and widespread with isolates recovered from >28 facilities in each case (Fig. S1A). By contrast, isolates from lineage ST-621/exoS+ also represented 5% of the total STs, but originated from just 6 facilities, with 89% from a single US tertiary hospital (Facility A) and sharing high-level genetic relatedness by core genome MLST (cgMLST) (Fig. S1B). Furthermore, of the 5 additional facilities from which ST-621 was detected, patient transfer from facility A was observed in three instances and isolates from the remaining two were genetically distinct strains (Fig. S1B).

Retrospective analysis revealed that a total of 254 ST-621 isolates were collected from facility A between May 2011 and December 2020. The majority were isolated from respiratory (42%) and urine (39%) samples, followed by wound (7%), surveillance (6%), tissue (3%), and blood cultures (2%) (Table S1).

A protracted outbreak with cases dispersed throughout floors and wards

Phylogenetic analysis confirmed that all ST-621 isolates from facility A belonged to the same outbreak clone with an average distance (all vs all) of just 38 single nucleotide polymorphisms (SNPs) (Fig. 1A). A total of 82 patients were represented, with the majority (56%) contributing serial isolates. In particular, six patients (ID 10, 27, 34, 35, 49 and 63) showed prolonged carriage, best exemplified by patient 35 from whom 21 urine isolates were collected over 4.5 years (Fig. 1A and B).

Phylogenetic and spatio-temporal analysis of 253 clinical ST-621 P. aeruginosa isolates.

(A) SNP-based, core genome phylogeny. Outbreak subclones are colored with red (SC1) or blue (SC2) branches. From inner- to outermost, datasets show: the year, the floor, the building, and the source of isolation. Patients with prolonged carriage (>1 year) are indicated. (B) Timeline of isolate collection. Each line indicates a single patient. Isolates are shown as dots colored by the floor the patient was on. Patients with prolonged carriage are highlighted. (C) Mock floor plan of the Main (light blue) and Tower (dark blue) Buildings of Facility A. Wards and floors are numbered. For each floor, a pie chart indicates the proportion of isolates collected from urine, respiratory, or other cultures. Within each ward, isolates are colored by year of collection with a red outline indicating primary isolates. Symbols indicate the presumed origin of infection. (D) Bar charts showing for each year the number of: all P. aeruginosa primary clinical isolates collected at Facility A (bottom), the subset of genome-sequenced primary isolates (middle), and the subset of identified ST-621 isolates (top). Bars are color-coded by the isolation source. (E) Distribution of SNP distances for all inter-patient isolates or subsets collected in the same month, the same ward, or both. Dotted lines indicate the local maxima.

Two outbreak subclones, designated SC1 and SC2, were identified from the tree topology and showed increased genetic relatedness within each (average distance of 24 and 31 SNPs, respectively) (Fig. 1A). Overall ST-621 isolates were observed in 27 wards/units from all 7 floors of two buildings (Main and Tower Buildings) in facility A (Fig. 1C), yet subclones SC1 and SC2 were spatially and temporally distinct. 78% of SC1 isolates were collected from patients in the Main Building and 60% were detected before 2014. In contrast, 52% of SC2 isolates originated from patients in the Tower Building and 77% were collected after 2014 (Fig. 1A). Specifically, the early spread of SC1 was largely connected to patients in adjacent ICUs and step-down units on the 2nd and 3rd floors (wards 1-10) of the Main Building. The subsequent emergence of SC2 largely centered in patients receiving care in adjacent ICUs (wards 20-21) on the 3rd floor within the Tower Building (Fig. 1A and C). Throughout the outbreak, new cases due to either SC1 or SC2 were also detected from patients in the emergency room (ER). Unlike the isolates from other locations, these ER isolates were nearly all (32 vs 92%, p <.0001) from urine cultures (Fig. 1C).

The epidemic curve revealed that between 2011 and 2016 an average of 11.5 new cases were detected every year. From 2017 to 2021, only 12 new patients were identified (3 cases a year) (Fig. 1D). Although the rate of outbreak cases appeared to decrease, the overall number of P. aeruginosa cases in facility A (irrespective of lineages) remained stable throughout the studied period. However, due to staffing issues, the number of isolates received/sequenced by the MRSN every year was not consistent.

An origin dated soon after the opening of facility A

Though the collection of clinical isolates at facility A only started in 2011, genetic diversity was already evident (16.3 SNPs on average), suggesting an earlier origin (Table S2). Molecular clock phylogenetic analyses of all studied isolates inferred the time of the most recent common ancestor (MRCA) to be in 1999 (± 3 years), soon after the opening of the Main Building of facility A in 1996 (Fig. 2). Furthermore, this analysis predicted that, despite the observed temporal succession in sequenced isolates, SC2 was not a descendant of SC1. Instead, the two subclones emerged concurrently with respective MRCAs dated in 2004 (± 2 years) (Fig. 2).

Dated phylogeny of 253 ST-621 outbreak isolates.

Time-stamped phylogeny. Error bars for annotated nodes are shown as gray boxes. Years are indicated on the left, along with the opening of the Main Building (T1), and the Tower Building extension (T2). The MRCA of SC1 and SC2 are indicated in black text above the node. The MRCA of isolates from patients with prolonged carriage are shown in red, indicated by their patient ID. The predicted emergence of select variants are shown in blue. Data showing the floor and source isolates were collected from is shown, as in Figure 1. Select datasets are shown including the antibiotic susceptibility testing results for cefepime (cephalosporin) and imipenem (carbapenem).

Despite this predicted early origin, it is only after facility expansion and the opening of the Tower Building in 2012 that the first SC2 isolates were ever sampled (Table S1).

Specifically, the earliest SC2 isolate collected was from a wound infection in patient 27. Remarkably, the 15 serial isolates collected over 3 years (2012 to 2015) from this chronically infected patient were not monophyletic and their MRCA coincided with the emergence of this subclone (Fig. 2).

Environmental reservoirs sustain nosocomial transmission

The presumed origin of an infection, either directly from patient-to-patient or from environment-to-patient, was designated based on SNP thresholds (starting from pairwise comparisons for all inter-patient isolates, Table S2) and known epidemiological links, or lack thereof. Indeed, while a unimodal distribution of SNPs (peak at 40.5) was observed for the whole cohort, a bimodal profile appeared when inter-patient isolates from the same month, the same ward, or both were selected (local maxima of the lower peak at 12.4, 15.3, and 11.6 SNPs respectively) (Fig. 1E). This pool of virtually identical isolates from patients with spatial and temporal overlap represented the most likely cases of direct, patient-to-patient transmission. Pairs of isolates from only 10 patients satisfied these conditions (Table S1): All carried subclone SC1, occurred in 2011 and 2012 (with two exceptions), and six were hospitalized in ICU #1 in the Main Building of facility A (Fig. 1C).

In contrast, a presumed environmental origin was designated when a new patient was infected with a virtually identical strain from a preceding patient in the same ward, but no known temporal overlap was established for at least 1 year. Twelve cases met these criteria (Fig. 1C and Table S2). Among them, 3 patients (ID 57, 61, and 74) in ICU #20 on the Tower Building had primary isolates of subclone SC2 that were distinct by only 0-4 SNPs despite being collected years apart (in 2015, 2016, and 2018, respectively) (Fig. 1C). Furthermore, an origin from contaminated surfaces was also predicted for 7 of the 9 most recent patients (2018-2020). This included 5 cases caused by the apparent resurgence of SC1 in ICUs #1-2 of the Main Building (Fig. 1C).

Genomic inferences were supported by contemporary environmental sampling, conducted in 2021 and 2022, which confirmed the existence of reservoirs (i.e. sink drains located in patient rooms) of both SC1 and SC2 isolates (n = 17) in 7 distinct patient rooms in wards #1, 2, 6 and 20 throughout the Main and Tower Buildings of facility A (Fig. 3, Table S3). In that timeframe, an additional 56 clinical isolates (25 SC1 and 31 SC2) from 13 patients (including 11 new cases) were collected, indicating transmission events were still occurring (Fig. 3).

Identification of environmental reservoirs of the ST-621 outbreak clone in sink drains.

(A) Photos depict a swab from a sink drain in a patient room and the typical sink design (shallow basins and gooseneck faucets) in facility A. (B) Minimum Spanning Tree of 326 Facility A ST-621 P. aeruginosa isolates, including contemporary isolates collected in 2021 and 2022 from clinical (n= 56) and environmental (n= 17) sources. Isolates from wards #1, 6 and 20 where environmental contamination was identified are colored. Environmental isolates are indicated with the letter E.

Selection on resistance genes and rising prevalence of CRPA due to porin defects

Phenotypically, a prototypical outbreak isolate showed non-susceptibility to aminoglycosides (except for amikacin), 3rd and 4th generation cephalosporins, carbapenems, and fluoroquinolones (Table S4). However, resistance profiles were not homogenous and temporal patterns were observed (Fig. 4A). In particular, while 63% of the isolates collected in 2011 were non-susceptible to imipenem, this fraction gradually increased to >95% by 2014 and beyond. In the same timeframe, the prevalence of isolates with non-susceptibility to cefepime decreased from 81% to 37% (Fig. 4A). Both phenomena were linked to the decline of SC1 and rise of SC2, with the latter displaying significantly higher (98% to 76% in SC1, p<0.05) and lower (38% to 66%, p<0.05) rates of resistance to imipenem and cefepime, respectively (Fig. 2). In contrast, resistance to tobramycin and ciprofloxacin was highly prevalent and remained relatively stable over the sampling period (Fig. 4A).

Antibiotic resistance and poly-variant sites of 253 outbreak isolates.

(A) Bar charts showing the proportion of clinical isolates resistant to tobramycin, cefepime, imipenem, and ciprofloxacin each year from 2011-2020. (B) Chart of mutations (excluding synonymous) in resistance-associated genes. Each block indicates one distinct mutation observed within the corresponding gene. Blocks are colored by the number of isolates sharing this exact mutation. Blocks are outlined to indicate whether the mutation is found within SC1, SC2, or isolates from both subclones. Mutations causing a predicted loss-of-function (LOF) are indicated with a star. (C) Pruned SNP-based phylogeny showing the acquisition of colistin resistance. Patient ID, day of collection (since the first outbreak isolate), and available colistin MIC are shown on the right. The MRCA of the PhoQ E77fs mutation is indicated with a triangle.

Genotypically, outbreak isolates contained no plasmids (based on the finished genomes of a diverse subset of isolates, Table S1) and, besides an intrinsic AmpC-type β-lactamase, only carried two acquired resistance genes, known to confer resistance to ciprofloxacin (crpP) and tobramycin (aac(6’)-Ib4) (Table S1). To explain other phenotypic resistances, and in the absence of an acquired ESBL or carbapenemase, a genome-wide analysis of non-synonymous (NSY) and loss of function (LOF) mutations was performed for all outbreak isolates (Table S5). From a list of 164 genes (Table S6) for which known chromosomal mutations have been linked to antibiotic resistance11-13, 68 distinct mutations (including 20 predicted LOF) in 37/164 genes were identified in at least one outbreak isolate (Table S7). Of particular interest, 8 genes (or functional clusters) were each mutated independently three or more times (Fig. 4B).

The most frequently mutated gene was oprD, encoding a porin that facilitates the diffusion of amino acids, small peptides, and carbapenem antibiotics into the cell8. Five of the six independent mutations identified in OprD were predicted LOF, including an R6* stop codon and a R318fs frameshift observed in 115 SC2 and 92 SC1 isolates, respectively (Fig. 4B, Table S7). Phylogenetic Analysis Using Parsimony (PAUP) predicted that both the R318fs and R6* mutations occurred early (between 2006-2008) within the evolution of their respective subclones (Fig. 2). Despite this early origin, highly carbapenem resistant SC2-OprD-R6* isolates were first sampled from patients in mid-2012 and only became predominant from 2014 onward (Fig. 2).

Within all other mutations in antibiotic resistance associated genes (Table S7), only two more variants were found in > 50 outbreak isolates: a W97* stop codon in β-lactamase expression regulator AmpD and a R504C substitution in penicillin-binding protein PBP3 (Fig. 4B). The LOF in AmpD, predicted to have emerged in 2012, was observed in a subset of 51 monophyletic SC2-OprD-R6* background isolates that were not consistently resistant to cefepime (Fig. 2). In contrast, the R504C substitution in PBP3 was acquired independently twice (within SC1-OprD-R318fs in 2007 and within SC2-OprD-R6* in 2014) and was strongly associated with a resistance to 4th-generation cephalosporins (Fig. 2). Overall, ampD and pbp3 were among the genes with the most independently acquired NSY or LOF mutations, with six and four distinct sites, respectively (Fig. 4B). Accumulation of mutations, albeit in small numbers of isolates, was also observed in other genes involved in β-lactam resistance (e.g. catabolite repression control crc, and mpl UDP-muramic acid/peptide ligase) as well as antibiotic efflux systems (mexXY and mexCD-oprJ) (Fig. 4B).

Sporadic emergence of colistin resistance

Though rare, the presence and impact of mutations in genes associated with resistance to colistin, a last-line antibiotic, was examined. NSY mutations were detected in the TCSs ColRS (three distinct sites in four isolates) (Fig. 4B) and PmrAB (two distinct sites in three isolates) (Table S7) but colistin MIC’s were the same as wild-type isolates (4 mg/L). By contrast, patient 2 (who had a history of colistin therapy) acquired a predicted LOF mutation (E77fs) in sensor protein PhoQ in two serial isolates resulting in an 8-fold increase in MIC (32 mg/L) (Fig. 4C). Notably, the emergence of this colistin resistant strain, which occurred in early 2011 (Fig. 2), remained constrained to a single patient and no further spread was detected.

Mutational convergence in virulence, cell wall biogenesis and signaling pathways

Genome-wide, 64 genes were each mutated independently two or more times over the course of this outbreak (Table S5). Compared to the distribution observed for the whole genome, these 64 genes were functionally enriched for signal transduction (28% vs 6% in the whole genome, p<0.05) and cell wall/membrane biogenesis (17% vs 7%, p<0.05) (Fig. 5A). For the latter category, besides genes already discussed for their direct role in antibiotic resistance (i.e pbp3, oprD, mpl, and mex loci), six other genes were recurrently mutated in outbreak isolates including LPS biosynthesis rmlA and choline transporter betT2 (Fig. 5A).

Identification of pathoadaptive mutations.

(A) Distribution of COG functional categories for all genes (Ref) compared to the subset of genes (Sel) with 2 or more distinct mutated sites (excluding synonymous) in outbreak isolates. Significant enrichment in cell wall biogenesis and signal transduction categories are indicated. A positional chart of mutations across the entire chromosome of closed reference genome for isolate 4605 is provided. The number of unique mutation sites within each gene is indicated (top) as well as the number of outbreak isolates which carried each mutation (bottom). Genes with 3 or more distinct mutated sites are labeled and genes involved in cell wall biogenesis and signal transduction genes are color coded. (B) Accumulation of mutations (excluding synonymous) in genes and pathways associated with P. aeruginosa pathogenesis. Each block indicates one distinct mutation observed within a gene part of the indicated functional group (data available in Table S8). Blocks are colored by the number of isolates sharing this exact mutation. Blocks are outlined to indicate whether the mutation is found within SC1, SC2, or isolates from both subclones. Mutations causing a predicted loss-of-function (LOF) are indicated with a star.

For genes associated with signaling functions, various TCS involved in pathogenesis were represented, including narXL, fleRS, and the catabolic regulation cbrAB (Fig. 5A). Notably, cbrAB was the most independently mutated loci in this dataset with 16 distinct NSY mutations in 80 isolates from both subclones. Beyond TCS, sigma factor pvdS, required for toxin expression (6 NSY and 1 LOF), motility regulator morA (11 NSY and 2 LOF) and mucoidy regulatory system mucAB (4 sites including three predicted LOF) were also recurrently mutated (though rarely fixed) in outbreak isolates (Fig. 5A).

Finally, at the level of functional pathways, and compared to the theoretical frequency for an even distribution of mutated sites across the whole genome (0.11 sites per kb of coding sequence), SYN and LOF mutations were significantly enriched in genes involved in the biosynthesis of the Type IV secretion system (0.59 sites/kb, p<0.05), flagellar biosynthesis and chemotaxis (0.21 sites/kb, p<0.05), alginate production (0.31 sites/kb, p<0.05) and O-antigen production (0.36 sites/kb, p<0.05) (Fig. 5B, Table S8). Importantly, the WbpX D-rhamnosyltransferase, part of the O-antigen biosynthesis pathway, was disrupted (S19fs) in 127 SC2 isolates (Table S5). PAUP predicted this frameshift occurred in 2005 (± 24 months), concomitant with the predicted emergence of SC2 (Fig. 2).

Discussion

WGS technology is revolutionizing IPAC. Here, routine surveillance of MDR pathogens across the MHS was critical in uncovering a decades-long P. aeruginosa epidemic cluster. With traditional approaches alone20, comparable outbreaks may avoid detection due to the sporadic nature of patient infections, scattering of cases throughout the hospital, and changing antibiotic susceptibility profiles. Furthermore, because of budget constraints, surveillance programs focusing on “high-risk” isolates carrying select resistance genes (e.g. ESBLs, carbapenemases)46, would also fail to detect this ST-621 outbreak clone. Indeed, while the ST-621 lineage was the focus of several studies in the late 2000’s, these examined a blaIMP-13 carrying epidemic clone which spread throughout Europe and has since been sporadically detected in South America and Southeast Asia2124. To our knowledge this IMP-carrying strain has not been reported in the USA to date, and the outbreak in Facility A is the result of a distinct MDR clone lacking the carbapenemase.

Similar to the MHS, healthcare networks worldwide that already benefit from prospective WGS surveillance programs are reporting large numbers of unrecognized outbreaks18. However, two resources rarely available in other settings were key to this investigation: a genome repository of isolates from the preceding decade and the ability to conduct prospective environmental sampling. The former was integral to date the presumed origin of the outbreak to the late 90s, soon after the hospital opened, and to trace the evolution and separate spread of two subclones throughout floors and wards. The latter revealed that, more than 20 years later, reservoirs of both subclones persist in sink drains from patient rooms throughout the facility. This supports previous evidence implicating these sites as major reservoirs for P. aeruginosa in the clinic1,25. Decontamination of drains can be extremely challenging due to the limited penetration of disinfectants, the lack of access (e.g. to perform scrubbing), recolonization (e.g. disposal of contaminated patient specimens) or retrograde growth from p-traps1,14,25. Furthermore, sinks with shallow basins and gooseneck faucets directing water straight into the drain, similar to those in Facility A (Fig. 3A), have been linked to increased back splash onto nearby surfaces and medical equipment26.

Besides environmental contamination, reservoirs within patients likely contributed to the spread and longevity of the outbreak. In particular, chronic infections, documented for a significant fraction of patients, provided a recurring source of the epidemic clone. It is noteworthy that 5 of the 6 patients with chronic (> 1 year) infections (and 9 out of 11 if reduced to 6 months) were carriers of subclone SC2. Compared to SC1, all but one SC2 isolates carried a truncated WbpX glycosyltransferase, an essential component of the common polysaccharide antigen (CPA) biosynthesis27. While this mutation could result in increased immune evasion, the rough LPS phenotype recurrently observed in vivo (in particular in CF patients) usually results from a functional CPA but the lack of O-specific antigen11,28. In further evidence for a progression toward a host-adapted lifestyle, a convergence of mutations in genes involved in alginate production, quorum-sensing deficiency, loss of motility, and decreased protease secretion, all phenotypes associated with chronic infections29, was observed throughout the 10 years of sampling.

Considering its role in chronic infections and the distinctive association with patients and sinks in the Tower Building of Facility A, the origin and emergence of subclone SC2 is of particular importance. Supported by BEAST2 inferences, a plausible scenario would be that a patient with an initial infection acquired during a stay in the Main Building before the SC1-2 split (ca. 2005) proceeded to shed the epidemic strain in the newly opened (2012) Tower Building during successive visits. Importantly, the Main and Tower Buildings of facility A have distinct plumbing systems, providing the necessary ecological separation (minus inter-hospital patient transfers) for the further spread and divergence of the two sub-clones. Patient 27 is just such an example, but the lack of early sampling (2000-2010) and incomplete collection thereafter precludes a definitive answer. Indeed, other scenarios cannot be discounted, with a recent study showing sinks in a newly built ICU were already contaminated with an outbreak clone of P. aeruginosa before the arrival of the first patients17.

One of the most notable features of this outbreak was the evolution of antibiotic resistance over time, with the emergence of resistance to 1st (cephalosporin), 2nd (carbapenems) and 3rd (colistin) line treatments. The potent R504C substitution in PBP3 was selected in each subclone, and in addition to facilitating cephalosporin resistance, has also been linked to ceftazidime/avibactam resistance30. While no patient prescription data is available from Facility A, it can be speculated that the high prevalence of cephalosporin resistance in early outbreak isolates led to increased carbapenem use, which in turn selected for the many independently evolved OprD mutants, one of the most common mechanisms for carbapenem resistance in P. aeruginosa11. Finally, albeit a sporadic event, the emergence of colistin resistance, through a well characterized mechanism, is a reminder that the threat of extensively drug resistant P. aeruginosa is only a prescription and a few mutations away31.

The data generated during this study has resulted in various ongoing interventions (e.g. closing sinks, replacing tubing, using foaming detergents32) at Facility A. Sampling and sequencing, in near real-time, of all P. aeruginosa clinical isolates (and not just MDR) also commenced in 2021. Notably, as of this writing, no new ST-621 patient cases have been reported in over 9 months, and the unbiased contemporary data using all P. aeruginosa from patients at this facility confirms the spread has slowed, with just 3 cases identified in 2022. Similar to Facility A, the roll-out of routine WGS surveillance in the clinic has the potential to improve patient care and prevent some of the estimated 136 million hospital-associated drug-resistant infections per year globally33.

Materials and methods

Data Collection & Antibiotic Susceptibility Testing

As mandated by the US National Action Plan for Combating Antibiotic-Resistant Bacteria (CARB), the Multidrug Resistant Organism Repository and Surveillance Network (MRSN) collects and analyses clinically relevant MDR organisms across the U.S. Military Health System (MHS) and around the world in collaboration with the Global Emerging Infections Surveillance (GEIS) Branch. From 2011 to 2020, 5,129 MDR P. aeruginosa isolates were collected from 71 facilities and sent to the MRSN. Of these, 1,511 were collected from Facility A, a large tertiary U.S. military hospital located in the continental United States. In total, 253 isolates belonged to the ST-621 outbreak clone.

Antibiotic susceptibility testing (AST) was performed in a College of American Pathologists (CAP)-accredited lab at Facility A using a Vitek 2 (card GN AST 71 and GN ID; bioMérieux, NC, USA). For isolates without MICs, AST was performed in the MRSN CAP-accredited clinical lab as previously described34.

Whole Genome Sequencing & Bioinformatic Analysis

Genomic DNA was extracted and sequenced via Illumina MiSeq or NextSeq benchtop sequencer (Illumina, Inc., San Diego, CA) as previously described34. Long read sequencing was performed using PacBio RS II (Pacific Biosciences of California, Inc., Menlo Park, CA). Kraken 2 was used to identify isolate species and check for contamination35. Short-read sequencing data were trimmed for adapter sequence content and quality using bbduk36. De novo assembly was performed using Newbler v2.937. Minimum thresholds for contig size and coverage were set at 200 bp and 49.5×, respectively. Long-read sequencing data were assembled using HGAP 3.0 in the SMRT Analysis portal. In silico multilocus sequence typing (MLST) was performed using the scheme developed by Curran and Dawson38. Antimicrobial resistance genes were annotated using a combination of AMRFinderPlus and ARIBA39,40. The genomes of all 253 ST-621 isolates have been deposited in the National Center for Biotechnology Information under BioProject PRJNA852179.

Core Genome MLST, Gene Annotation, SNP Calling, and Phylogenetic Analysis

Core genome MLST were determined in SeqSphere + (Ridom, Germany) using the cgMLST scheme developed by Curran et al38 and a 90% cut-off (3,960 of 4,400 genes). SNP calling was performed with Snippy v.4.4.5 (https://github.com/tseemann/snippy) using error corrected [Pilon v1.23] and annotated [Prokka v1.14.6] draft assembly, with isolate MRSN 4605 as the reference41,42. Type strain PAO1 was used as the basis for annotation43. PanSeq39 was run with a fragmentation size of 500[bp to find sequences with ≥95% identity in ≥90% of the isolates to generate the core genome SNP alignment and accessory genome binary alignment for the selected diversity set of 310 isolates. The SNP-based phylogeny was built using RAxML (version 8.2.11)40 from a 534[kb variable position alignment using the GTR GAMMA model and the rapid bootstrapping option for nucleotide sequences (100 replicates)The core SNP alignment was filtered for recombination using Gubbins v2.3.144. A SNP-based phylogeny was created by inferring a maximum-likelihood tree with RaxML-NG v0.9.0 using GTR+G (50 parsimony, 50 random) and was rooted at MRSN 699445. The phylogeny was visualized in iTOL46.

Gene annotation and clustering showed 8,754 total genes in the pangenome of the outbreak. The core genome (present in ≥ 99% of isolates) was comprised of 5,674 genes, with a shell genome of 297 genes found in ≥ 95% of isolates, and a cloud genome of 530 genes found in ≥ 15% of isolates.

Curation and analysis of recurring mutations in outbreak isolates

Genome-wide analysis of non-synonymous and loss-of-function mutations in all outbreak isolates showed a total of 629 mutated sites (257 singletons and 372 sites shared by ≥ 2 isolates) in 488 distinct genes (Table S5). For variant sites identified in >99% of outbreak isolates, individual nucleotide-based sequence alignments were performed using a wild-type allele from the PAO1 genome (https://www.pseudomonas.com/). This allowed the identification of sites for which the MRSN 4605 reference genome was carrying the mutation and not the remaining outbreak isolates. These were curated (e.g. entry was modified to reflect the true mutation and its presence/absence in all outbreak isolates) manually and indicated as such in Table S5.

SNP distances between isolates from different patients

SNP distances were obtained from the alignment by using snp-dist (https://github.com/treemann/snp-dists). To create violin plots, the complete all-vs-all dataset was limited, based on 4 sets of criteria, to pairs of isolates that (1) originated from different patients; (2) originated from different patients, and were collected in the same month; (3) originated from different patients and were collected in the same ward; and (4) originated from different patients, were collected in the same month, and the same ward. Primary isolates were designated as Patient-to-Patient if they were part of a pair from different patients, were collected in the same ward, were separated by ≤ 10 SNPs, and were ≤ 31 days apart. Primary isolates were designated as environmental if they were part of a pair from different patients, were collected in the same ward, were collected > 365 days apart, and were separated by a number of SNPs ≤ 3x the number of years they were separated (rounding down). All other primary isolates were designated as Undetermined.

Bayesian evolutionary phylogenetic analysis

To evaluate the strength of the temporal signal, TempEst v1.5.3 was utilized to visualize the relationship between root-to-tip genetic distances for samples with known collection dates47. To date internal nodes of interest on the phylogeny, bayesian phylogenetic inference was performed using BEAST2 v2.6.5 on a recombination free alignment, removing samples with uncertain collection dates, and accounting for constant sites with beast2_constsites (https://github.com/andersgs/beast2_constsites)48. The HKY substitution model was selected based on evaluation of all possible substitution models in bModelTest v1.2.149. The random clock model was selected based on support by the marginal likelihood value using the Nested Sampling package v1.1.050. BEAST2 was run under a coalescent constant population model with a Markov chain Monte Carlo length of 1 x 108 sampling every 5 x 103 steps. Analyses were repeated five times to confirm consistency between the obtained posterior distributions. Parameter estimates were computed using Tracer v1.7.1. Posterior trees were combined with LogCombiner and summarized in TreeAnnotator after a 50% burn-in. The final MCC target tree was visualized in FigTree v1.4.4 (https://github.com/rambaut/figtree) and annotated using iTOL (https://itol.embl.de/).

List of Supplementary Materials

Table S1 - Isolate Metadata

Table S2 - All vs All SNP Comparison

Table S3 - Environmental Isolates

Table S4 - Antibiotic Susceptibility Testing

Table S5 - Prediction of variants in outbreak isolates

Table S6 - Resistome reference intrinsic alleles

Table S7 - Mutations in intrinsic genes associated with antibiotic resistance

Table S8 - Mutations in functional groups or pathways

Acknowledgements

The authors are thankful to all the staff of the MRSN and the clinical microbiology laboratory of the MRSN MTF network. Additionally, we would like to thank Dr. Madeline Galac for sharing insight on phylogenetic analysis using parsimony and Dr. Xavier Didelot for helpful discussions in predicting patient transmission events during bacterial outbreaks. The manuscript has been reviewed by the Walter Reed Army Institute of Research and there is no objection to its presentation. The views expressed herein are those of the author(s) and do not necessarily reflect the official policy or position of the Defense Health Agency, Brooke Army Medical Center, the Department of Defense, nor any agencies under the U.S. Government.

Funding

This study was funded by the U.S. Army Medical command and the Defense Medical Research and Development Program. In addition, this study was partly funded by the Armed Forces Health Surveillance Division (AFHSD), Global Emerging Infections Surveillance (GEIS) Branch ProMIS P0079_22_WR. The funders have no role in the decision to publish or the preparation of this article.

Author Contributions

P.T.M and F.L. designed research; W.S., L.R.H, A.P., C.H., M.J.M, B.W.C., E.S., A.O., R.M., J.S., K.B., B.T.J, L.P., K.L., B.T., L.M.Y., Y.I.K., P.T.M. and F.L. performed research. W.S., L.R.H., C.H., E.S., J.W.B., P.T.M. and F.L. analyzed research; B.T., A.B., A.E.M, J.L.K, R.J.C., J.W.B. contributed clinical metadata and isolates. W.S., M.J.M, P.T.M and F.L. wrote the paper with input from all authors.

Data and materials availability

Both genomic assemblies and raw sequencing data of all isolates analyzed in this study are publicly available in the NCBI database under the BioProject number PRJNA8852179

Competing interests

The authors declare that they have no competing interests.

Minimum Spanning Trees of Pseudomonas aeruginosa cgMLST.

Isolates are color coded by facility, showing the 10 most prominent. (A) Includes all 5,129 Pseudomonas aeruginosa in the MRSN’s collection from 2011-2020. Clusters of isolates belonging to ST-235, -244, and -621 are indicated. (B) Includes all ST-621 isolates from 2011-2020. Genetic distances higher than 23 allelic differences are indicated.