Figures and data

A) Presence/absence of pT181 in high-quality S. aureus shotgun genome data over time (n = 16,928). pT181+ isolates make up a decreasing share of the sample set over time, even as the number of collected samples increases. B) The strain composition of pT181-containing samples has diversified since pT181’s initial detection in S99.5_2.

pT181 is characterized by hotspots of sequence variation.
Hypervariable regions have a large number of alleles present. Each hypervariable region is characterized by a set of mutation types (binwidth = 10 bp).
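
As a rough illustration of the 10 bp binning, the sketch below counts mutation positions per bin. The function name and the example positions are hypothetical, and 1-based plasmid coordinates are assumed; this is not the pipeline used to produce the figure.

```python
from collections import Counter

def bin_mutations(positions, binwidth=10):
    """Count mutations per bin; keys are the 1-based start coordinate of each bin."""
    counts = Counter()
    for pos in positions:
        # Assumption: positions are 1-based, so bin 1 covers 1..binwidth.
        bin_start = ((pos - 1) // binwidth) * binwidth + 1
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

# Hypothetical mutation positions, for illustration only.
example_positions = [3, 7, 10, 11, 25, 29, 30]
binned = bin_mutations(example_positions)
```

Bins with unusually high counts would correspond to the hotspots described in the legend.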

pT181 CDS SNP accumulation over time relative to the 1954 isolate, calculated from Snippy output.

A) Overall PCN distribution for gold-ranked, pT181+ isolates. The majority of pT181+ isolates in our dataset are single-copy. Because pT181 lacks segregation mechanisms, a freely replicating single-copy plasmid would be expected to be lost readily at cell division; the predominance of single-copy isolates therefore suggests that the plasmid has become integrated into the chromosome. B) Subset of the PCN distribution, enlarged for 0-3 copies/cell.

Six non-type III SCCmec configurations of accessory genes directly flank the pT181 integration site on pT181-containing contigs, across multiple strains in the SRA isolates.
This indicates that the plasmid has integrated into the chromosome repeatedly, given that the S. aureus genome is highly syntenic and accessory genes tend to cluster in specific regions of the genome 3,48,49.

The proportion of pT181+ isolates carrying the plasmid at single copy has increased since initial detection and varies across strains.
A) Proportion of single-copy pT181 increased over time. B) Single-copy vs. multicopy pT181 across strains, for strains containing >= 2 pT181+ isolates in the n = 2,460 set. Strains are generally dominated by either single-copy or multicopy pT181.

PCN of multicopy isolates has declined over time.
PCN in each strain does not differ significantly from that of the reference strain, S99.5_8 (GLMM, treatment vs. control post hoc comparisons, BH multiple-testing correction).
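
For readers unfamiliar with the BH (Benjamini-Hochberg) correction named above, a minimal sketch of the adjustment follows. It covers only the multiple-testing step, not the GLMM or the post hoc contrasts, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(example_p)
```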

Multicopy PCN varies among isolates with identical plasmid sequences and within strains, indicating that environmental or genetic effects at the substrain level influence pT181 PCN.
The plot shows all unique plasmid sequences (across all sites) that occur at least twice as multicopy isolates in the n = 2,460 dataset.

Distribution of the sum of blastn hits used to screen for pT181 presence.
The primary peak of the distribution was at 4425-4475 bp, and the threshold for inclusion of isolates was set at 4400 bp.
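
The screening rule can be sketched as follows, assuming that the "sum of blastn hits" is the summed alignment length per isolate and that isolates at or above the threshold are retained (both are assumptions; the legend does not state them). The isolate names and hit lengths are hypothetical.

```python
from collections import defaultdict

THRESHOLD_BP = 4400  # inclusion threshold stated in the legend

def screen_isolates(hits, threshold=THRESHOLD_BP):
    """Sum alignment lengths per isolate and keep those at or above the threshold.

    `hits` is an iterable of (isolate_id, alignment_length_bp) pairs.
    """
    totals = defaultdict(int)
    for isolate, aln_len in hits:
        totals[isolate] += aln_len
    # Assumption: the threshold is inclusive (>=).
    return {iso: total for iso, total in totals.items() if total >= threshold}

# Hypothetical hits, for illustration only.
hits = [("isoA", 4440), ("isoB", 2100), ("isoB", 2350), ("isoC", 1200)]
kept = screen_isolates(hits)
```

Summing per isolate allows a plasmid split across several hits (here, the hypothetical isoB) to pass the screen.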

Cartoon of read mapping and relative read depth measurements.
The relative read depth of the plasmid provides an estimate of its copy number.
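
A minimal sketch of this estimate, assuming PCN is taken as the ratio of mean plasmid read depth to mean chromosomal read depth (the choice of mean rather than median is an assumption, and the depth values are hypothetical):

```python
from statistics import mean

def estimate_pcn(plasmid_depths, chromosome_depths):
    """Estimate plasmid copy number as plasmid depth relative to chromosomal depth."""
    chrom_depth = mean(chromosome_depths)
    if chrom_depth == 0:
        raise ValueError("chromosomal read depth is zero; PCN is undefined")
    return mean(plasmid_depths) / chrom_depth

# Hypothetical per-position read depths, for illustration only.
pcn = estimate_pcn([120, 118, 122], [40, 38, 42])
```

Under this ratio, a plasmid integrated into the chromosome would be expected to give a value close to 1.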

Distribution of chromosomal copy number (CCN), colored by position relative to origin of replication.
A contig is considered near the origin if it carries at least one of the 4 genes (rpmH, dnaA, dnaN, and recF) known to be close to the origin of replication 91.
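
The classification rule can be sketched as a simple set intersection. The contig names and gene annotations below are hypothetical, and the input format (a mapping from contig to annotated gene names) is an assumption.

```python
# The four origin-proximal marker genes named in the legend.
ORIGIN_GENES = {"rpmH", "dnaA", "dnaN", "recF"}

def contigs_near_origin(contig_genes):
    """Return the contigs that carry at least one origin-proximal marker gene."""
    return {
        contig
        for contig, genes in contig_genes.items()
        if ORIGIN_GENES & set(genes)
    }

# Hypothetical annotation, for illustration only.
example = {
    "contig_1": ["dnaA", "dnaN", "gyrB"],
    "contig_2": ["mecA"],
    "contig_3": ["recF"],
}
near = contigs_near_origin(example)
```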

Coefficients of variation of read depth (standard deviation divided by mean) for chromosomal contigs, pT181, and other contigs follow a similar distribution across 10 test gold-quality isolates.
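
The coefficient of variation defined above can be computed as in the sketch below. Using the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption, and the depth values are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(depths):
    """Standard deviation of read depth divided by mean read depth."""
    mu = mean(depths)
    if mu == 0:
        raise ValueError("mean read depth is zero; CV is undefined")
    # Assumption: population standard deviation; the legend does not specify.
    return pstdev(depths) / mu

# Hypothetical per-position read depths, for illustration only.
cv_flat = coefficient_of_variation([10, 10, 10, 10])
cv_varied = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```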

Strains vary in median copy number, with an effect of BioProject.
Each BioProject is shown as a separate boxplot. Data are shown for strain x BioProject combinations containing at least 5 isolates.

Unrooted phylogenetic trees for pT181 A) whole plasmid sequence and B) CDS.

Diagram of sample filtering steps.

pT181 integrates into a region of the chromosome consisting primarily of strain-diffuse and rare genes 3.
No gene found at positions 55-90 kbp in complete genomes with chromosomally integrated pT181 is present in this region in all isolates.

The pre-rotation start position of the plasmid contig does not influence the position of mutations detected in pT181.
The x-axis, hitStart, indicates the start position of the isolate’s sequence relative to the blast query (i.e., subject start / “sstart” for retrieval with blastn). The y-axis, mutation position, indicates where mutations occurred in the isolate after rotation of the sequences to a common start point. The presence of the mutational spikes across a range of hitStart positions indicates that the spikes are not generated by the rotation; they are biological in origin rather than a technical artifact.
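
To make the rotation step concrete, the sketch below rotates a circular sequence to a chosen start point and converts a pre-rotation position into rotated coordinates. The function names, the 0-based indexing, and the toy sequence are assumptions for illustration and do not reproduce the authors' implementation.

```python
def rotate_circular(seq, new_start):
    """Rotate a circular sequence so that 0-based index new_start becomes index 0."""
    if not seq:
        return seq
    k = new_start % len(seq)
    return seq[k:] + seq[:k]

def rotated_position(pos, new_start, length):
    """Map a 0-based pre-rotation position to its index after rotation."""
    return (pos - new_start) % length

# Toy circular sequence, for illustration only.
seq = "GGTTAACC"
rotated = rotate_circular(seq, 3)
new_pos = rotated_position(5, 3, len(seq))
```

Because every position shifts by the same offset modulo the sequence length, rotation alone relabels coordinates without concentrating mutations at particular sites.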