E. coli O157:H7 isolates selected for the study and included in analysis.

Four sets of isolates, all originating from cattle or humans, were included in the study.

Maximum likelihood core SNP tree of the 854 E. coli O157:H7 isolates referenced in the study.

This includes 229 randomly sampled cattle and human isolates from Alberta, Canada, from 2007 through 2015; 432 additional Alberta isolates sequenced as part of public health activities from 2009 through 2019; and 152 isolates from the U.S. and 47 isolates from elsewhere around the globe from 1996 to 2016 to examine international transmission history. Clade is shown in the coloration of the tips on the tree, geographic origin is shown on the inner ring, species of origin on the middle ring, and Shiga toxin gene (stx) profile on the outer ring. The tree was rooted at clade A. Clade G constituted the majority of isolates.

Distribution of study isolates by geographic source, clade, and Shiga toxin gene (stx) profile

Relationship of randomly selected E. coli O157:H7 strains isolated from 121 reported human cases and 108 beef cattle in Alberta, Canada, 2007-2015.

Target diagrams show SNP distances from cattle (A, top) and humans (A, bottom) to cattle (blue) and humans (orange), with rings labeled with the SNP distance between isolates. Cattle isolates were highly related, with 53% of cattle isolates within 5 SNPs of another cattle isolate and 83% within 15 SNPs (A, top). Human isolates showed a bimodal distribution in their relationship to cattle isolates, with 87% within 52 SNPs of a cattle isolate and the remainder 185-396 SNPs apart (A, bottom). The maximum clade credibility tree for the structured coalescent analysis of cattle and human isolates (B) was colored by inferred host, cattle (blue) or human (orange). That the majority of ancestral nodes were inferred as cattle suggests cattle as the primary reservoir. The root was estimated at 1812 (95% HPD 1748, 1870). Eleven local persistent lineages (LPLs) were identified, all in the G(vi) clade and labeled G(vi)-AB LPL 1 through 11 (yellow and gray coloration highlights LPLs but has no other meaning). These accounted for 44 human (36.4%) and 71 cattle (65.7%) isolates. The structured coalescent model estimated 107 cattle-to-human state transitions between branches, compared to only 31 human-to-human transitions, implying cattle as the origin of 77.5% of human lineages (C).
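The percentages in this caption follow directly from the stated counts, and the "within N SNPs" summaries can be computed from each isolate's nearest-neighbor distance. The sketch below is illustrative only: the function names and the toy distance list are hypothetical, and treating "within N SNPs" as an inclusive threshold is an assumption rather than a detail given in the caption.

```python
from typing import Sequence


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def fraction_within(nearest_snp: Sequence[int], threshold: int) -> float:
    """Share of isolates whose nearest-neighbor SNP distance is <= threshold.

    Assumption: the threshold is inclusive ("within 5 SNPs" means <= 5).
    """
    if not nearest_snp:
        raise ValueError("need at least one distance")
    return sum(d <= threshold for d in nearest_snp) / len(nearest_snp)


# Counts stated in the caption: 121 human and 108 cattle isolates.
human_in_lpl = percent(44, 121)          # human isolates falling in an LPL
cattle_in_lpl = percent(71, 108)         # cattle isolates falling in an LPL

# Share of human lineages with an inferred cattle origin:
# cattle-to-human transitions over all transitions into human branches.
cattle_origin = percent(107, 107 + 31)

# Hypothetical nearest-neighbor distances, NOT data from the study.
toy_distances = [0, 2, 5, 9, 14, 40]
toy_within_5 = fraction_within(toy_distances, 5)

print(human_in_lpl, cattle_in_lpl, cattle_origin, toy_within_5)
```

With real data, `nearest_snp` would hold, for each isolate, the minimum pairwise SNP distance to any other isolate of the comparison host.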

Extension of the Alberta, Canada, E. coli O157:H7 analysis to include 229 randomly selected study isolates and 432 additional public health isolates available from 2009 to 2019.

Six local persistent lineages (LPLs) in clade G continued to be associated with disease after the initial study period, as indicated by branches colored in orange (A). LPLs are shaded and labeled as in Figure 3. Outbreaks reported during the period were down-sampled to avoid biasing the phylogeny, and the number of cases represented by each outbreak is shown in red (LPL-associated outbreaks) or purple (non-LPL-associated outbreaks) text. In 2018 and 2019, 74.7% of reported cases were associated with an LPL. The stx profile across all clades shifted from the initial study period (2007-2015) to the later study period (2016-2019), with more of the virulent stx2a-only profile observed in 2018 and 2019 than in previous years (B). The peak in sequences in 2014 is due to two outbreaks; routine sequencing began in 2018 and 2019, accounting for the rise in sequenced cases during those years.

Genomic clustering of 690 Alberta E. coli O157 isolates.

Clustering was performed from raw reads using PopPUNK v2.5.0 with 10,146 E. coli reference genomes (22). Of the 246 isolates selected for sequencing for the study and the 445 additional Alberta Health isolates included for contextualization, one isolate was removed prior to clustering analysis because metadata review identified it as an environmental (non-human, non-cattle) isolate. Cluster 1 included the Sakai and EDL933 reference strains. Clusters 826 and 827 were novel clusters. Isolates outside of Cluster 1 were excluded from all subsequent analyses.
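The isolate count in the figure title can be reconciled with the numbers in this caption. The short check below uses only the counts stated above; the variable names are illustrative.

```python
# Counts taken from the caption.
study_isolates = 246         # isolates selected for sequencing for the study
public_health_isolates = 445  # additional Alberta Health isolates
removed_environmental = 1    # non-human, non-cattle isolate dropped before clustering

clustered = study_isolates + public_health_isolates - removed_environmental
print(clustered)
```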

Comparison of key parameters drawn from the posterior distributions of four runs using different starting seeds to draws from the prior distribution.

The prior distributions for the clock rate (A, purple-grey), tree height (B, red), mascot (C, yellow), backward migration rate (D, not visible), human effective population size (E, grey), and cattle effective population size (F, red) all differed substantially from the posterior distributions of the four runs.

Comparison of Alberta, U.S., and global E. coli O157:H7 isolates.

Tips are colored based on isolate origin. Internal nodes are sized based on Bayesian posterior probability. LPLs are shaded and labeled as in Figure 3. Alberta isolates 2007-2016 and U.S. isolates 1996-2016 were analyzed for clade G only (A). Alberta, U.S., and other global isolates from 2007-2015 were analyzed for clades C through G (B). Two U.S. isolates, from 2014 and 2015, arose from Alberta LPLs 9 and 11, respectively. No global isolates were associated with Alberta LPLs.

Analyses conducted and model priors.

Estimated migrations from structured coalescent analysis of Alberta, U.S., and global isolates, excluding clades A and B.