Persistent cross-species transmission systems dominate Shiga toxin-producing Escherichia coli O157:H7 epidemiology in a high incidence region: A genomic epidemiology study

  1. Gillian AM Tarr (corresponding author)
  2. Linda Chui
  3. Kim Stanford
  4. Emmanuel W Bumunang
  5. Rahat Zaheer
  6. Vincent Li
  7. Stephen B Freedman
  8. Chad R Laing
  9. Tim A McAllister
  1. Division of Environmental Health Sciences, School of Public Health, University of Minnesota, United States
  2. Alberta Precision Laboratories, Alberta Public Health, Walter Mackenzie Health Sciences Centre, Canada
  3. Department of Laboratory Medicine and Pathology, University of Alberta, Canada
  4. Department of Biological Sciences, University of Lethbridge, Canada
  5. Agriculture and Agri-Food Canada, Lethbridge Research and Development Centre, Canada
  6. Sections of Pediatric Emergency Medicine and Gastroenterology, Department of Pediatrics, Alberta Children’s Hospital and Alberta Children’s Hospital Research Institute, Cumming School of Medicine, University of Calgary, Canada
  7. National Center for Animal Diseases Lethbridge Laboratory, Canadian Food Inspection Agency, Canada
7 figures, 3 tables and 2 additional files

Figures

Figure 1
Maximum likelihood core SNP tree of the 1215 E. coli O157:H7 isolates referenced in the study.

This includes 659 isolates from Alberta, Canada, from 2007 through 2019, 494 isolates from the U.S. from 1996 through 2019, and 62 isolates from elsewhere around the globe from 2007 through 2016. The tree was rooted at clade A. (A) shows all clades, with tips colored according to clade, geographic origin shown on the inner ring, and species of origin on the outer ring. (B) shows clade G(vi), which constituted 73.6% of all isolates and 88.3% of Alberta isolates. Tips are colored by geographic origin, and the ring indicates species of origin.

Figure 2
SNP distance between E. coli O157:H7 strains and their nearest relative, by species, Alberta, Canada, 2007–2015.

Distances for isolates from 121 reported human cases and 108 beef cattle are shown. Cattle isolates were highly related, with 56.5% within five SNPs of another cattle isolate and 94.4% within 25 SNPs. Human isolates showed a bimodal distribution in their relationship to cattle isolates, with 86.0% within 50 SNPs of a cattle isolate and the remainder 185–396 SNPs apart. Nineteen human isolates (15.7%) were within five SNPs of a cattle isolate.
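The nearest-relative distances summarized in this caption can be illustrated with a minimal sketch. The toy sequences, the function names, and the choice to treat "within N SNPs" as inclusive are all assumptions made for illustration; the study's own pipeline is not described in this caption.

```python
def snp_distance(seq_a, seq_b):
    """Count aligned positions where both bases are unambiguous and differ."""
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a in valid and b in valid and a != b
    )


def nearest_distance(name, seq, reference):
    """Smallest SNP distance from `seq` to any reference isolate other than itself."""
    return min(
        snp_distance(seq, other_seq)
        for other_name, other_seq in reference.items()
        if other_name != name
    )


def percent_within(distances, threshold):
    """Percentage of distances at or below `threshold` (inclusive by assumption)."""
    return 100.0 * sum(d <= threshold for d in distances) / len(distances)


# Hypothetical core-SNP alignments, not study data.
cattle = {"c1": "ACGTACGTAC", "c2": "ACGTACGTAT", "c3": "ACGAACGTAT"}
human = {"h1": "ACGTACGTAC", "h2": "TTGTACGAAT"}

cattle_to_cattle = [nearest_distance(n, s, cattle) for n, s in cattle.items()]
human_to_cattle = [nearest_distance(n, s, cattle) for n, s in human.items()]

print(cattle_to_cattle)                    # [1, 1, 1]
print(human_to_cattle)                     # [0, 3]
print(percent_within(human_to_cattle, 2))  # 50.0
```

Excluding an isolate from its own reference set matters for the cattle-to-cattle comparison; otherwise every isolate would be zero SNPs from itself.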

Figure 3 with 4 supplements
Maximum clade credibility (MCC) tree of structured coalescent analysis of E. coli O157:H7 strains isolated from 115 reported human cases and 84 beef cattle in Alberta, Canada, 2007–2015.

Isolates were down-sampled prior to phylodynamic analysis to remove isolates that were highly similar. The structured coalescent analysis estimated migration and state transitions between humans and cattle. The MCC tree is colored by inferred host, cattle (blue) or human (orange). The majority of ancestral nodes were inferred as cattle, suggesting cattle as the primary reservoir. The root was estimated at 1802 (95% HPD 1731, 1861). Eleven locally persistent lineages (LPLs) were identified, all in the G(vi) clade, and labeled LPL 1 through 11. For each LPL, the minimum SNP threshold within which all isolates fall (30 was set as the maximum) and the number of human and cattle isolates within the LPL, not including down-sampled isolates, are shown. With down-sampled isolates reincorporated, LPLs accounted for 46 human (38.0%) and 71 cattle (65.7%) isolates. The structured coalescent model estimated 108 cattle-to-human state transitions between branches, compared to only 14 human-to-human transitions, implying cattle as the origin of 88.5% of human lineages.
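The percentages in this caption can be reproduced with simple arithmetic. The sketch below assumes the denominators are the 121 human and 108 cattle isolates reported for Alberta, 2007–2015, and that the share of human lineages arising from cattle is the cattle-to-human count divided by all transitions into human lineages; the helper name `percent` is illustrative.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


cattle_to_human = 108
human_to_human = 14
human_lineages = cattle_to_human + human_to_human  # 122 transitions into human lineages

print(percent(cattle_to_human, human_lineages))  # 88.5
print(percent(46, 121))                          # 38.0 (human isolates in LPLs)
print(percent(71, 108))                          # 65.7 (cattle isolates in LPLs)
```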

Figure 3—figure supplement 1
Key parameters drawn from the posterior distributions of four Markov chain Monte Carlo chains with different starting seeds, compared with draws from the prior distribution.

The posterior distributions of the four chains for the tree height, clock rate, kappa, cattle effective population size, human effective population size, and backward migration rate all differed substantially from the prior distribution, shown on the far right in green for each graph.

Figure 3—figure supplement 2
Maximum clade credibility (MCC) tree of structured coalescent analysis of 168 subsampled isolates from humans and cattle from Alberta, 2007–2015.

LPLs are labeled as in the primary analysis (Figure 3). From an initial 121 human and 108 cattle isolates, our primary analysis contained 115 human and 84 cattle isolates after down-sampling. In this sensitivity analysis, we repeated our primary analysis with a subsample of 84 randomly selected isolates from humans and the 84 cattle isolates remaining after down-sampling. As in the primary analysis, cattle were inferred as the host of the majority of ancestral nodes. The root was estimated at 1796 (95% HPD 1722, 1859), very close to the root at 1802 estimated in the primary analysis. The most recent common ancestor (MRCA) of clade G(vi) strains in Alberta was inferred to be a cattle strain, dated to 1968 (95% HPD 1956, 1979), compared to 1969 in the primary analysis. We estimated 82 (95% HPD 79, 84) human lineages arose from cattle lineages, and 5 (95% HPD 0, 11) arose from other human lineages, meaning we inferred that 94.3% of human lineages arose from cattle lineages, compared to 88.5% in the primary analysis. The LPLs identified in this sensitivity analysis were mostly identical to those identified in the primary analysis. Differences in the sensitivity analysis were that G(vi)-AB LPL 8 expanded to a larger set of isolates; G(vi)-AB LPL 9, which included only five isolates in the primary analysis, was no longer identified as an LPL, as it no longer met the five-isolate criterion; and there were minor topological changes.
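The balanced subsample used in this sensitivity analysis can be sketched as follows. The isolate identifiers, the seed, and the helper name `subsample` are hypothetical; only the counts (115 human and 84 cattle isolates after down-sampling, 84 humans drawn at random) come from the caption.

```python
import random


def subsample(ids, k, seed):
    """Draw k identifiers without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(ids), k))


# Placeholder identifiers standing in for the down-sampled isolates.
human_ids = [f"human_{i:03d}" for i in range(1, 116)]   # 115 human isolates
cattle_ids = [f"cattle_{i:03d}" for i in range(1, 85)]  # 84 cattle isolates

# Match the number of human isolates to the number of cattle isolates.
selected_humans = subsample(human_ids, k=len(cattle_ids), seed=1)
analysis_set = selected_humans + cattle_ids

print(len(selected_humans), len(analysis_set))  # 84 168
```

Equalizing the two groups is one way to check that the inferred direction of transmission is not simply a consequence of having more human than cattle isolates.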

Figure 3—figure supplement 3
Sampled trees from four independent chains of the Alberta 2007–2015 structured coalescent analysis.

Across the four chains, 1,070,460,000 trees were sampled. Depicted here are 963,414,000 post-burn-in trees, with panels A-D each showing the trees from one chain. Lineages inferred with cattle ancestry are shown in blue, and lineages inferred with human ancestry are shown in green. Reflecting the strong support for the lineages identified as locally persistent lineages (LPLs), the tree topology is well resolved.
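The two counts in this caption imply the burn-in proportion. The sketch below derives it and shows one common way of discarding burn-in before pooling chains; it assumes burn-in is removed from the start of each chain in equal proportion, which the caption does not state, and the toy chains are placeholders.

```python
from fractions import Fraction


def burn_in_fraction(total_sampled, retained):
    """Exact fraction of samples discarded as burn-in."""
    return Fraction(total_sampled - retained, total_sampled)


def discard_burn_in(chain, fraction):
    """Drop the leading `fraction` of a chain's samples."""
    n_drop = int(len(chain) * fraction)
    return chain[n_drop:]


fraction = burn_in_fraction(1_070_460_000, 963_414_000)
print(fraction)  # 1/10

# Toy chains of 20 samples each, standing in for sampled trees.
chains = [list(range(20)) for _ in range(4)]
pooled = [s for chain in chains for s in discard_burn_in(chain, fraction)]
print(len(pooled))  # 72
```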

Figure 3—figure supplement 4
LPLs from the primary analysis were defined using alternate SNP thresholds of 50 (left) and 75 (right) SNPs.

LPL 50.1 includes G(vi)-AB LPLs 1, 2, and 3 from the primary analysis using a SNP threshold of 30. LPLs 50.2 and 50.3 correspond to G(vi)-AB LPLs 4 and 5, respectively. LPL 50.4 includes G(vi)-AB LPLs 6, 7, and 8. LPL 75.1 includes all of the above locally persistent lineages (LPLs) at the lower thresholds. All 50- and 75- threshold LPLs in this section of the tree include isolates not included at the lower SNP threshold(s). No part of LPLs 50.5 or 50.6 was identified as an LPL using a threshold of 30, but LPLs 50.7 and 50.8 correspond exactly to G(vi)-AB LPLs 9 and 10. LPL 75.2 includes LPLs 50.5–50.8 and G(vi)-AB LPLs 9 and 10. G(vi)-AB LPL 11, LPL 50.9, and LPL 75.3 are all identical. At the 75-SNP threshold, two additional LPLs are identified with no corresponding LPLs at the 30- or 50-SNP thresholds. One of these is outside the G(vi) clade.

Figure 4
Extension of Alberta, Canada E. coli O157:H7 analysis to include 229 randomly selected study isolates and 430 additional public health isolates available from 2009–2019.

The maximum clade credibility (MCC) tree was constructed from a coalescent analysis with constant population size after down-sampling. Six locally persistent lineages (LPLs) in clade G(vi) continued to be associated with the disease after the initial study period. LPLs are colored and labeled as in Figure 3. After re-incorporating the down-sampled isolates, 74.7% of reported cases in 2018 and 2019 were associated with an LPL.

Figure 5
Shiga toxin gene (stx) profile by locally persistent lineage (LPL) status of extended analysis of Alberta, Canada E. coli O157:H7 isolated from cattle and humans, 2007–2019.

The stx profile across all clades shifted from the initial study period (2007–2015) to the later study period (2016–2019), with more of the virulent stx2a-only profile observed in 2018 and 2019 than in previous years. In 2018 and 2019, 51.2% of LPL isolates carried only stx2a, compared to 10.9% of non-LPL isolates. The peak in sequences in 2014 is due to two outbreaks; routine sequencing began in 2018 and 2019, accounting for the rise in sequenced cases during those years.

Figure 6
Comparison of Alberta, U.S., and global E. coli O157:H7 isolates.

Tips are colored based on isolate origin, and locally persistent lineages (LPLs) from Figure 3 are highlighted. A total of nine U.S. isolates arose from Alberta LPLs 2, 4, 7, and 11, all of which had Alberta isolates predating the U.S. isolates. No global isolates were associated with Alberta LPLs. Clade A was excluded from the analysis due to its high level of divergence from the rest of the E. coli O157:H7 population.

Figure 7 with 1 supplement
E. coli O157:H7 isolates were selected for the study and assembled.

Three sets of isolates, all originating from cattle or humans, were included in the study.

Figure 7—figure supplement 1
Clustering was performed from raw reads using PopPUNK v2.5.0 with 10,146 E. coli reference genomes (Lees et al., 2019).

From the 246 isolates selected for sequencing for the study and 445 additional Alberta Health isolates included for contextualization, one isolate was removed prior to clustering analysis, because it was identified through metadata review as an environmental (non-human, non-cattle) isolate. Cluster 1 included the Sakai and EDL933 reference strains. Clusters 826 and 827 were novel clusters. Isolates outside of Cluster 1 were excluded from all subsequent analyses.

Tables

Table 1
Distribution of study isolates by geographic source, clade, and Shiga toxin gene (stx) profile.
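| Clade | Source | stx1a | stx1a/stx2a | stx1a/stx2c | stx1a/stx2a/stx2c | stx2a | stx2c | stx2a/stx2c | stx2a/stx2c/stx2d | stx2a/stx2d | None detected | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| G(vi) | All | 2 | 682 | 0 | 0 | 210 | 0 | 0 | 0 | 0 | 0 | 894 |
| | Alberta | 0 | 443 | 0 | 0 | 139 | 0 | 0 | 0 | 0 | 0 | 582 |
| | U.S. | 2 | 237 | 0 | 0 | 71 | 0 | 0 | 0 | 0 | 0 | 310 |
| | Global | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| Other G | All | 3 | 1 | 4 | 2 | 4 | 33 | 6 | 0 | 0 | 5 | 58 |
| | Alberta | 0 | 0 | 4 | 0 | 0 | 10 | 1 | 0 | 0 | 0 | 15 |
| | U.S. | 2 | 1 | 0 | 2 | 1 | 19 | 4 | 0 | 0 | 3 | 32 |
| | Global | 1 | 0 | 0 | 0 | 3 | 4 | 1 | 0 | 0 | 2 | 11 |
| F | All | 0 | 0 | 0 | 0 | 53 | 23 | 75 | 5 | 1 | 7 | 164 |
| | Alberta | 0 | 0 | 0 | 0 | 12 | 9 | 12 | 0 | 1 | 1 | 35 |
| | U.S. | 0 | 0 | 0 | 0 | 33 | 11 | 61 | 5 | 0 | 6 | 116 |
| | Global | 0 | 0 | 0 | 0 | 8 | 3 | 2 | 0 | 0 | 0 | 13 |
| Other (A-E) | All | 2 | 1 | 41 | 0 | 1 | 50 | 2 | 0 | 0 | 2 | 99 |
| | Alberta | 1 | 0 | 15 | 0 | 0 | 11 | 0 | 0 | 0 | 0 | 27 |
| | U.S. | 0 | 0 | 10 | 0 | 1 | 24 | 1 | 0 | 0 | 0 | 36 |
| | Global | 1 | 1 | 16 | 0 | 0 | 15 | 1 | 0 | 0 | 2 | 36 |
| Total | | 7 | 684 | 45 | 2 | 268 | 106 | 83 | 5 | 1 | 14 | 1215 |
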
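
As a consistency check, the row totals in Table 1 can be summed by source and compared with the isolate counts and clade G(vi) shares quoted in the first figure caption. This is a small sketch; the dictionary layout and variable names are illustrative, and the numbers are the Total column values from the table.

```python
# Row totals from Table 1, by clade and geographic source.
totals = {
    "G(vi)": {"Alberta": 582, "U.S.": 310, "Global": 2},
    "Other G": {"Alberta": 15, "U.S.": 32, "Global": 11},
    "F": {"Alberta": 35, "U.S.": 116, "Global": 13},
    "Other (A-E)": {"Alberta": 27, "U.S.": 36, "Global": 36},
}

# Sum across clades for each source, then across sources.
by_source = {
    source: sum(clade[source] for clade in totals.values())
    for source in ("Alberta", "U.S.", "Global")
}
grand_total = sum(by_source.values())
gvi_total = sum(totals["G(vi)"].values())

print(by_source)                                                    # {'Alberta': 659, 'U.S.': 494, 'Global': 62}
print(grand_total)                                                  # 1215
print(round(100 * gvi_total / grand_total, 1))                      # 73.6
print(round(100 * totals["G(vi)"]["Alberta"] / by_source["Alberta"], 1))  # 88.3
```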
Table 2
Estimated migrations from structured coalescent analysis of Alberta, U.S., and global isolates, excluding clade A.
| Migration direction | Mean | 95% HPD |
| --- | --- | --- |
| Alberta to Alberta | 589 | 570, 609 |
| Alberta to Global | 8 | 4, 11 |
| Alberta to U.S. | 12 | 6, 16 |
| Global to Alberta | 68 | 57, 78 |
| Global to Global | 442 | 363, 511 |
| Global to U.S. | 296 | 267, 335 |
| U.S. to Alberta | 5 | 0, 14 |
| U.S. to Global | 13 | 0, 35 |
| U.S. to U.S. | 104 | 10, 184 |
  1. HPD, highest posterior density interval.

Table 3
Analyses conducted and model priors.
| Analysis | Isolates included | Isolates remaining after down-sampling | Tree model | Substitution model | Clock model |
| --- | --- | --- | --- | --- | --- |
| Primary analysis* | Alberta 2007–2015 (n=229) | 115 human, 84 cattle | Structured coalescent with two demes (1 per species); Ne initial value 10 and Weibull distribution (shape = 1, scale = 100); migration initial value 1.0 and Exponential distribution (mean = 1) | HKY with empirical frequencies and discrete Gamma site model with four categories | Relaxed log-normal with initial value 1.5×10⁻⁵ and Log-Normal distribution (M=1.5×10⁻⁵, S=1.5, with mean in real space) |
| Alberta long-term persistence | Alberta 2007–2019 (n=657) | 274 human, 84 cattle | Coalescent constant population with Ne initial value 10 and Weibull distribution (shape = 1, scale = 100) | HKY with empirical frequencies and discrete Gamma site model with four categories | Relaxed log-normal with initial value 1.5×10⁻⁴ and Log-Normal distribution (M=1.5×10⁻⁴, S=1.5, with mean in real space) |
| Global circulation (unstructured) | Alberta 2007–2019 (n=657); U.S. 1999–2019 (n=492); Global 2007–2016 (n=61) | Alberta: 274 humans, 84 cattle; U.S.: 312 humans, 38 cattle; Global: 39 humans, 22 cattle | Coalescent constant population with Ne initial value 10 and Weibull distribution (shape = 1, scale = 100) | HKY with empirical frequencies and discrete Gamma site model with four categories | Relaxed log-normal with initial value 1.5×10⁻⁴ and Log-Normal distribution (M=1.5×10⁻⁴, S=1.5, with mean in real space) |
| Global circulation (structured) | Alberta 2007–2019 (n=657); U.S. 1999–2019 (n=492); Global 2007–2016 (n=61) | Alberta: 274 humans, 84 cattle; U.S.: 312 humans, 38 cattle; Global: 39 humans, 22 cattle | Structured coalescent with three demes (1 per geography); Ne initial value 10 and Weibull distribution (shape = 1, scale = 100); migration initial value 1.0 and Exponential distribution (mean = 1) | HKY with empirical frequencies and discrete Gamma site model with four categories | Relaxed log-normal with initial value 1.5×10⁻⁴ and Log-Normal distribution (M=1.5×10⁻⁴, S=1.5, with mean in real space) |
  1. * The primary analysis was run using four different random seeds to confirm that all converged to the same solution. The four runs were combined to produce the final maximum clade credibility tree and state transition estimates. Model priors from this analysis were also used in an analysis in which draws were taken from the prior distribution, as opposed to the posterior distribution, to confirm that the final results were not overly influenced by the choice of priors.

  2. Clade A isolates were excluded from these analyses given the very small number available from any locale and clade A’s divergence from the rest of the clades.
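
Two of the priors in Table 3 have simple closed-form properties that the sketch below checks numerically. It assumes the standard Weibull density, under which shape = 1 reduces to an exponential distribution with mean equal to the scale, and the usual convention for a log-normal specified "with mean in real space", where the log-scale location is ln(M) − S²/2. The function names are illustrative and do not correspond to any particular software's API.

```python
import math


def weibull_pdf(x, shape, scale):
    """Density of a Weibull(shape, scale) distribution at x >= 0."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def exponential_pdf(x, mean):
    """Density of an exponential distribution with the given mean at x >= 0."""
    return math.exp(-x / mean) / mean


def lognormal_mu(real_space_mean, s):
    """Log-scale location for a log-normal whose natural-scale mean is given."""
    return math.log(real_space_mean) - 0.5 * s * s


def lognormal_mean(mu, s):
    """Natural-scale mean of a log-normal with log-scale parameters mu and s."""
    return math.exp(mu + 0.5 * s * s)


# Ne prior: Weibull(shape = 1, scale = 100) matches an exponential with mean 100.
for x in (0.0, 10.0, 100.0, 500.0):
    assert math.isclose(weibull_pdf(x, 1, 100), exponential_pdf(x, 100))

# Clock-rate prior: M = 1.5e-4, S = 1.5, with M read as the natural-scale mean.
mu = lognormal_mu(1.5e-4, 1.5)
assert math.isclose(lognormal_mean(mu, 1.5), 1.5e-4)
print(mu)
```

With S = 1.5 the prior is broad on the log scale, so the location parameter sits well below ln(M); reading M as a log-scale mean instead would imply a much higher expected clock rate.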

Cite this article

Gillian AM Tarr, Linda Chui, Kim Stanford, Emmanuel W Bumunang, Rahat Zaheer, Vincent Li, Stephen B Freedman, Chad R Laing, Tim A McAllister (2025) Persistent cross-species transmission systems dominate Shiga toxin-producing Escherichia coli O157:H7 epidemiology in a high incidence region: A genomic epidemiology study. eLife 13:RP97643. https://doi.org/10.7554/eLife.97643.3