Pan-genomic and phylogenetic characterization of the collected emm89 S. pyogenes isolates.

(A) emm genotyping of the 207 clinical isolates collected in Japan. (B) Pan-genome analysis of the Japanese and global cohorts. All genes detected in each cohort were classified into four groups according to their prevalence: core, soft-core, shell, and cloud genes. (C) Phylogenetic tree of the global cohort, constructed from the core-gene sequences. From innermost to outermost, the color bars indicate cluster, phenotype, MLST type, clade, country of isolation, and global region. The tree is midpoint-rooted. The scale bar (upper left) represents 0.001 nucleotide substitutions per site. MLST, multilocus sequence typing.
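The caption does not give the prevalence cut-offs behind the four pan-genome categories. The sketch below assumes the Roary-style defaults: core is at least 99% of strains, soft-core is 95% to under 99%, shell is 15% to under 95%, and cloud is under 15%. The function names and gene counts are hypothetical. Only the cohort size of 207 comes from the caption.

```python
from collections import Counter


def classify_gene(n_present: int, n_strains: int) -> str:
    """Pan-genome category of a gene carried by n_present of n_strains strains.

    Cut-offs are ASSUMED Roary-style defaults, not values reported by the study.
    """
    # Compare 100 * n_present with pct * n_strains to avoid float edge cases
    scaled = 100 * n_present
    if scaled >= 99 * n_strains:
        return "core"        # >= 99 % of strains
    if scaled >= 95 * n_strains:
        return "soft-core"   # 95 % to < 99 %
    if scaled >= 15 * n_strains:
        return "shell"       # 15 % to < 95 %
    return "cloud"           # < 15 %


def summarize(presence_counts: dict[str, int], n_strains: int) -> Counter:
    """Count genes per category from a {gene: number_of_carrier_strains} map."""
    return Counter(classify_gene(n, n_strains) for n in presence_counts.values())


# Hypothetical toy input: four genes scored across 207 isolates
counts = summarize({"geneA": 207, "geneB": 200, "geneC": 90, "geneD": 3}, 207)
```

In this toy input each gene falls into a different category. With real data, the same loop runs over every gene cluster in the presence/absence matrix.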

Genome-wide association study of SNPs/indels.

(A–B) Manhattan plots for the Japanese (A) and global (B) cohorts. The X-axis shows the position of each SNP/indel in the core gene alignment, and the Y-axis shows its p-value. For each cohort, genome-wide significance was determined by a permutation test. The association analysis was repeated 1,000 times with randomly permuted phenotypes, and the significance threshold was set at the 5th percentile of the 1,000 minimum p-values (p=5.75×10⁻⁴ and p=1.05×10⁻⁴ for the Japanese and global cohorts, respectively). Points with p-values below the genome-wide significance threshold are colored magenta or blue according to the direction of their effect size (positive or negative, respectively). (C) Distribution heatmap for the global cohort, in which strains carrying the significant SNPs/indels are colored orange. Only the 20 SNPs with the lowest p-values are shown. The color bars above the heatmap indicate country and phenotype. Positions refer to the location of each SNP/indel in the core gene alignment. SNP, single-nucleotide polymorphism; indel, insertion/deletion.
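The permutation procedure above can be sketched as follows. This is a minimal, self-contained illustration and not the study's pipeline. Fisher's exact test on a binary genotype matrix stands in for whatever association model was actually used, and the function names, genotypes, and phenotypes are hypothetical. Each permutation shuffles the phenotype labels and records the smallest p-value across all variants. The 5th percentile (nearest-rank) of those minima then becomes the genome-wide threshold.

```python
import math
import random


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: variant present / absent; columns: phenotype 1 / phenotype 0.
    """
    r1, r2 = a + b, c + d            # strains with / without the variant
    c1 = a + c                       # strains with phenotype 1
    denom = math.comb(r1 + r2, c1)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Hypergeometric probability of every table sharing the observed margins
    probs = {x: math.comb(r1, x) * math.comb(r2, c1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum tables no more probable than the observed one (tolerance for float ties)
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def variant_pvalues(genotypes: list[list[int]], phenotype: list[int]) -> list[float]:
    """One p-value per variant; genotypes[i][j] == 1 if strain j carries variant i."""
    pvals = []
    for row in genotypes:
        a = sum(1 for g, y in zip(row, phenotype) if g and y)
        b = sum(1 for g, y in zip(row, phenotype) if g and not y)
        c = sum(1 for g, y in zip(row, phenotype) if not g and y)
        d = len(row) - a - b - c
        pvals.append(fisher_exact_two_sided(a, b, c, d))
    return pvals


def permutation_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: the alpha-quantile of per-permutation minimum p-values."""
    rng = random.Random(seed)
    labels = list(phenotype)
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(labels)                   # break any genotype-phenotype link
        min_ps.append(min(variant_pvalues(genotypes, labels)))
    min_ps.sort()
    rank = max(1, math.ceil(alpha * n_perm))  # nearest-rank percentile
    return min_ps[rank - 1]


# Hypothetical toy data: 10 strains, 3 variants (each carried by 5 strains)
phenotype = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
genotypes = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],   # perfectly associated variant
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 1, 1, 0, 0, 0],
]
threshold = permutation_threshold(genotypes, phenotype, n_perm=200)
observed = variant_pvalues(genotypes, phenotype)
significant = [i for i, p in enumerate(observed) if p < threshold]
```

Each permutation keeps only its minimum p-value across all variants. That is what makes the threshold genome-wide: it limits the chance that any variant passes by luck, rather than controlling each variant separately.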

Genome-wide association study of gene presence.

(A–B) Volcano plots for the Japanese (A) and global (B) cohorts. The X-axis shows the effect size, and the Y-axis shows the p-value. Points with p-values below the genome-wide significance thresholds (p=1.09×10⁻⁴ and p=7.72×10⁻⁵ for the Japanese and global cohorts, respectively) are colored magenta or blue according to the direction of their effect size (positive or negative, respectively). (C) Distribution heatmap for the global cohort, in which strains carrying the significant genes are colored orange. Only the 20 genes with the lowest p-values are shown. The color bars above the heatmap indicate country and phenotype.
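The coloring rule for the volcano plots can be written as a small function. This is a sketch under two assumptions not stated in the caption: the Y-axis is plotted as −log10(p), and non-significant genes are grey. The function name and gene values are hypothetical. The threshold is one of the values given in the caption.

```python
import math


def volcano_point(effect: float, p: float, threshold: float) -> tuple[float, float, str]:
    """Return (x, y, colour) for one gene in a volcano plot.

    ASSUMED: y = -log10(p) and grey for non-significant genes; the caption
    itself only fixes magenta/blue for significant positive/negative effects.
    """
    y = -math.log10(p)
    if p < threshold and effect > 0:
        colour = "magenta"   # significant, positive effect size
    elif p < threshold and effect < 0:
        colour = "blue"      # significant, negative effect size
    else:
        colour = "grey"      # not significant (or zero effect)
    return effect, y, colour


THRESHOLD = 7.72e-5  # one of the genome-wide thresholds quoted in the caption
points = [volcano_point(e, p, THRESHOLD)
          for e, p in [(1.3, 2e-6), (-0.8, 5e-5), (0.4, 3e-3)]]  # hypothetical genes
```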

K-mers related to pathology.

Detailed results of the k-mer-based GWAS for the Japanese cohort in two genomic regions, covS (A, top left) and group_184 (B). Results for the global cohort cover five regions: covS (A, top right), an intergenic region (C), group_141-143 (D), sagG (E), and fhuB (F). (A–F, top) De Bruijn graphs generated using DBGWAS. Each node represents a k-mer, and significant nodes are marked with green arrowheads. Node size corresponds to allele frequency. (A–F, bottom) Alignments of k-mers, or k-mers mapped onto the MGAS27061 genome sequence, around the regions containing significant mutations. Each base is colored according to the encoded amino acid. (A) The significant k-mers (arrowheads) carry a frameshift mutation that truncates CovS to 35 amino acids. (B) The significant k-mers indicate that the presence of a sequence mapping to the first 26 bp of group_184 and the 20 bp upstream of it is associated with pathology. (C) Two significant k-mers map to the same 270-bp intergenic region. (D) Significant k-mers are marked with arrowheads.
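To make the graph panels easier to read, here is a minimal, uncompacted de Bruijn graph sketch. Nodes are k-mers, edges join consecutive k-mers that overlap by k − 1 bases, and each node carries the fraction of genomes containing it (the quantity mapped to node size). This is a simplification. Real tools such as DBGWAS go further, for example by compacting linear paths into unitigs. The toy sequences and function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield the successive k-mers of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_dbg(genomes: dict[str, str], k: int):
    """Build an uncompacted de Bruijn graph from a {strain: sequence} map.

    Returns (edges, freq): edges maps each k-mer to the k-mers that follow it
    (overlap of k - 1 bases) in at least one genome; freq maps each k-mer to
    the fraction of genomes that contain it.
    """
    edges = defaultdict(set)
    carriers = defaultdict(set)
    for strain, seq in genomes.items():
        prev = None
        for km in kmers(seq, k):
            carriers[km].add(strain)
            if prev is not None:
                edges[prev].add(km)
            prev = km
    freq = {km: len(s) / len(genomes) for km, s in carriers.items()}
    return dict(edges), freq


# Hypothetical toy sequences differing by one SNP (T vs A at position 5)
edges, freq = build_dbg({"s1": "AACGTTCC", "s2": "AACGATCC"}, k=3)
```

The SNP creates a "bubble". The shared k-mer ACG branches into CGT and CGA, and the two paths reconverge at TCC. In the toy data, k-mers shared by both strains have frequency 1.0, while the variant-specific branches have 0.5.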

Predicted protein structure models.

(A) Snake-like plot of the transmembrane regions of CovS predicted using SOSUI. The frameshift mutations detected by the SNP/indel- and k-mer-based GWASs truncate CovS at the indicated 35th and 45th residues, respectively. (B) Structural model of the CovS homodimer (ipTM + pTM=0.614). Putative transmembrane regions are colored orange. The upper part is the sensor domain, and the lower part is the C-terminal kinase domain involved in phosphorylation of the transcriptional regulator CovR. (C) Snake-like plot of the transmembrane regions of FhuB and FhuG. The 73rd residue of FhuB is marked with an arrowhead and colored magenta. (D) Structural model of the FhuBDCCG complex (ipTM + pTM=0.791). Putative transmembrane regions of FhuB and FhuG are colored green-yellow and peach, respectively. The upper part of the model lies in the extracellular region and the lower part in the cytoplasm. (E) The 73rd residue of FhuB, a valine (magenta), substituted with alanine. ipTM + pTM, weighted combination of the interface predicted template modeling (ipTM) and predicted template modeling (pTM) scores. ipTM measures structural accuracy at the protein–protein interface, whereas pTM measures overall topological accuracy.
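The caption does not name the modeling tool. However, "ipTM + pTM" is the model-confidence score reported by AlphaFold-Multimer, which is assumed here. Under that assumption, the "+" denotes a weighted average rather than a sum, with interface accuracy weighted four times as heavily as overall fold accuracy:

```latex
\mathrm{ipTM} + \mathrm{pTM} \;\equiv\; 0.8\,\mathrm{ipTM} + 0.2\,\mathrm{pTM},
\qquad \mathrm{ipTM},\ \mathrm{pTM} \in [0, 1]
```

Both components lie between 0 and 1, so the combined score does too. The values 0.614 (CovS homodimer) and 0.791 (FhuBDCCG complex) are therefore on a 0–1 scale.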

Transcriptome analysis of the fhuB T218C mutant strain in THY and human blood.

(A) Principal component analysis plot of RNA-seq data. (B) Differentially expressed genes in four comparisons. (C) Gene expression in the WT strain vs. the fhuB T218C mutant in human blood. Genes significantly upregulated and downregulated in the WT are colored red and blue, respectively. Point shapes indicate relative transcriptional changes between THY and blood. Upward triangles denote genes that, in blood vs. THY, were significantly upregulated in the WT or downregulated in the mutant strain. Downward triangles denote genes that, in blood vs. THY, were downregulated in the WT or upregulated in the mutant strain. WT, wild-type; THY, Todd Hewitt broth supplemented with 0.2% yeast extract.
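The mutant name fhuB T218C uses nucleotide coordinates, whereas the structural model highlights a valine-to-alanine change at FhuB residue 73. The sketch below shows that the two designations are consistent. It assumes that position 218 is counted from the first base of the fhuB start codon. The third base of codon 73 is not given, so the example checks every valine codon (GTN). The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop codon)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def locate_codon(nt_pos: int) -> tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def substitution_effect(codon: str, pos_in_codon: int, ref: str, alt: str) -> tuple[str, str]:
    """Return (reference amino acid, mutant amino acid) for a point substitution."""
    if codon[pos_in_codon - 1] != ref:
        raise ValueError("reference base does not match codon")
    mutant = codon[:pos_in_codon - 1] + alt + codon[pos_in_codon:]
    return CODON_TABLE[codon], CODON_TABLE[mutant]


codon_no, pos = locate_codon(218)  # -> (73, 2): second base of codon 73
# T->C at the second base of any valine codon GTN yields GCN (alanine)
effects = {"GT" + n: substitution_effect("GT" + n, pos, "T", "C") for n in BASES}
```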

Effects of the fhuB T218C mutant strain on ferric ion uptake and bacterial survival in human blood.

(A) Intracellular ferric ion assay. The WT and fhuB T218C mutant strains were incubated in healthy human blood for 3 h, after which intracellular ferric ion concentrations were measured (n=18 technical replicates). (B) Bactericidal assay in human blood. The WT, fhuB T218C mutant, and Wr strains were mixed with healthy human blood, and the ratios of bacterial counts at 1, 2, and 3 h after infection to those at 0 h were determined (n=18 technical replicates). (C–F) Bactericidal assays in human erythrocyte-rich medium (C), polymorphonuclear cell-rich medium (D), plasma (E), and plasma heat-inactivated at 56°C for 30 min (F) (n=18 technical replicates). (G) Bacterial growth in brain heart infusion broth (n=18 technical replicates). Representative data from one of three biological replicates are shown. Thick bars and error bars indicate means and quartiles, respectively. Statistical significance was determined using the Mann–Whitney test with Benjamini–Hochberg correction.
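The Benjamini–Hochberg correction mentioned above can be sketched in a few lines. In practice, the raw p-values would come from a Mann–Whitney implementation such as `scipy.stats.mannwhitneyu`. The same adjustment is available as `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")`. The function name and the three p-values below are hypothetical and are not results from this study.

```python
def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so the adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw Mann-Whitney p-values for the 1, 2 and 3 h comparisons
raw = [0.01, 0.04, 0.03]
adj = benjamini_hochberg(raw)
```

Each p-value is scaled by m/rank. The running minimum ensures that a smaller raw p-value never receives a larger adjusted value than a bigger one.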