Interpreting roles of mutations associated with the emergence of S. aureus USA300 strains using transcriptional regulatory network reconstruction

  1. Department of Bioengineering, University of California San Diego
  2. Palmona Pathogenomics
  3. Collaborative to Halt Antibiotic-Resistant Microbes (CHARM), Department of Pediatrics, University of California San Diego
  4. Department of Pediatrics, University of California San Diego

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Editors

  • Reviewing Editor
    Marisa Nicolás
    Laboratório Nacional de Computação Científica, Rio de Janeiro, Brazil
  • Senior Editor
    Aleksandra Walczak
    École Normale Supérieure - PSL, Paris, France

Reviewer #1 (Public Review):

Summary:
This is a large-scale genomics and transcriptomics study of the epidemic community-acquired methicillin-resistant S. aureus clone USA300, designed to identify core genome mutations that drove the emergence of the clone. It used publicly available datasets and a combination of genome-wide association studies (GWAS) and independent component analysis (ICA) of RNA-seq profiles to compare USA300 versus non-USA300 strains within clonal complex 8. By overlapping the analyses, the authors identified a 38 bp deletion upstream of the iron-scavenging surface-protein gene isdH that was significantly associated both with the USA300 lineage and with decreased transcription of the gene.

Strengths:
Several genomic studies have investigated genomic factors driving the emergence of successful S. aureus clones, in particular USA300. These studies have often focussed on the acquisition of key accessory genes or on a small number of strains. This study makes smart use of publicly available repositories to increase the sample size of the analysis and identify new genomic markers of USA300 success.
The approach of combining large-scale genomics and transcriptomics analysis is powerful, as it allows some inferences to be made about the impact of the mutations. This is particularly important for mutations in intergenic regions, whose functional impact is often uncertain.
The statistical genomics approaches are elegant and state-of-the-art and can be easily applied to other contexts or pathogens.

Weaknesses:
The main weakness of this work is that these data don't allow a causal inference on the role of isdH in driving the emergence of USA300. It is of course impossible to prove which mutation or gene drove the success of the clone; however, experimental data would have strengthened the conclusions of the authors in my opinion.
Another limitation is that the approach taken here doesn't allow any conclusions to be drawn about the adaptive role of the isdH mutation. In other words, it is still possible that the mutation is just a marker of USA300 success, and that this success is due to other factors such as PVL, ACME or SCCmecIVa. This is because by its nature this analysis is heavily influenced by population structure. Usually, GWAS is applied to find genetic loci that are associated with a phenotype and are independent of the underlying population structure. Here, the authors are using GWAS to find loci that are associated with a lineage. In other words, they are simply running a univariate analysis (likely a logistic regression) between genetic loci and the lineage without any correction for population structure, since population structure is the outcome. Therefore, this approach can't be applied to most phenotype-genotype studies, where correction for population structure is critical.
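To make the point about the univariate analysis concrete, the following is a minimal sketch of what testing one locus against lineage membership looks like when no population-structure correction is applied. It is not the authors' pipeline: the function names and the toy input are assumptions made for illustration, and a Fisher's exact test stands in for the logistic regression suggested above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x variant carriers in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def lineage_association(has_variant, is_usa300):
    """Test one variant against the lineage label (hypothetical helper).

    No correction for population structure is applied, because the lineage
    itself is the outcome being tested.
    """
    a = b = c = d = 0
    for variant, usa300 in zip(has_variant, is_usa300):
        if usa300 and variant:
            a += 1  # USA300, variant present
        elif usa300:
            b += 1  # USA300, variant absent
        elif variant:
            c += 1  # non-USA300, variant present
        else:
            d += 1  # non-USA300, variant absent
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)


# Toy input (illustrative only): a variant carried by every USA300 genome
table, p_value = lineage_association([1, 1, 1, 1, 0, 0, 0, 0],
                                     [1, 1, 1, 1, 0, 0, 0, 0])
print(table, p_value)
```

Any variant fixed on the branch leading to the lineage would score equally well in such a test, which is why the result alone cannot separate an adaptive mutation from a marker.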
Finally, the approach used is complex and not easily reproduced in another dataset. Although I like DBGWAS and find the network analysis elegant, I would be interested in seeing how a simpler GWAS tool like Pyseer would perform.

Reviewer #2 (Public Review):

Summary:

The work of Poudel et al. identified potential causal mutations related to the successful emergence of the virulent USA300 community-associated MRSA clone within clonal complex 8. To achieve this, the authors employed a methodology that combines genome-wide association studies (GWAS) with the inference of a transcriptional regulatory network (TRN) by independent component analysis (ICA) of publicly available transcriptomic data. Thus, they identified genes with altered expression in the iModulons calculated by ICA and enriched mutations obtained from the De Bruijn graph genome-wide association study (DBGWAS) in the USA300 strains versus non-USA300 strains. The results revealed a deletion of 38 base pairs, containing a binding site for the Fur repressor, and an A→T mutation, both occurring in the upstream region of the isdH gene, whose expression level in USA300 strains exhibited a general increase compared to the other group. The isdH gene encodes the iron-regulated surface determinant protein H, which plays a crucial role in iron acquisition from heme and immune system evasion - two essential processes for the pathogenicity of S. aureus.
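The step of layering GWAS hits onto iModulons can be pictured as a set intersection. The sketch below is only an illustration under stated assumptions: the function name, the input layout (iModulon name mapped to member genes) and the placeholder gene labels are hypothetical and are not taken from the authors' code.

```python
def overlap_with_imodulons(imodulon_genes, gwas_genes):
    """Return, per iModulon, the member genes that also carry a GWAS-enriched mutation.

    imodulon_genes: dict mapping an iModulon name to its member genes
    gwas_genes: genes in or next to lineage-associated mutations
    """
    gwas_set = set(gwas_genes)
    hits = {}
    for name, genes in imodulon_genes.items():
        shared = sorted(set(genes) & gwas_set)
        if shared:  # keep only iModulons with at least one shared gene
            hits[name] = shared
    return hits


# Placeholder labels, not real results
imodulon_genes = {"iModulon_1": ["geneA", "geneB"], "iModulon_2": ["geneC"]}
gwas_genes = ["geneB", "geneD"]
print(overlap_with_imodulons(imodulon_genes, gwas_genes))
```

A gene that appears in both inputs is a candidate in which a lineage-associated mutation coincides with a transcriptional signal; the intersection by itself says nothing about the direction or cause of the expression change.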

Strengths:

The clonal complex 8 (CC8), one of the most prevalent among S. aureus, encompasses strains responsible for both community-associated MRSA infections (CA-MRSA) and healthcare-associated (HA) infections (HA-MRSA and HA-MSSA). Within CC8, one of the most prominent lineages is USA300, which emerged in the early 2000s and has since become a leading cause of CA-MRSA infections in the United States. The key genetic traits that characterize USA300 strains include the presence of the Panton-Valentine leukocidin (PVL) encoded by the genes lukF-PV and lukS-PV, the staphylococcal chromosomal cassette mec IVa (SCCmecIVa), and the arginine catabolic mobile element (ACME). Investigating the phenotypic impact of individual mutations on the success of epidemic strains through GWAS poses a challenge due to two main confounding factors: genome-wide linkage disequilibrium (LD) and population structure. Genome-wide LD is associated with false positives, where linked non-causal mutations are mistakenly identified as causal because they share the same genomic background as the causal ones. Therefore, the strength of this work lies in the use of publicly available transcriptomic data to construct a TRN based on ICA. This approach validates the mutations enriched by GWAS and reduces the occurrence of false positives attributed to high genome-wide LD. By integrating various 'omics' data sources, this method enhances the reliability of the results and has successfully identified new potential genetic markers specific to USA300 strains. Furthermore, it revealed mutations within core genes and intergenic regulatory regions, findings that can be validated through experimental data.
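The LD problem described above can be illustrated with the squared correlation r² between two variants scored as present or absent across genomes. This is a minimal sketch with a hypothetical function name and toy vectors, not part of the reviewed analysis.

```python
def r_squared(x, y):
    """Squared correlation (r^2) between two presence/absence variant vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    px = sum(1 for a in x if a) / n                      # frequency of variant x
    py = sum(1 for b in y if b) / n                      # frequency of variant y
    pxy = sum(1 for a, b in zip(x, y) if a and b) / n    # frequency of both together
    d = pxy - px * py                                    # LD coefficient D
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic variant")
    return d * d / denom


# Toy vectors (illustrative only)
linked = r_squared([1, 1, 0, 0], [1, 1, 0, 0])       # variants always co-occur
independent = r_squared([1, 1, 0, 0], [1, 0, 1, 0])  # variants unrelated
print(linked, independent)
```

Two variants with r² equal to 1 have identical presence patterns, so any association test returns the same statistic for both; this is the situation in which an independent line of evidence, such as expression data, is needed to prioritise one over the other.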

Weaknesses:

GWAS aims to identify statistically significant associations that suggest a causal link between genotype and the specific phenotype of interest while simultaneously filtering out spurious associations caused by confounding factors. While the method described in this study minimizes the impact of genome-wide linkage disequilibrium (LD), it does not address population structure. This is because the objective was precisely to identify mutations associated with the emergence of the USA300 clone. In this context, the confounding element arising from shared ancestry becomes the subject of analysis rather than an issue to be corrected. Therefore, it is essential to highlight that the method proposed in this work cannot be applied to genome-wide association studies in which correction for population structure is critical for distinguishing genuine causal associations from spurious ones. This correction is necessary for most phenotypes of interest.

Another limitation is that, although the authors emphasize the mutation in the isdH gene, the analyses conducted in this study do not provide insight into any potential adaptive function associated with it. Like the other genes exhibiting distinct expression patterns associated with enriched mutations from DBGWAS in USA300 strains, isdH is among the potential markers related to the success of the clone. This group includes well-established markers, such as ACME, which carries relevant genes like the arc operon and the speG gene that contribute to virulence and survival at infection sites.

Finally, despite the availability of the code on GitHub, the analysis itself is not easily reproducible or adaptable to other datasets.
