Core genes can have higher recombination rates than accessory genes within global microbial populations
Abstract
Recombination is essential to microbial evolution, and is involved in the spread of antibiotic resistance, antigenic variation, and adaptation to the host niche. However, assessing the impact of homologous recombination on accessory genes which are only present in a subset of strains of a given species remains challenging due to their complex phylogenetic relationships. Quantifying homologous recombination for accessory genes (which are important for niche-specific adaptations) in comparison to core genes (which are present in all strains and have essential functions) is critical to understanding how selection acts on variation to shape species diversity and genome structures of bacteria. Here, we apply a computationally efficient, non-phylogenetic approach to measure homologous recombination rates in the core and accessory genome using >100,000 whole genome sequences from Streptococcus pneumoniae and several additional species. By analyzing diverse sets of sequence clusters, we show that core genes often have higher recombination rates than accessory genes, and for some bacterial species the associated effect sizes for these differences are pronounced. In a subset of species, we find that gene frequency and homologous recombination rate are positively correlated. For S. pneumoniae and several additional species, we find that while the recombination rate is higher for the core genome, the mutational divergence is lower, indicating that divergence-based homologous recombination barriers could contribute to differences in recombination rates between the core and accessory genome. Homologous recombination may therefore play a key role in increasing the efficiency of selection in the most conserved parts of the genome.
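The core/accessory distinction used above can be illustrated with a minimal sketch. It assumes a hypothetical presence/absence table mapping each strain to the set of genes it carries; the strain and gene names are placeholders, and this is not the authors' pipeline (which is available in the repositories listed under Data availability). It only shows the definition stated in the abstract: core genes are present in all strains, accessory genes in a subset, and gene frequency is the fraction of strains carrying a gene.

```python
from typing import Dict, Set, Tuple


def gene_counts(presence: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count how many strains carry each gene."""
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return counts


def gene_frequencies(presence: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of strains carrying each gene (empty input gives an empty dict)."""
    n_strains = len(presence)
    if n_strains == 0:
        return {}
    return {g: c / n_strains for g, c in gene_counts(presence).items()}


def split_core_accessory(
    presence: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str]]:
    """Core = genes found in every strain; accessory = all remaining genes."""
    n_strains = len(presence)
    counts = gene_counts(presence)
    core = {g for g, c in counts.items() if c == n_strains}
    accessory = set(counts) - core
    return core, accessory


# Hypothetical toy data: three strains, four placeholder genes.
toy = {
    "strain1": {"geneA", "geneB", "geneC"},
    "strain2": {"geneA", "geneB"},
    "strain3": {"geneA", "geneB", "geneC", "geneD"},
}
core, accessory = split_core_accessory(toy)
freqs = gene_frequencies(toy)
```

In this toy table, geneA and geneB fall in the core set, while geneC and geneD are accessory genes with different frequencies. Correlating such frequencies with per-gene recombination rates would require rate estimates that this sketch does not compute.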
Data availability
Lists of SRA accession numbers corresponding to the raw reads used to build the multi-sequence alignments analyzed in this manuscript are included as Figure 2 - source data 1 and Figure 3 - source data 1. All SRA files, reference genomes, and complete genome assemblies are available through NCBI. All sequence collections used are listed in Supplementary File 5. For the PubMLST sequence collections, PubMLST was used to identify whole genome sequences (by filtering for strains in the 'Genome Collection' of each species where the sequence length is at least that of the reference genome), then the raw reads were downloaded from NCBI using their SRA numbers. Accession numbers for reference genomes used for each microbial species are also listed in Supplementary File 5.
All original code has been deposited at GitHub and is publicly available. Links are given below:
- https://github.com/kussell-lab/mcorr
- https://github.com/kussell-lab/mcorr-clustering
- https://github.com/kussell-lab/ReferenceAlignmentGenerator
- https://github.com/kussell-lab/PangenomeAlignmentGenerator
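The length filter described for the PubMLST collections can be sketched as follows. This is a minimal illustration under stated assumptions: `GenomeRecord`, its fields, and the example values are hypothetical, and no PubMLST or NCBI interface is used here. It only shows the stated criterion of keeping strains whose sequence length is at least that of the reference genome.

```python
from typing import Iterable, List, NamedTuple


class GenomeRecord(NamedTuple):
    """Hypothetical record for one strain in a sequence collection."""
    strain_id: str
    sequence_length: int  # total assembled sequence length in base pairs


def filter_whole_genomes(
    records: Iterable[GenomeRecord], reference_length: int
) -> List[GenomeRecord]:
    """Keep records whose sequence length is at least the reference length."""
    return [r for r in records if r.sequence_length >= reference_length]


# Placeholder values for illustration only, not taken from any real collection.
records = [
    GenomeRecord("strainA", 2_100_000),
    GenomeRecord("strainB", 1_500_000),
    GenomeRecord("strainC", 2_000_000),
]
kept = filter_whole_genomes(records, reference_length=2_000_000)
kept_ids = [r.strain_id for r in kept]
```

With these placeholder values, strainA and strainC pass the filter, and strainB is dropped because its sequence is shorter than the assumed reference length.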
Article and author information
Funding
National Institutes of Health (R01-GM097356)
- Edo Kussell
Simons Foundation (Simons Foundation Awardee of the Life Sciences Research Foundation)
- Asher Preska Steinberg
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2022, Preska Steinberg et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 3,981 views
- 549 downloads
- 23 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.