Core genes can have higher recombination rates than accessory genes within global microbial populations

Abstract

Recombination is essential to microbial evolution, and is involved in the spread of antibiotic resistance, antigenic variation, and adaptation to the host niche. However, assessing the impact of homologous recombination on accessory genes, which are present in only a subset of strains of a given species, remains challenging due to their complex phylogenetic relationships. Quantifying homologous recombination for accessory genes (which are important for niche-specific adaptations) in comparison to core genes (which are present in all strains and have essential functions) is critical to understanding how selection acts on variation to shape species diversity and genome structures of bacteria. Here, we apply a computationally efficient, non-phylogenetic approach to measure homologous recombination rates in the core and accessory genome using >100,000 whole genome sequences from Streptococcus pneumoniae and several additional species. By analyzing diverse sets of sequence clusters, we show that core genes often have higher recombination rates than accessory genes, and for some bacterial species the associated effect sizes for these differences are pronounced. In a subset of species, we find that gene frequency and homologous recombination rate are positively correlated. For S. pneumoniae and several additional species, we find that while the recombination rate is higher for the core genome, the mutational divergence is lower, indicating that divergence-based homologous recombination barriers could contribute to differences in recombination rates between the core and accessory genome. Homologous recombination may therefore play a key role in increasing the efficiency of selection in the most conserved parts of the genome.

Data availability

Lists of SRA accession numbers corresponding to the raw reads used to build the multi-sequence alignments analyzed in this manuscript are included as Figure 2 - source data 1 and Figure 3 - source data 1. All SRA files, reference genomes, and complete genome assemblies are available through NCBI. All sequence collections used are listed in Supplementary File 5. For the PubMLST sequence collections, PubMLST was used to identify whole genome sequences (by filtering for strains in the 'Genome Collection' of each species where the sequence length is at least that of the reference genome), then the raw reads were downloaded from NCBI using their SRA numbers. Accession numbers for reference genomes used for each microbial species are also listed in Supplementary File 5.

All original code has been deposited at GitHub and is publicly available. Links are given below:
- https://github.com/kussell-lab/mcorr
- https://github.com/kussell-lab/mcorr-clustering
- https://github.com/kussell-lab/ReferenceAlignmentGenerator
- https://github.com/kussell-lab/PangenomeAlignmentGenerator
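The PubMLST selection step described in this section (keep strains whose sequence length is at least that of the reference genome, then collect their SRA numbers for download from NCBI) can be illustrated with a minimal Python sketch. This is not the authors' code: the `IsolateRecord` fields, the function names, and all values in the usage example are hypothetical placeholders, and the sketch assumes the isolate metadata has already been exported from PubMLST into memory.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IsolateRecord:
    """One isolate from a species' 'Genome Collection'.

    Field names are hypothetical; they are not the PubMLST export schema.
    """
    isolate_id: str
    sra_accession: str    # empty string if no SRA number is recorded
    sequence_length: int  # total sequence length in base pairs


def select_whole_genomes(records: List[IsolateRecord],
                         reference_length: int) -> List[IsolateRecord]:
    """Keep isolates whose sequence length is at least the reference length."""
    return [r for r in records if r.sequence_length >= reference_length]


def accessions_to_download(records: List[IsolateRecord]) -> List[str]:
    """Collect the SRA numbers of the selected isolates, skipping blanks."""
    return [r.sra_accession for r in records if r.sra_accession]


if __name__ == "__main__":
    # Made-up values for illustration only; these are not real accessions.
    reference_length = 2_000_000
    isolates = [
        IsolateRecord("iso-1", "SRR_PLACEHOLDER_1", 2_100_000),
        IsolateRecord("iso-2", "SRR_PLACEHOLDER_2", 1_500_000),
        IsolateRecord("iso-3", "", 2_050_000),
    ]
    selected = select_whole_genomes(isolates, reference_length)
    print(accessions_to_download(selected))
```

The resulting list of SRA numbers would then be passed to whatever download tool is in use; that step is left out here because the section does not specify one.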

The following previously published data sets were used

1. Jolley KA et al.
(2018) Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications
PubMLST.
https://pubmlst.org

2. O'Leary NA et al.
(2016) Reference sequence (RefSeq) database at NCBI: current status
NCBI RefSeq.
https://www.ncbi.nlm.nih.gov/refseq/

Article and author information

Author details

Asher Preska Steinberg

Department of Biology, New York University, New York, United States

Competing interests
The authors declare that no competing interests exist.

Mingzhi Lin

Department of Biology, New York University, New York, United States

Competing interests
The authors declare that no competing interests exist.

Edo Kussell

Department of Biology, New York University, New York, United States

For correspondence
edo.kussell@nyu.edu

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0003-0590-4036

Funding

National Institutes of Health (R01-GM097356)

Edo Kussell

Simons Foundation (Simons Foundation Awardee of the Life Sciences Research Foundation)

Asher Preska Steinberg

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.