Core genes can have higher recombination rates than accessory genes within global microbial populations

  1. Asher Preska Steinberg
  2. Mingzhi Lin
  3. Edo Kussell (corresponding author)
  1. New York University, United States

Abstract

Recombination is essential to microbial evolution, and is involved in the spread of antibiotic resistance, antigenic variation, and adaptation to the host niche. However, assessing the impact of homologous recombination on accessory genes which are only present in a subset of strains of a given species remains challenging due to their complex phylogenetic relationships. Quantifying homologous recombination for accessory genes (which are important for niche-specific adaptations) in comparison to core genes (which are present in all strains and have essential functions) is critical to understanding how selection acts on variation to shape species diversity and genome structures of bacteria. Here, we apply a computationally efficient, non-phylogenetic approach to measure homologous recombination rates in the core and accessory genome using >100,000 whole genome sequences from Streptococcus pneumoniae and several additional species. By analyzing diverse sets of sequence clusters, we show that core genes often have higher recombination rates than accessory genes, and for some bacterial species the associated effect sizes for these differences are pronounced. In a subset of species, we find that gene frequency and homologous recombination rate are positively correlated. For S. pneumoniae and several additional species, we find that while the recombination rate is higher for the core genome, the mutational divergence is lower, indicating that divergence-based homologous recombination barriers could contribute to differences in recombination rates between the core and accessory genome. Homologous recombination may therefore play a key role in increasing the efficiency of selection in the most conserved parts of the genome.

Data availability

Lists of SRA accession numbers corresponding to the raw reads used to build the multi-sequence alignments analyzed in this manuscript are included as Figure 2 - source data 1 and Figure 3 - source data 1. All SRA files, reference genomes, and complete genome assemblies are available through NCBI. All sequence collections used are listed in Supplementary File 5. For the PubMLST sequence collections, PubMLST was used to identify whole genome sequences (by filtering for strains in the 'Genome Collection' of each species where the sequence length is at least that of the reference genome), then the raw reads were downloaded from NCBI using their SRA numbers. Accession numbers for reference genomes used for each microbial species are also listed in Supplementary File 5.

All original code has been deposited at GitHub and is publicly available:

  • https://github.com/kussell-lab/mcorr
  • https://github.com/kussell-lab/mcorr-clustering
  • https://github.com/kussell-lab/ReferenceAlignmentGenerator
  • https://github.com/kussell-lab/PangenomeAlignmentGenerator

Article and author information

Author details

  1. Asher Preska Steinberg

    Department of Biology, New York University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  2. Mingzhi Lin

    Department of Biology, New York University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  3. Edo Kussell

    Department of Biology, New York University, New York, United States
    For correspondence
    edo.kussell@nyu.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0003-0590-4036

Funding

National Institutes of Health (R01-GM097356)

  • Edo Kussell

Simons Foundation (Simons Foundation Awardee of the Life Sciences Research Foundation)

  • Asher Preska Steinberg

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Paul B Rainey, Max Planck Institute for Evolutionary Biology, Germany

Version history

  1. Preprint posted: September 14, 2021
  2. Received: March 10, 2022
  3. Accepted: June 30, 2022
  4. Accepted Manuscript published: July 8, 2022 (version 1)
  5. Version of Record published: September 5, 2022 (version 2)

Copyright

© 2022, Preska Steinberg et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Asher Preska Steinberg, Mingzhi Lin, Edo Kussell (2022) Core genes can have higher recombination rates than accessory genes within global microbial populations. eLife 11:e78533. https://doi.org/10.7554/eLife.78533
