Identical sequences found in distant genomes reveal frequent horizontal transfer across the bacterial domain

Abstract

Horizontal gene transfer (HGT) is an essential force in microbial evolution. Despite detailed studies on a variety of systems, a global picture of HGT in the microbial world is still missing. Here, we exploit the fact that HGT creates long identical DNA sequences in the genomes of distant species, which can be found efficiently using alignment-free methods. Our pairwise analysis of 93 481 bacterial genomes identified 138 273 HGT events. We developed a model that explains their statistical properties and estimates the transfer rate between pairs of taxa. The model reveals that long-distance HGT is frequent: our results indicate that HGT between species from different phyla has occurred in at least 8% of the species. Finally, our results confirm that the function of sequences strongly impacts their transfer rate, which varies by more than 3 orders of magnitude between functional categories. Overall, we provide a comprehensive view of HGT, illuminating a fundamental process driving bacterial evolution.
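
As an illustration of the idea that long identical stretches can be located without computing an alignment, the following is a minimal sketch in Python. It is not the pipeline used in this study: the function name `long_exact_matches`, the k-mer seed-and-extend strategy, and the threshold parameter `min_len` are all assumptions made for the example, and a genome-scale analysis would need a far more memory-efficient index.

```python
from collections import defaultdict


def long_exact_matches(a, b, min_len):
    """Return maximal exact matches between sequences a and b.

    Each match is a tuple (start_in_a, start_in_b, length) with
    length >= min_len. Hypothetical helper, for illustration only.
    """
    if min_len < 1:
        raise ValueError("min_len must be positive")

    # Index every substring of length min_len in `a` by its start position.
    index = defaultdict(list)
    for i in range(len(a) - min_len + 1):
        index[a[i:i + min_len]].append(i)

    matches = []
    for j in range(len(b) - min_len + 1):
        for i in index.get(b[j:j + min_len], ()):
            # Skip seeds that could be extended to the left: the match
            # they belong to is reported from its leftmost seed.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the seed to the right while the sequences agree.
            length = min_len
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i, j, length))
    return matches


if __name__ == "__main__":
    shared = "GATTACAGATTACCA"  # toy stand-in for a transferred segment
    genome_a = "TTTT" + shared + "CCCC"
    genome_b = "AAAA" + shared + "GGGG"
    print(long_exact_matches(genome_a, genome_b, min_len=10))
```

In this toy setting, a long exact match between two otherwise unrelated sequences plays the role of a candidate transfer event; the choice of a suitable length threshold is left open here.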

Data availability

Results of the analysis are provided as supplementary files.

The following previously published data sets were used:

1. O'Leary NA, et al. (2015) Prokaryotic RefSeq Genomes. RefSeq. https://www.ncbi.nlm.nih.gov/refseq/about/prokaryotes/

Article and author information

Author details

Michael Sheinman

Division of Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, Netherlands

For correspondence
mishashe@gmail.com

Competing interests
The authors declare that no competing interests exist.

Ksenia Arkhipova

Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands

Competing interests
The authors declare that no competing interests exist.

Peter F Arndt

Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0003-1762-9836

Bas Dutilh

Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands

Competing interests
The authors declare that no competing interests exist.

Rutger Hermsen

Science faculty, Biology Department, Utrecht University, Utrecht, Netherlands

For correspondence
r.hermsen@uu.nl

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0003-4633-4877

Florian Massip

Evolutionary and Cancer Genomics, Berlin Institute for Medical Systems Biology, Max Delbrueck Center for Molecular Medicine, Berlin, Germany

For correspondence
florian.massip@mdc-berlin.de

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0001-5855-0935

Funding

Netherlands Organisation for Scientific Research (Vidi grant 864.14.004)

Ksenia Arkhipova
Bas Dutilh

H2020 European Research Council (Consolidator Grant 865694: DiversiPHI)

Bas Dutilh

Fondation pour la Recherche Médicale (SPE201803005264)

Florian Massip

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.