Highly contiguous assemblies of 101 drosophilid genomes
Abstract
Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be efficiently generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of genome assemblies for 101 lines of 93 drosophilid species encompassing 14 species groups and 35 sub-groups. The genomes are highly contiguous and complete, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. We show that Nanopore-based assemblies are highly accurate in coding regions, particularly with respect to coding insertions and deletions. These assemblies, along with a detailed laboratory protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution at the scale of hundreds of species.
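The contiguity statistic quoted above, contig N50, is the contig length at which contigs of that length or longer together cover at least half of the total assembly. The following is a minimal sketch of that calculation, assuming contig lengths are already available as a list of integers. The function name `contig_n50` and the toy lengths are illustrative assumptions, not part of the authors' released pipelines.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length of the contig at which the running total,
    taken from the longest contig downward, first reaches at
    least half of the summed assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the study.
toy_lengths = [8, 6, 4, 2, 2, 1, 1]
print(contig_n50(toy_lengths))
```

In practice the lengths would be read from an assembly FASTA file. Because N50 depends only on the length distribution, it measures contiguity and says nothing about base-level accuracy, which is why the abstract reports BUSCO completeness alongside it.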
Data availability
All sequencing data and assemblies generated by this study are deposited at NCBI SRA and GenBank under NCBI BioProject PRJNA675888. Accession numbers for all data used but not generated by this study are provided in the supporting files. Dockerfiles and scripts for reproducing pipelines and analyses are provided on GitHub (https://github.com/flyseq/drosophila_assembly_pipelines). A detailed wet lab protocol is provided at Protocols.io (https://dx.doi.org/10.17504/protocols.io.bdfqi3mw).
- Nanopore-based assembly of many drosophilid genomes. NCBI BioProject, PRJNA675888.
- Sequencing and assembly of 14 Drosophila species. NCBI BioProject, ID: 427774.
- modENCODE Drosophila reference genome sequencing (fruit flies). NCBI BioProject, ID: 62477.
- Drosophila montium Species Group Genomes Project. NCBI BioProject, ID: 554346.
- Invertebrate sample from Drosophila repleta. NCBI BioProject, ID: 476692.
- Genome sequences of 10 Drosophila species. NCBI BioProject, ID: 322011.
- Raw genomic sequencing data from 16 Drosophila species. NCBI BioProject, ID: 550077.
Article and author information
Funding
National Institute of General Medical Sciences (F32GM135998)
- Bernard Y Kim
National Institute of General Medical Sciences (R35GM119816)
- Noah K Whiteman
Uehara Memorial Foundation (201931028)
- Teruyuki Matsunaga
Ministry of Education, Science and Technological Development of the Republic of Serbia (451-03-68/2020-14/200178)
- Marina Stamenković-Radak
- Mihailo Jelić
- Marija Savić Veselinović
Ministry of Education, Science and Technological Development of the Republic of Serbia (451-03-68/2020-14/200007)
- Marija Tanasković
- Pavle Erić
National Natural Science Foundation of China (32060112)
- Jian-Jun Gao
Japan Society for the Promotion of Science (JP18K06383)
- Masayoshi Watada
European Union Horizon 2020 Research and Innovation Program (765937-CINCHRON)
- Giulia Manoli
- Enrico Bertolini
Czech Science Foundation (19-13381S)
- Vladimír Košťál
Japan Society for the Promotion of Science (JP19H03276)
- Aya Takahashi
National Science Foundation (1345247)
- Donald K Price
National Institute of General Medical Sciences (R35GM118165)
- Dmitri A Petrov
National Institute of Diabetes and Digestive and Kidney Diseases (K01DK119582)
- Jeremy Wang
National Science Foundation (DEB-1457707)
- Corbin D Jones
National Institute of General Medical Sciences (R01GM121750)
- Daniel R Matute
National Institute of General Medical Sciences (R01GM125715)
- Daniel R Matute
Google Cloud Platform Research Credits
- Bernard Y Kim
- Jeremy Wang
National Institute of General Medical Sciences (R35GM122592)
- Artyom Kopp
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
- Graham Coop, University of California, Davis, United States
Version history
- Preprint posted: December 15, 2020
- Received: January 11, 2021
- Accepted: July 16, 2021
- Accepted Manuscript published: July 19, 2021 (version 1)
- Version of Record published: August 4, 2021 (version 2)
- Version of Record updated: March 18, 2022 (version 3)
Copyright
© 2021, Kim et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.