Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be efficiently generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of genome assemblies for 101 lines of 93 drosophilid species encompassing 14 species groups and 35 sub-groups. The genomes are highly contiguous and complete, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. We show that Nanopore-based assemblies are highly accurate in coding regions, particularly with respect to coding insertions and deletions. These assemblies, along with a detailed laboratory protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution at the scale of hundreds of species.
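For readers unfamiliar with the contiguity statistic quoted above, contig N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The following is a minimal sketch of that calculation, not the pipeline used in this study; the function name `contig_n50` and the toy contig lengths are illustrative assumptions.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Illustrative helper (hypothetical name, not from the study's pipeline).
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk contigs from longest to shortest, accumulating length
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half the
        # assembly length defines the N50
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical values, not data from the paper)
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(contig_n50(toy_lengths))  # total is 54, half is 27; 10 + 9 + 8 = 27, so N50 = 8
```

In practice the lengths would be read from an assembly FASTA, and an average across assemblies would be taken over the per-assembly N50 values.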
All sequencing data and assemblies generated by this study are deposited at NCBI SRA and GenBank under NCBI BioProject PRJNA675888. Accession numbers for all data used but not generated by this study are provided in the supporting files. Dockerfiles and scripts for reproducing pipelines and analyses are provided on GitHub (https://github.com/flyseq/drosophila_assembly_pipelines). A detailed wet lab protocol is provided at Protocols.io (https://dx.doi.org/10.17504/protocols.io.bdfqi3mw).
Nanopore-based assembly of many drosophilid genomes. NCBI BioProject, PRJNA675888.
Sequencing and assembly of 14 Drosophila species. NCBI BioProject, ID: 427774.
modENCODE Drosophila reference genome sequencing (fruit flies). NCBI BioProject, ID: 62477.
DNA-seq of sexed Drosophila grimshawi, Drosophila silvestris, and Drosophila heteroneura. NCBI BioProject, ID: 484408.
Drosophila montium Species Group Genomes Project. NCBI BioProject, ID: 554346.
Invertebrate sample from Drosophila repleta. NCBI BioProject, ID: 476692.
Fly lines. NCBI BioProject, ID: 395473.
Genome sequences of 10 Drosophila species. NCBI BioProject, ID: 322011.
Raw genomic sequencing data from 16 Drosophila species. NCBI BioProject, ID: 550077.
- Bernard Y Kim
- Noah K Whiteman
- Teruyuki Matsunaga
- Marina Stamenković-Radak
- Mihailo Jelić
- Marija Savić Veselinović
- Marija Tanasković
- Pavle Erić
- Jian-Jun Gao
- Masayoshi Watada
- Giulia Manoli
- Enrico Bertolini
- Vladimír Košťál
- Aya Takahashi
- Donald K Price
- Dmitri A Petrov
- Jeremy Wang
- Corbin D Jones
- Daniel R Matute
- Artyom Kopp
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
- Graham Coop, University of California, Davis, United States
© 2021, Kim et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.