Tools and Resources

Highly contiguous assemblies of 101 drosophilid genomes

Stanford University, United States
University of North Carolina, United States
University of Washington and Seattle Children's Hospital, United States
UC Davis, United States
University of California, Davis, United States
Bangor University, United Kingdom
University of North Carolina, Chapel Hill, United States
University of California, Berkeley, United States
University of Washington, United States
Tokyo Metropolitan University, Japan
University of Belgrade, Serbia
National Institute of Republic of Serbia, Serbia
Institute for Biological Research, Serbia
Yunnan University, China
Hokkaido University, Japan
Sapporo College, Hokkaido University of Education, Japan
Ehime University, Japan
University of Kentucky, United States
Indiana University, United States
University of Würzburg, Germany
Academy of Sciences of the Czech Republic, Czech Republic
Stowers Institute for Medical Research, United States
The University of North Carolina at Chapel Hill, United States
University of Nevada, Las Vegas, United States

Jul 19, 2021

Open access

Abstract

Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be efficiently generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of genome assemblies for 101 lines of 93 drosophilid species encompassing 14 species groups and 35 sub-groups. The genomes are highly contiguous and complete, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. We show that Nanopore-based assemblies are highly accurate in coding regions, particularly with respect to coding insertions and deletions. These assemblies, along with a detailed laboratory protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution at the scale of hundreds of species.
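For readers unfamiliar with the contiguity metric quoted above, contig N50 is conventionally defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below illustrates that standard definition only; it is not taken from the authors' pipeline, and the function name `contig_n50` and the toy contig lengths are hypothetical.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: four contigs, 20 bp in total.
toy_lengths = [8, 6, 4, 2]
print(contig_n50(toy_lengths))
```

In practice the lengths would be read from an assembly FASTA file; the toy list here only keeps the arithmetic easy to follow by hand.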

Data availability

All sequencing data and assemblies generated by this study are deposited at NCBI SRA and GenBank under NCBI BioProject PRJNA675888. Accession numbers for all data used but not generated by this study are provided in the supporting files. Dockerfiles and scripts for reproducing pipelines and analyses are provided on GitHub (https://github.com/flyseq/drosophila_assembly_pipelines). A detailed wet lab protocol is provided at Protocols.io (https://dx.doi.org/10.17504/protocols.io.bdfqi3mw).

The following data sets were generated

Kim BY, Wang JR (2020) Nanopore-based assembly of many drosophilid genomes
NCBI BioProject, PRJNA675888.

https://www.ncbi.nlm.nih.gov/bioproject/?term=prjna675888

The following previously published data sets were used

Miller DE (2018) Sequencing and assembly of 14 Drosophila species
NCBI BioProject, ID: 427774.

https://www.ncbi.nlm.nih.gov/bioproject/PRJNA427774

The Drosophila modENCODE Project (2011) modENCODE Drosophila reference genome sequencing (fruit flies)
NCBI BioProject, ID: 62477.

https://www.ncbi.nlm.nih.gov/bioproject/63477

Yang H (2018) DNA-seq of sexed Drosophila grimshawi, Drosophila silvestris, and Drosophila heteroneura
NCBI BioProject, ID: 484408.

https://www.ncbi.nlm.nih.gov/bioproject/PRJNA484408

Bronski M (2019) Drosophila montium Species Group Genomes Project
NCBI BioProject, ID: 554346.

https://www.ncbi.nlm.nih.gov/bioproject/PRJNA554346

Rane R (2018) Invertebrate sample from Drosophila repleta
NCBI BioProject, ID: 476692.

https://www.ncbi.nlm.nih.gov/bioproject/476692

Turissini D (2017) Fly lines
NCBI BioProject, ID: 395473.

https://www.ncbi.nlm.nih.gov/bioproject/395473

National Institute of Genetics [Japan] (2016) Genome sequences of 10 Drosophila species
NCBI BioProject, ID: 322011.

https://www.ncbi.nlm.nih.gov/bioproject/PRJDB4817

Ellison C (2019) Raw genomic sequencing data from 16 Drosophila species
NCBI BioProject, ID: 550077.

https://www.ncbi.nlm.nih.gov/bioproject/PRJNA550077

Article and author information

Author details

Bernard Y Kim

Department of Biology, Stanford University, Stanford, United States

For correspondence
bernardkim@stanford.edu

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-5025-1292
Jeremy Wang

Genetics, University of North Carolina, Chapel Hill, United States

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-0673-9418
Danny E Miller

Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, United States

Competing interests
The authors declare that no competing interests exist.
Olga Barmina

Department of Evolution and Ecology, UC Davis, Davis, United States

Competing interests
The authors declare that no competing interests exist.
Emily Kay Delaney

Ecology and Evolution, University of California, Davis, Davis, United States

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0003-3609-5702
Ammon Thompson

Department of Evolution and Ecology, UC Davis, Davis, United States

Competing interests
The authors declare that no competing interests exist.
Aaron A Comeault

School of Natural Sciences, Bangor University, Bangor, United Kingdom

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0003-3954-2416
David Peede

Biology Department, University of North Carolina, Chapel Hill, Chapel Hill, United States

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-4826-0464
Emmanuel R D'Agostino

Biology Department, University of North Carolina, Chapel Hill, Chapel Hill, United States

Competing interests
The authors declare that no competing interests exist.
Julianne Pelaez

Department of Integrative Biology, University of California, Berkeley, Berkeley, United States

Competing interests
The authors declare that no competing interests exist.
Jessica M Aguilar

Department of Integrative Biology, University of California, Berkeley, Berkeley, United States

Competing interests
The authors declare that no competing interests exist.
Diler Haji

Department of Integrative Biology, University of California, Berkeley, Berkeley, United States

Competing interests
The authors declare that no competing interests exist.
Teruyuki Matsunaga

Department of Integrative Biology, University of California, Berkeley, Berkeley, United States

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-6433-622X
Ellie Armstrong

Department of Biology, Stanford University, Stanford, United States

Competing interests
The authors declare that no competing interests exist.
Molly Zych

Molecular and Cellular Biology Program, University of Washington, Seattle, United States

Competing interests
The authors declare that no competing interests exist.
Yoshitaka Ogawa

Department of Biological Sciences, Tokyo Metropolitan University, Hachioji, Japan

Competing interests
The authors declare that no competing interests exist.
Marina Stamenković-Radak

Faculty of Biology, University of Belgrade, Belgrade, Serbia

Competing interests
The authors declare that no competing interests exist.
Mihailo Jelić

Faculty of Biology, University of Belgrade, Belgrade, Serbia

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-1637-0933
Marija Savić Veselinović

Faculty of Biology, University of Belgrade, Belgrade, Serbia

Competing interests
The authors declare that no competing interests exist.
Marija Tanasković

Institute for Biological Research "Siniša Stanković", National Institute of Republic of Serbia, Belgrade, Serbia

Competing interests
The authors declare that no competing interests exist.
Pavle Erić

Population Genetics and Ecogenotoxicology, Institute for Biological Research, Belgrade, Serbia

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-0053-1982
Jian-Jun Gao

School of Ecology and Environmental Science, Yunnan University, Kunming, China

Competing interests
The authors declare that no competing interests exist.
Takehiro K Katoh

School of Ecology and Environmental Science, Yunnan University, Kunming, China

Competing interests
The authors declare that no competing interests exist.
Masanori J Toda

Hokkaido University Museum, Hokkaido University, Sapporo, Japan

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0003-0158-1858
Hideaki Watabe

Biological Laboratory, Sapporo College, Hokkaido University of Education, Sapporo, Japan

Competing interests
The authors declare that no competing interests exist.
Masayoshi Watada

Graduate School of Science and Engineering, Ehime University, Matsuyama, Japan

Competing interests
The authors declare that no competing interests exist.
Jeremy S Davis

Department of Biology, University of Kentucky, Lexington, United States

Competing interests
The authors declare that no competing interests exist.
Leonie Moyle

Indiana University, Bloomington, United States

Competing interests
The authors declare that no competing interests exist.
Giulia Manoli

Neurobiology and Genetics, Theodor Boveri Institute, Biocentre, University of Würzburg, Würzburg, Germany

Competing interests
The authors declare that no competing interests exist.
Enrico Bertolini

Neurobiology and Genetics, Theodor Boveri Institute, Biocentre, University of Würzburg, Würzburg, Germany

Competing interests
The authors declare that no competing interests exist.
Vladimír Košťál

Institute of Entomology, Biology Centre, Academy of Sciences of the Czech Republic, Prague, Czech Republic

Competing interests
The authors declare that no competing interests exist.
R Scott Hawley

Stowers Institute for Medical Research, Kansas City, United States

Competing interests
The authors declare that no competing interests exist.
Aya Takahashi

Department of Biological Sciences, Tokyo Metropolitan University, Hachioji, Japan

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-8391-7417
Corbin D Jones

Biology Department, The University of North Carolina at Chapel Hill, Chapel Hill, United States

Competing interests
The authors declare that no competing interests exist.
Donald K Price

School of Life Sciences, University of Nevada, Las Vegas, Las Vegas, United States

Competing interests
The authors declare that no competing interests exist.
Noah K Whiteman

Department of Integrative Biology, University of California, Berkeley, Berkeley, United States

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0003-1448-4678
Artyom Kopp

Department of Evolution and Ecology, UC Davis, Davis, United States

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0001-5224-0741
Daniel R Matute

Biology Department, University of North Carolina, Chapel Hill, Chapel Hill, United States

Competing interests
The authors declare that no competing interests exist.
Dmitri A Petrov

Department of Biology, Stanford University, Stanford, United States

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-3664-9130

Funding

National Institute of General Medical Sciences (F32GM135998)

Bernard Y Kim

National Institute of General Medical Sciences (R35GM119816)

Noah K Whiteman

Uehara Memorial Foundation (201931028)

Teruyuki Matsunaga

Ministry of Education, Science and Technological Development of the Republic of Serbia (451-03-68/2020-14/200178)

Marina Stamenković-Radak
Mihailo Jelić
Marija Savić Veselinović

Ministry of Education, Science and Technological Development of the Republic of Serbia (451-03-68/2020-14/200007)

Marija Tanasković
Pavle Erić

National Natural Science Foundation of China (32060112)

Jian-Jun Gao

Japan Society for the Promotion of Science (JP18K06383)

Masayoshi Watada

European Union Horizon 2020 Research and Innovation Program (765937-CINCHRON)

Giulia Manoli
Enrico Bertolini

Czech Science Foundation (19-13381S)

Vladimír Košťál

Japan Society for the Promotion of Science (JP19H03276)

Aya Takahashi

National Science Foundation (1345247)

Donald K Price

National Institute of General Medical Sciences (R35GM118165)

Dmitri A Petrov

National Institute of Diabetes and Digestive and Kidney Diseases (K01DK119582)

Jeremy Wang

National Science Foundation (DEB-1457707)

Corbin D Jones

National Institute of General Medical Sciences (R01GM121750)

Daniel R Matute

National Institute of General Medical Sciences (R01GM125715)

Daniel R Matute

Google Cloud Platform Research Credits

Bernard Y Kim

Google Cloud Platform Research Credits

Jeremy Wang

National Institute of General Medical Sciences (R35GM122592)

Artyom Kopp

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.