Dynamic turnover of centromeres drives karyotype evolution in Drosophila
Figures

Phylogenetic relationships and karyotype evolution in the D. obscura group.
Drosophila subobscura represents the ancestral karyotype condition consisting of five large and one small pair of telocentric chromosomes (termed Muller elements A-F). Phylogeny adapted from Gao et al. (2007). Chromosomal fusions and movement of centromeres along the chromosomes have resulted in different karyotypes in different species groups (Segarra et al., 1995; Schaeffer et al., 2008). Transitions of chromosome morphology are indicated along the tree, and the different subgroups of the obscura species group (the subobscura, obscura, pseudoobscura, and affinis subgroups) are indicated by gray shading. Muller elements are color coded, and centromeres are shown as black ovals.

Genome organization in Drosophila obscura group flies.
Shown here are the assembled chromosome sizes, scaffolding stitch points, gene density, repeat content (percentage of bases repeat-masked in 100 kb windows) and H3K9me3 enrichment (50 kb windows) across the genome assemblies of D. subobscura, D. athabasca, D. lowei, D. pseudoobscura and D. miranda. Muller elements are color coded, gene density is shown as a black to green heatmap (genes per 100 kb), H3K9me3 enrichment is shown in orange, and repeat density is shown in teal (note that H3K9me3 enrichment and repeat density are plotted semi-transparent). Scaffolding stitch points are indicated as vertical lines.

Illumina coverage over assembled chromosomes in reference genome assemblies of D. subobscura, D. pseudoobscura, D. lowei and D. athabasca.
Male (blue) and female (red) Illumina coverage (log2) shown in 5 kb non-overlapping windows along each chromosome. Scaffolding stitch point locations shown in the outermost track. Note that stitch points often coincide with aberrant patterns in Illumina coverage highlighting difficult regions to assemble.
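
The windowed coverage tracks described in this legend can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name, the per-base depth input, and the pseudocount of 1 (added so that windows with zero coverage do not produce an undefined logarithm) are all assumptions.

```python
import math

def windowed_log2_coverage(depths, window=5000, pseudocount=1.0):
    """Return log2 mean depth for each non-overlapping window.

    depths: per-base read depths along one chromosome (hypothetical input).
    A trailing partial window is averaged over the bases it contains.
    """
    values = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        # The pseudocount is an assumption of this sketch, not a value from the study.
        values.append(math.log2(mean_depth + pseudocount))
    return values

# Toy example: 10 kb at depth 31 followed by 5 kb at depth 15.
toy = [31] * 10000 + [15] * 5000
print(windowed_log2_coverage(toy))  # [5.0, 5.0, 4.0]
```

Running the same function on male and female depth vectors separately would give the two tracks that are compared in the plot.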

Nanopore or PacBio coverage over reference genome assemblies of D. subobscura, D. pseudoobscura, D. lowei and D. athabasca.
Coverage (log2) shown in 50 kb non-overlapping windows along the genome assembly with different genomic partitions highlighted. Contigs < 50 kb not shown. Note that D. lowei data were derived from males, D. subobscura and D. athabasca data were derived from females, and D. pseudoobscura data were derived from a combination of males and females. Therefore, coverage over A-AD is expected to be lower in D. lowei and D. pseudoobscura relative to the autosomes (Mullers B, C, E, F). Further, in D. pseudoobscura, A-AD and Y coverage are not expected to be similar due to differences in the female/male contribution to the total read pool.
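
The expected reduction in X-linked (A-AD) coverage follows from simple copy-number arithmetic. The sketch below assumes XY males with one X copy, XX females with two, two copies of each autosome in both sexes, and an equal DNA contribution per individual; the function name is hypothetical and the mixing fractions are illustrative, not values reported in the study.

```python
def expected_x_to_autosome_ratio(male_fraction):
    """Expected X:autosome depth ratio for a pool with the given male fraction.

    Assumes one X copy per male, two per female, and two copies of every
    autosome in both sexes, with each individual contributing equal DNA.
    """
    if not 0.0 <= male_fraction <= 1.0:
        raise ValueError("male_fraction must be between 0 and 1")
    x_copies = 1.0 * male_fraction + 2.0 * (1.0 - male_fraction)
    return x_copies / 2.0

for label, frac in [("females only", 0.0), ("equal mix", 0.5), ("males only", 1.0)]:
    print(label, expected_x_to_autosome_ratio(frac))
# females only 1.0
# equal mix 0.75
# males only 0.5
```

On a log2 scale, a males-only library would therefore place X-linked windows one unit below autosomal windows, while a mixed pool falls in between.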

Hi-C association heatmaps of genome assemblies for D. subobscura, D. pseudoobscura, D. lowei and D. athabasca.
Lines demarcate assembled chromosomes and unplaced or Y contigs. Note that pericentromeric regions typically show weak Hi-C associations with neighboring euchromatic regions due to their repetitiveness, thus producing ‘checkerboard’ patterns in the plot. Also of note, off-diagonal associations are apparent in many euchromatic arms in our hybrid D. athabasca assembly and thus identify chromosomal inversions (see Materials and methods).

Chromosome synteny and evolution.
(A) Conservation of Muller elements in the Drosophila genus. Orthologous single copy Drosophila melanogaster (Dmel) BUSCOs plotted on reference genome assemblies. Muller elements are color-coded based on D. melanogaster. (B) Comparisons of synteny between our genome assemblies. Muller elements are color-coded based on the D. subobscura genome. Each line represents a protein-coding gene. Ovals denote the location of the putative centromere (based on the location of centromere-associated satellite sequences, see Figure 4).

Whole genome alignments (MUMmer) of our Drosophila subobscura genome assembly (strain 14011–0131.10) with (A) D. subobscura strain ch-cu and (B) D. guanche.
We show the Muller element naming scheme on the X axis and the subobscura group chromosome naming scheme on the Y axis.

Whole genome alignments (MUMmer) of draft Drosophila bifasciata genome contigs with (A) both Muller A and D of D. subobscura and (B) highlighting the Muller A centromere seed regions.
Many small contigs in the D. bifasciata assembly map to the D. subobscura seed region which is indicative of a highly repetitive and more poorly assembled region (i.e., pericentromere) in D. bifasciata. (C) Alignments of D. bifasciata contigs with the fused Muller A-AD of D. pseudoobscura show sequence similarity over the large metacentric pericentromere of Muller A-AD.

Identification of centromere-associated satellite sequences.
(A) Histograms of most abundant satellites in assembled genomes. Repeat length refers to the size of the repeat unit. For each species apart from D. subobscura, a specific satellite (or higher-order variant of it as indicated by the same colors) is enriched. In D. miranda, a 99mer (in green) and four units of an unrelated 21mer (84 bp; in red) are the most abundant satellites; in D. pseudoobscura, four units of a similar 21mer (84 bp; in red) are most common; in D. lowei, three units of a similar 21mer (63 bp; in red) are most common; in D. athabasca, an unrelated 160mer (in pink) is the most common satellite; and in the D. guanche genome (Puerma et al., 2018), an unrelated 290mer (in black) is most common. No abundant satellite was identified in the assembled genome of D. subobscura. (B) Location of putative centromere-associated repeats (from panel A) in pericentromeric regions. In D. subobscura a 12mer is highly enriched in raw sequencing reads. Shown is a 5 Mb fragment for each chromosome with the highest density of the satellite sequence (that is, the putative centromere), and all unplaced scaffolds. (C) FISH hybridization confirms centromere location of identified satellites (same color coding as in A and B). Probes corresponding to the 21mer (Cy5; red) and 99mer (Cy3; green) were hybridized to both D. miranda and D. pseudoobscura; the 21mer showed a centromere location in both species, while the 99mer hybridized only to the centromeres of D. miranda. The 160mer (6FAM; red) localized to the centromeres of D. athabasca, and the 12mer (TYE665; red) to the centromeres of D. subobscura. Stronger hybridization signal presumably corresponds to higher repeat abundance at a particular genomic location.

Genomic distribution of inferred centromeric satellite sequences.
https://doi.org/10.7554/eLife.49002.011
Short satellite DNAs in obscura group flies.
Shown is a heatmap of the results from k-Seek analyses used to identify enriched satellite sequences. Only kmers >1 bp that constitute >10% of the total short satellite sequence in any one species are shown. White = not present, red = highly enriched.

Identification of centromere-associated satellite sequences from Nanopore and PacBio reads.
Shown are counts of satellite lengths identified directly from raw sequencing reads for each Drosophila species. Colored lines are drawn as in Figure 4 to highlight overlap between the results from TRF analyses and TideHunter analyses.

Additional fluorescent in situ hybridization images.
(A) To help visualize enrichment on all chromosomes in species with variable intensities, only the color channel that identifies the 160mer in D. athabasca and the 12mer in D. subobscura is shown. Arrows show low intensity signal on Muller C in D. athabasca and an unknown Muller element in D. subobscura. (B) Replicates for each species and satellite placement during cell division.

Nanopore and Illumina sequencing coverage over unplaced contigs (Contig_6 and Contig_7) harboring arrays of putative D. subobscura centromeric-associated satellite sequence.
https://doi.org/10.7554/eLife.49002.015
Emergence and loss of centromeres.
(A) Shown are homologous genes between D. subobscura (telocentric), D. athabasca (metacentric) and D. pseudoobscura (metacentric and telocentric) with H3K9me3 enrichment plotted along Muller A (red), B (green) and E (purple) in 50 kb windows. Genes identified in the pericentromere of metacentric chromosomes are shown with black lines. Genes identified in pericentromeres of metacentric chromosomes can be traced to two ‘seed regions’ each on the telocentric chromosome of D. subobscura, and to paleocentromere regions in species that secondarily lost the metacentric centromere. (B) GC-content across D. subobscura Muller A, B and E. Seed regions have significantly lower GC-content compared to genomic background (Supplementary file 7).

Alignments of Muller A between D. subobscura (telocentric) and D. athabasca (metacentric), with H3K9me3 enrichment plotted in 50 kb windows above each chromosome.
Pericentromeric genes in D. pseudoobscura shown in black.

GC-content, the percentage of bases repeat-masked, and number of genes, in 10 kb non-overlapping windows across Muller A, B and E.
Seed regions are shown bounded by dashed lines. Note that the orientation of the Muller elements is shown as in Figure 5B.
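
A windowed GC-content track of the kind described in this legend can be computed as in the sketch below. It is an illustration under stated assumptions and not the authors' code: the function name is hypothetical, and the choice to exclude ambiguous bases (such as N) from the denominator is an assumption of this example.

```python
def gc_content_windows(sequence, window=10_000):
    """Return GC percentage for each non-overlapping window of a sequence.

    Bases other than A, C, G and T are excluded from the denominator
    (an assumption of this sketch); a window with no unambiguous bases
    yields None.
    """
    sequence = sequence.upper()
    result = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        result.append(100.0 * gc / (gc + at) if gc + at else None)
    return result

# Toy sequence with an 8 bp window so the output is easy to check by hand.
print(gc_content_windows("GGCCAATT" + "ATATATAT" + "NNNN", window=8))
# [50.0, 0.0, None]
```

With the default 10 kb window, values inside and outside the dashed seed-region boundaries could then be compared directly.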

GC-content of different functional categories in seed and non-seed regions of Muller A, B and E of D. subobscura.
https://doi.org/10.7554/eLife.49002.021
Karyotype and centromere evolution.
(A) Models for transitions between metacentric and telocentric chromosomes, either invoking pericentric inversions (top), or centromere repositioning (bottom) via the birth of a new centromere (lightning bolt) and death of the old centromere (skull and crossbones). The pericentromere is indicated by darker shading, the centromere as a white rectangle. (B) The syntenic location of genes adjacent to the centromere can allow us to distinguish between a simple inversion model and centromere relocation. The genes closest to the centromere of the telocentric chromosome (30 genes in panel C) are shown by different shading. (C) Dot plots for homologous genes (semi-transparent points) between telocentric and metacentric Muller elements (orange: Muller A-AD; purple: Muller E; green: Muller B). In 4 out of 5 cases, pericentric genes in the telocentric species are found in the non-pericentric regions of the metacentric species. Only the comparison of Muller B between D. athabasca and pseudoobscura group flies (D. lowei is pictured) shows the same genes to be pericentric in both species (and thus supports a simple inversion model).

Functional consequences of becoming pericentromeric.
(A) Metagene plots showing H3K9me3 enrichment for genes located in different parts of the genome in D. subobscura (top) and D. athabasca (bottom). (B) Patterns of gene expression for homologous genes in D. subobscura and D. athabasca, classified as whether they are part of the ‘seed’ region in D. subobscura that become part of the pericentromeric heterochromatin in D. athabasca or not. Expression patterns were not found to significantly differ between D. subobscura non-pericentromeric genes and seed genes, while seed orthologs located in the pericentromere of D. athabasca showed significantly higher expression than non-pericentromeric genes (Mann-Whitney U, p<0.0001).
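
For readers unfamiliar with the Mann-Whitney U test named in this legend, the statistic itself can be computed directly from its pairwise definition, as sketched below. The function name and the toy expression values are hypothetical and do not come from the study; the sketch returns only U, and a p-value would in practice come from a statistics library (for example `scipy.stats.mannwhitneyu`).

```python
def mann_whitney_u(x, y):
    """Return the U statistic for sample x relative to sample y.

    U counts the pairs (xi, yj) in which xi exceeds yj, scoring ties
    as one half. This direct form is O(len(x) * len(y)).
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Made-up expression values for two gene classes (illustration only).
pericentromeric = [5.0, 6.0, 7.0]
non_pericentromeric = [1.0, 2.0, 3.0]
print(mann_whitney_u(pericentromeric, non_pericentromeric))  # 9.0
```

A U equal to the product of the two sample sizes, as in this toy case, means every value in the first group exceeds every value in the second.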

Transposable element evolution across the genome.
Shown is the fraction of bases masked in 100 kb genomic windows for different transposable element families with the total TE fraction plotted above each chromosome.

Genomic distribution of transposable elements by species and Muller element.
The top 10 TEs per Muller element, per species, are shown in descending order from top to bottom and ranked by their total contribution (bp) to each element. Each point represents a genomic location masked for an element.

De-novo estimates (dnaPipeTE) of transposable element frequencies in D. subobscura, D. athabasca, D. lowei and D. pseudoobscura.
https://doi.org/10.7554/eLife.49002.026
Bandage plot of a typical Drosophila genome assembly.
The left panel is a visualization of the genome graph (.gfa file) from a canu assembly with the node name for each contig and Illumina coverage displayed as text over each contig. Each contig in the assembly is shaded by the amount of male Illumina whole genome sequencing coverage (see Materials and methods). In this example, red contigs are likely autosomal (~40×) while darker contigs have less coverage and indicate either putative sex chromosome contigs (~20×) or putative contaminant contigs (<<20×). (B) Shown is a zoomed-in image of 2 nodes (535 and 511) in the assembly with exceptionally low male (top) and female (bottom) Illumina coverage (<0.1×). By also visualizing the top BLAST hits for these contigs (not shown), we were able to identify these contigs as belonging to an Acetobacter species; they were thus treated as contaminants and marked for removal from the assembly. Contigs with exceptionally high Illumina coverage were also scrutinized thoroughly, but these can arise for multiple reasons, including mtDNA contigs, collapsed regions of the target genome (e.g., rDNA genes or centromeric satellite sequence), or non-target contaminant contigs.
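
The coverage-based triage described in this legend can be expressed as a simple rule, sketched below. The function name and the numeric cutoffs (expressed relative to an assumed autosomal depth of about 40×) are illustrative assumptions, not thresholds reported by the authors, who additionally relied on BLAST hits and visual inspection.

```python
def classify_contig(male_coverage, autosomal_depth=40.0):
    """Assign a coarse label to a contig from its male Illumina coverage.

    All cutoffs are illustrative assumptions expressed relative to the
    expected autosomal depth; they are not values taken from the study.
    """
    ratio = male_coverage / autosomal_depth
    if ratio < 0.1:
        return "putative contaminant"
    if ratio < 0.75:
        return "putative sex chromosome"
    if ratio <= 1.5:
        return "putative autosome"
    return "high coverage: inspect (mtDNA, collapsed repeat, or contaminant)"

for depth in (41.0, 19.0, 0.05, 400.0):
    print(depth, "->", classify_contig(depth))
```

Labels from a rule like this are only a first pass; any contig flagged as a contaminant or as unusually high coverage would still need the follow-up checks described in the legend.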

BAC clone sequencing confirms centromere and paleocentromere assembly.
Several independent BAC clones map to the assembly of our paleocentromeres in D. miranda.

Drosophila athabasca EB metacentric chromosome Hi-C associations and scaffolding.
Our EB assembly (Appendix 1—table 2) was superior to our EA assembly (Appendix 1—table 3) and long contigs from our EB assembly extended at least a megabase into the pericentromere for all metacentric chromosomes. Shown above are Hi-C association heatmaps from Juicebox (Durand et al., 2016a). Green boxes denote contigs. The pericentromeric region is highlighted in purple, and note the clear transition in Hi-C associations between euchromatic and heterochromatic regions. We used EA Hi-C data to scaffold the EB assembly. The EA and EB semispecies harbor inversions that differentiate the semispecies and we identified numerous inversions when mapping EA Hi-C to the EB genome assembly. Thus, the exceptionally long EB contigs that extend into the pericentromeric region allowed us to accurately scaffold chromosomes while simultaneously identifying inversions along the euchromatic arms.

Drosophila athabasca EA assembly Hi-C associations and scaffolding.
(A) Hi-C scaffolding of the EA assembly recovered long blocks of contigs we identified as Muller elements from our EB assembly. Blue boxes bound putative Muller element boundaries; green boxes denote contigs. Black arrows show contigs that span the euchromatic/heterochromatic transition and allow for confident scaffolding into pericentromeric regions. Blue arrows show regions where contigs failed to assemble across the transition, making scaffolding based on Hi-C associations more challenging. (B) Shown is a zoomed-in image of Muller C scaffolding. Here, a contig spans the euchromatic/heterochromatic transition on Muller C and we used this scaffold in our Dath_EB_hybrid assembly. Note the lack of evidence for inversions in the Hi-C heatmaps since here we are using EA Hi-C with an EA assembly.

Whole genome alignment of our D. lowei assembly (Y axis) to the published Drosophila miranda genome.
https://doi.org/10.7554/eLife.49002.048
Whole genome alignment of our assembly (Y axis) to the published Drosophila pseudoobscura genome assembly (version 3.04).
Scaffolds in the published assembly that are near chromosome length (i.e., Muller E and Muller C) largely agree with our scaffolds. However, our assembly extends the assembled length of these chromosomes with far less scaffolding. For Muller B and Muller A-AD, our scaffolded chromosomes show large stretches of collinearity with the fragmented published assembly, with the exception of a few inverted regions. Our Hi-C data and association heatmap (see RESULTS) argue that our assembly orientation is likely the correct one and provide orientation to the five large scaffolds of Muller B and 8 scaffolds of Muller A-AD in the reference genome. The paleocentromeric region on Muller E is assembled in the current reference genome, but our assembly contains additional sequence not present in the published assembly (not shown).
Tables
Total length (bp) of each assembled Muller element in each species, the number of contigs, and estimated length (Mb) of pericentromere sequence.
https://doi.org/10.7554/eLife.49002.016
Species | Muller A | Muller D | Muller A-AD | Muller B | Muller C | Muller E | Muller F | Total | |
---|---|---|---|---|---|---|---|---|---|
D. subobscura | |||||||||
chromosome (bp) | 24,182,865 | 23,815,339 | n/a | 25,941,769 | 20,343,353 | 30,159,154 | 1,505,893 | 125,948,373 | |
contigs | 9 | 8 | n/a | 5 | 1 | 3 | 4 | 30 | |
pericentromere (Mb) | 1.9 | 1.7 | n/a | 2.8 | 1.1 | 1.1 | n/a | 8.6 | |
D. athabasca | |||||||||
chromosome (bp) | n/a | n/a | 67,112,822 | 52,101,127 | 24,053,775 | 42,973,490 | 1,524,173 | 187,765,387 | |
contigs | n/a | n/a | 4 | 4 | 6 | 7 | 1 | 22 | |
pericentromere (Mb) | n/a | n/a | 14.1 | 22.8 | 2.5 | 11.2 | n/a | 50.6 | |
D. lowei | |||||||||
chromosome (bp) | n/a | n/a | 73,251,623 | 31,032,897 | 24,430,087 | 48,132,706 | 1,606,711 | 178,454,024 | |
contigs | n/a | n/a | 190 | 42 | 47 | 152 | 1 | 432 | |
pericentromere (Mb) | n/a | n/a | 17.2 | 2.2 | 3.5 | 15.1 | n/a | 38 | |
D. miranda | |||||||||
chromosome (bp) | n/a | n/a | 77,621,844 | 32,539,841 | 25,306,191 | 35,263,383 | 2,366,016 | 173,097,275 | |
contigs | n/a | n/a | 18 | 2 | 3 | 3 | 1 | 27 | |
pericentromere (Mb) | n/a | n/a | 20.5 | 3.4 | 3.4 | 2 | n/a | 29.3 | |
D. pseudoobscura | |||||||||
chromosome (bp) | n/a | n/a | 67,434,674 | 30,637,803 | 22,641,560 | 32,023,297 | 1,941,385 | 154,678,719 | |
contigs | n/a | n/a | 37 | 6 | 5 | 5 | 1 | 54 | |
pericentromere (Mb) | n/a | n/a | 14.1 | 2.8 | 2.7 | 0.7 | n/a | 20.3 |
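
To put the pericentromere estimates in the table above in proportion, the sketch below converts the "Total" column into the percentage of each assembled genome that is pericentromeric. The numbers are copied from the table; the function name and the convention 1 Mb = 1,000,000 bp are assumptions of this example.

```python
def pericentromere_percent(pericentromere_mb, chromosome_bp):
    """Percentage of assembled chromosome sequence that is pericentromeric.

    Assumes 1 Mb = 1,000,000 bp.
    """
    return 100.0 * pericentromere_mb * 1_000_000 / chromosome_bp

# (total pericentromere in Mb, total assembled chromosome length in bp)
totals = {
    "D. subobscura": (8.6, 125_948_373),
    "D. athabasca": (50.6, 187_765_387),
    "D. lowei": (38.0, 178_454_024),
    "D. miranda": (29.3, 173_097_275),
    "D. pseudoobscura": (20.3, 154_678_719),
}
for species, (peri_mb, total_bp) in totals.items():
    print(f"{species}: {pericentromere_percent(peri_mb, total_bp):.1f}%")
```

For D. subobscura this gives roughly 6.8%, compared with roughly 26.9% for D. athabasca.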
Putative centromeric satellite lengths inferred from Tandem Repeat Finder (Benson, 1999), k-Seek (Wei et al., 2014), and TideHunter (Gao et al., 2019) for each Drosophila species.
HOR = higher order repeat.
Species | Tandem Repeat Finder | k-Seek | TideHunter |
---|---|---|---|
D. subobscura | no candidate | 12 bp | 12 bp, 107 bp |
D. athabasca | 160 bp | 11 bp | 11 bp, 160 bp |
D. lowei | 63 bp (HOR of 21 bp) | no candidate | 21 bp |
D. miranda | 99 bp, 84 bp (HOR of 21 bp) | no candidate | 99 bp, 84 bp (HOR of 21 bp) |
D. pseudoobscura | 84 bp (HOR of 21 bp) | no candidate | 168 bp (HOR of 21 bp) |
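
The higher-order repeat (HOR) lengths in the table above are whole multiples of the 21 bp monomer, which the sketch below checks. It only tests length arithmetic; deciding that a repeat really is an HOR requires comparing the sequences themselves, which is not shown. The function name is hypothetical.

```python
def hor_copy_number(repeat_length, monomer_length):
    """Return how many monomers fit exactly in a repeat, or None if the
    repeat length is not a whole multiple of the monomer length."""
    if monomer_length <= 0:
        raise ValueError("monomer_length must be positive")
    copies, remainder = divmod(repeat_length, monomer_length)
    return copies if remainder == 0 and copies >= 1 else None

for length in (63, 84, 168, 99):
    print(length, hor_copy_number(length, 21))
# 63 3
# 84 4
# 168 8
# 99 None
```

The 99 bp satellite of D. miranda is not a multiple of 21 bp, consistent with the table listing it separately from the 21 bp family.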
Transposable elements in the D. obscura species group.
https://doi.org/10.7554/eLife.49002.027
Species / TE | Total bp masked | % of genome masked | |
---|---|---|---|
D. subobscura | |||
total TE's | 7,572,806 | 6.0% | |
Dpse_Gypsy_6 | 1,319,782 | 1.0% | |
CR1-1_DPer | 449,484 | 0.4% | |
Gypsy8-I_Dpse | 424,330 | 0.3% | |
LOA-1_DPer | 331,166 | 0.3% | |
T213_X.Unknown | 323,480 | 0.3% | |
D. athabasca | |||
total TE's | 42,382,296 | 22.6% | |
Daff_Jockey_18 | 3,779,133 | 2.0% | |
CR1-1_DPer | 2,938,986 | 1.6% | |
T32_LTR | 1,958,710 | 1.0% | |
LOA-1_DPer | 1,729,016 | 0.9% | |
LOA-2_DPer | 1,683,407 | 0.9% | |
D. lowei | |||
total TE's | 45,307,006 | 25.4% | |
HelitronN-1_DPe | 2,764,830 | 1.5% | |
CR1-1_DPer | 2,564,981 | 1.4% | |
LOA-3_DPer | 1,421,120 | 0.8% | |
LOA-2_DPer | 1,019,792 | 0.6% | |
BEL-3_DPer-I | 965,011 | 0.5% | |
D. pseudoobscura | |||
total TE's | 29,907,407 | 19.3% | |
CR1-1_DPer | 2,082,339 | 1.3% | |
HelitronN-1_DPe | 1,962,822 | 1.3% | |
LOA-3_DPer | 1,024,994 | 0.7% | |
T154_X.Unknown | 799,007 | 0.5% | |
LOA-2_DPer | 726,556 | 0.5% | |
D. miranda | |||
total TE's | 42,680,234 | 24.7% | |
HelitronN-1_DPe | 3,834,913 | 2.2% | |
CR1-1_DPer | 3,148,617 | 1.8% | |
Gypsy18-I_Dpse | 2,261,130 | 1.3% | |
LOA-3_DPer | 1,557,989 | 0.9% | |
LOA-2_DPer | 1,208,941 | 0.7% |
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Strain, strain background (Drosophila subobscura male and female) | 14011–0131.10 | National Drosophila Species Stock Center (Cornell University) | stock center number: 14011–0131.10 | |
Strain, strain background (Drosophila pseudoobscura male and female) | MV2-25 | National Drosophila Species Stock Center (Cornell University) | stock center number: 14011–0121.94 | |
Biological sample (Drosophila lowei) | Jillo6 | isofemale line (deceased) | ||
Biological sample (Drosophila athabasca EA) | PA60 | isofemale line (deceased) | ||
Biological sample (Drosophila athabasca EB) | NJ28 | isofemale line (deceased) | ||
Commercial assay or kit | TruSeq Stranded RNA kit | Illumina | cat # 20020595 | |
Commercial assay or kit | TruSeq DNA Nano Prep kit | Illumina | cat # 20015965 | |
Commercial assay or kit | DNeasy Kit | Qiagen | cat # 69504 | |
Commercial assay or kit | Blood and Cell Culture DNA Midi Kit | Qiagen | cat # 13343 | |
Commercial assay or kit | Gentra Puregene Tissue Kit | Qiagen | cat # 158667 | |
Commercial assay or kit | Ligation sequencing kit | Nanopore | SQK-LSK108 | |
Commercial assay or kit | Rapid sequencing kit | Nanopore | SQK-RAD004 | |
Commercial assay or kit | Quick DNA plus Midi kit | Zymo | cat # D4075 | |
Commercial assay or kit | Ligation sequencing kit | Nanopore | SQK-LSK109 | |
Commercial assay or kit | SMARTer Universal Low Input DNA-seq kit | Takara/Rubicon Bio | R400676 | |
Chemical compound, drug | H3K9me3 antibody | Diagenode | ||
Software, algorithm | canu | Koren et al., 2017 | ||
Software, algorithm | MUMmer | Kurtz et al., 2004 | ||
Software, algorithm | WTDBG2 | Ruan and Li, 2019 | ||
Software, algorithm | minimap2 | Li, 2018 | ||
Software, algorithm | Bandage | Wick et al., 2015 | ||
Software, algorithm | BWA | Li and Durbin, 2009 | ||
Software, algorithm | SAMtools | Li et al., 2009 | ||
Software, algorithm | bedtools | Quinlan and Hall, 2010 | ||
Software, algorithm | QUIVER | Chin et al., 2013 | ||
Software, algorithm | PILON | Walker et al., 2014 | ||
Software, algorithm | RACON | Vaser et al., 2017 | ||
Software, algorithm | Juicebox | Durand et al., 2016a | ||
Software, algorithm | Juicer | Durand et al., 2016b | ||
Software, algorithm | 3D-DNA | Dudchenko et al., 2017 | ||
Software, algorithm | GATK UnifiedGenotyper | DePristo et al., 2011 | ||
Software, algorithm | REPdenovo | Chu et al., 2016 | ||
Software, algorithm | RepeatMasker | Smith et al., 2005 | ||
Software, algorithm | MAKER | Campbell et al., 2014 | ||
Software, algorithm | HiSat2 | Kim et al., 2015 | ||
Software, algorithm | StringTie | Pertea et al., 2015 | ||
Software, algorithm | BUSCO | Simão et al., 2015 | ||
Software, algorithm | Tandem Repeat Finder | Benson, 1999 | ||
Software, algorithm | k-Seek | Wei et al., 2014; https://github.com/weikevinhc/k-seek | ||
Software, algorithm | TideHunter | Gao et al., 2019 |
Summary statistics and BUSCO results from the genome assembly process of Drosophila subobscura.
https://doi.org/10.7554/eLife.49002.042
Statistic | Canu/WTDBG2 only | Canu/WTDBG2 + Racon (3x) | Canu/WTDBG2 + Racon (3x) + Pilon (1x) | Canu/WTDBG2 + Racon (3x) + Pilon (2x) | Final Hi-C scaffolded Dsub_1.0 |
---|---|---|---|---|---|
N50 | 11,277,487 | 11,360,548 | 11,375,305 | 11,370,518 | * |
Max Contig | 25,648,096 | 25,831,582 | 25,849,837 | 25,836,392 | * |
Assembly Size | 128,396,075 | 129,296,034 | 129,434,338 | 129,376,892 | 126,232,139 |
Number of Contigs | 62 | 61 | 61 | 61 | * |
Complete BUSCOs | 987 | 1017 | 1057 | 1060 | 1062 |
Complete Single Copy BUSCOs | 984 | 1011 | 1047 | 1050 | 1054 |
Duplicated | 3 | 6 | 10 | 10 | 8 |
Fragmented | 46 | 29 | 2 | - | - |
Missing | 33 | 20 | 7 | 6 | 4 |
% BUSCOs complete | 92.6% | 95.4% | 99.2% | 99.4% | 99.6% |
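
Two of the summary statistics in these assembly tables can be reproduced with a few lines of code. The sketch below uses the common definition of N50 and computes BUSCO completeness as complete / (complete + fragmented + missing), a formula that matches the percentages in the table above; the function names and the toy contig lengths are illustrative assumptions.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum of contigs,
    sorted from longest to shortest, first reaches half the assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")

def busco_percent_complete(complete, fragmented, missing):
    """Percentage of BUSCOs recovered as complete, rounded to one decimal."""
    total = complete + fragmented + missing
    return round(100.0 * complete / total, 1)

# Toy contig lengths (not from the study): total 115, half is 57.5.
print(n50([50, 30, 20, 10, 5]))              # 30
# First and last columns of the D. subobscura table above.
print(busco_percent_complete(987, 46, 33))   # 92.6
print(busco_percent_complete(1062, 0, 4))    # 99.6
```

In each column of the table, complete, fragmented and missing BUSCOs sum to 1066, so the percentages are directly comparable across assembly stages.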
Summary statistics and BUSCO results from genome assembly process of Drosophila athabasca EB
https://doi.org/10.7554/eLife.49002.043
Statistic | Canu + Bandage | Canu + Quiver (2x) | Canu + Quiver (2x) + Pilon (2x) | Final Hi-C scaffolded Dath_EB_1.0 | Final Hi-C scaffolded Dath_EB_hybrid |
---|---|---|---|---|---|
N50 | 14,480,452 | 15,318,533 | 15,319,690 | * | * |
Max Contig | 34,554,387 | 34,873,651 | 34,879,405 | * | * |
Assembly Size | 195,713,124 | 192,740,469 | 192,655,157 | 192,660,667 | 192,054,219 |
Number of Contigs | 199 | 133 | 133 | * | * |
Complete BUSCOs | 1043 | 1054 | 1057 | 1057 | 1060 |
Complete Single Copy BUSCOs | 1023 | 1046 | 1048 | 1048 | 1052 |
Duplicated | 20 | 8 | 9 | 9 | 8 |
Fragmented | 12 | 3 | 1 | 1 | 1 |
Missing | 11 | 9 | 8 | 8 | 5 |
% BUSCOs complete | 97.9% | 98.9% | 99.2% | 99.2% | 99.5% |
Summary statistics and BUSCO results from genome assembly process of Drosophila athabasca EA
https://doi.org/10.7554/eLife.49002.045
Statistic | Canu + Bandage | Canu + Quiver (2x) + Pilon (2x) | Final Hi-C scaffolded Dath_EA_1.0 |
---|---|---|---|
N50 | 5,537,664 | 5,538,275 | * |
Max Contig | 20,031,442 | 20,052,631 | * |
Assembly Size | 193,369,473 | 193,423,818 | 193,434,778 |
Number of Contigs | 348 | 348 | * |
Complete BUSCOs | 1041 | 1055 | 1055 |
Complete Single Copy BUSCOs | 1021 | 1041 | 1041 |
Duplicated | 20 | 14 | 14 |
Fragmented | 14 | 5 | 5 |
Missing | 11 | 6 | 6 |
% BUSCOs complete | 97.7% | 99.0% | 99.0% |
Summary statistics and BUSCO results from genome assembly process of Drosophila lowei.
https://doi.org/10.7554/eLife.49002.047
Statistic | Canu/WTDBG2 only | Canu/WTDBG2 + Racon (3x) | Canu/WTDBG2 + Racon (3x) + Pilon (1x) | Canu/WTDBG2 + Racon (3x) + Pilon (2x) | Final Hi-C scaffolded Dlow_1.0 |
---|---|---|---|---|---|
N50 | 4,754,630 | 4,787,044 | 4,797,192 | 4,793,318 | * |
Max Contig | 20,816,730 | 20,907,879 | 20,946,520 | 20,929,419 | * |
Assembly Size | 192,748,718 | 191,586,436 | 191,816,863 | 191,620,915 | 184,313,494 |
Number of Contigs | 943 | 726 | 726 | 726 | * |
Complete BUSCOs | 936 | 975 | 1037 | 1039 | 1036 |
Complete Single Copy BUSCOs | 920 | 960 | 1016 | 1021 | 1022 |
Duplicated | 16 | 15 | 21 | 18 | 14 |
Fragmented | 67 | 39 | 3 | 1 | 1 |
Missing | 63 | 52 | 26 | 26 | 29 |
% BUSCOs complete | 87.8% | 91.5% | 97.3% | 97.5% | 97.2% |
Summary statistics and BUSCO results from the genome assembly process of Drosophila pseudoobscura
https://doi.org/10.7554/eLife.49002.049
Statistic | Canu + Racon (3x) | Canu + Racon (3x) + Pilon (1x) | Canu + Racon (3x) + Pilon (2x) | Final Hi-C scaffolded Dpse_1.0 |
---|---|---|---|---|
N50 | 5,996,964 | 5,975,358 | 5,971,646 | * |
Max Contig | 20,387,169 | 20,334,965 | 20,319,488 | * |
Assembly Size | 194,744,540 | 194,172,171 | 193,935,436 | 193,980,066 |
Number of Contigs | 481 | 481 | 481 | * |
Complete BUSCOs | 974 | 1052 | 1057 | 1055 |
Complete Single Copy BUSCOs | 968 | 1043 | 1048 | 1048 |
Duplicated | 6 | 9 | 9 | 7 |
Fragmented | 59 | 3 | 2 | 3 |
Missing | 33 | 11 | 7 | 8 |
% BUSCOs complete | 91.4% | 98.6% | 99.2% | 99.0% |
Additional files
- Supplementary file 1: DNA sequence data generated for this study. https://doi.org/10.7554/eLife.49002.028
- Supplementary file 2: Drosophila strains used for genome assembly. https://doi.org/10.7554/eLife.49002.029
- Supplementary file 3: BUSCO results for assembled genomes. For D. athabasca, BUSCO scores for the EB assembly are shown. https://doi.org/10.7554/eLife.49002.030
- Supplementary file 4: Number of protein coding gene models from MAKER annotations for each genome assembly and Muller element. https://doi.org/10.7554/eLife.49002.031
- Supplementary file 5: Average percentage of bases repeat-masked in each pericentromere. https://doi.org/10.7554/eLife.49002.032
- Supplementary file 6: Inferred centromeric satellite sequence and fluorescent in situ hybridization probes. https://doi.org/10.7554/eLife.49002.033
- Supplementary file 7: Comparison of gene density, repeat density, and GC-content (%), between seed and non-seed regions of Muller A, B and E in D. subobscura. P-values for comparisons, Mann-Whitney U. https://doi.org/10.7554/eLife.49002.034
- Supplementary file 8: The most common repeat families in each species and amount of masked sequence (bp). https://doi.org/10.7554/eLife.49002.035
- Supplementary file 9: The top 10 TEs from each Muller element and amount of masked sequence (bp). https://doi.org/10.7554/eLife.49002.036
- Supplementary file 10: Hi-C data summary. https://doi.org/10.7554/eLife.49002.037
- Transparent reporting form: https://doi.org/10.7554/eLife.49002.038