Unique structure and positive selection promote the rapid divergence of Drosophila Y chromosomes

  1. Ching-Ho Chang  Is a corresponding author
  2. Lauren E Gregory
  3. Kathleen E Gordon
  4. Colin D Meiklejohn
  5. Amanda M Larracuente  Is a corresponding author
  1. Department of Biology, University of Rochester, United States
  2. School of Biological Sciences, University of Nebraska-Lincoln, United States

Abstract

Y chromosomes across diverse species convergently evolve a gene-poor, heterochromatic organization enriched for duplicated genes, LTR retrotransposons, and satellite DNA. Sexual antagonism and a loss of recombination play major roles in the degeneration of young Y chromosomes. However, the processes shaping the evolution of mature, already degenerated Y chromosomes are less well-understood. Because Y chromosomes evolve rapidly, comparisons between closely related species are particularly useful. We generated de novo long-read assemblies complemented with cytological validation to reveal Y chromosome organization in three closely related species of the Drosophila simulans complex, which diverged only 250,000 years ago and share >98% sequence identity. We find these Y chromosomes are divergent in their organization and repetitive DNA composition and discover new Y-linked gene families whose evolution is driven by both positive selection and gene conversion. These Y chromosomes are also enriched for large deletions, suggesting that the repair of double-strand breaks on Y chromosomes may be biased toward microhomology-mediated end joining over canonical non-homologous end-joining. We propose that this repair mechanism contributes to the convergent evolution of Y chromosome organization across organisms.

Editor's evaluation

This manuscript by Chang et al. reports the evolutionary patterns of Y-chromosome evolution in Drosophila, providing perhaps the most comprehensive interspecific comparison of Y chromosomes available to date. They focus on four species of the melanogaster species subgroup and do extensive sequencing and assembly. The manuscript describes the pattern of divergence in these chromosomes, and uses comparative approaches to compare the drivers of evolution in flies and mammals. The authors suggest that the Y chromosome uses a different mechanism to repair double strand breaks than on autosomes. We were impressed by the novelty and rigor of the work as well as the overall presentation of the results.

https://doi.org/10.7554/eLife.75795.sa0

Introduction

Most sex chromosomes evolved from a pair of homologous gene-rich autosomes that acquired sex-determining factors and subsequently differentiated. Y chromosomes gradually lose most of their genes, while their X chromosome counterparts tend to retain the original autosomal complement of genes. This Y chromosome degeneration follows a suppression of recombination (Rice, 1987a), which limits the efficacy of natural selection, and causes the accumulation of deleterious mutations through Muller’s ratchet, background selection, and hitchhiking effects (Bachtrog, 2013; Charlesworth, 1978; Rice, 1987b; Charlesworth et al., 1995; Charlesworth and Charlesworth, 2000). As a consequence, many Y chromosomes present a seemingly hostile environment for genes, with their mutational burden, high repeat content, and abundant silent chromatin.

Genomic studies of Y chromosome evolution focus primarily on young sex chromosomes, addressing how the suppression of recombination promotes Y chromosome degeneration at both the epigenetic and genetic levels (Bachtrog, 2013; Bergero et al., 2015). Although sexually antagonistic selection is traditionally cited as the cause of recombination suppression on the Y chromosome, direct evidence for its role is still lacking (Bergero et al., 2019) and new models propose that regulatory evolution is the initial trigger for recombination suppression (Lenormand et al., 2020). Regardless of its role in initiating recombination suppression, on degenerating Y chromosomes, sexually antagonistic selection may accelerate Y-linked gene evolution to optimize male-specific functions. Indeed, Y-linked genes tend to have slightly higher rates of protein evolution than their orthologs on other chromosomes (Bachtrog, 2003; Singh et al., 2014). Higher rates of Y-linked gene evolution are driven by positive selection, relaxed selective constraints and male-biased mutation patterns, with most Y-linked genes evolving under at least some functional constraint (Singh et al., 2014). Although there is evidence suggesting that some Y chromosomes have experienced recent selective sweeps (Larracuente and Clark, 2013; Bachtrog, 2004), the relative importance of positive selection in shaping Y chromosome evolution remains unclear.

Y chromosomes harbor extensive structural divergence between species, in part through the acquisition of genes from other genomic regions (Soh et al., 2014; Rozen et al., 2003; Hughes and Page, 2015; Bachtrog et al., 2019; Tobler et al., 2017; Peichel et al., 2019; Brashear et al., 2018; Hall et al., 2016). However, the functions of most Y-linked genes are unknown (Tobler et al., 2017; Hall et al., 2016; Chang and Larracuente, 2019; Carvalho et al., 2015). Some Y-linked genes are duplicated and, in extreme cases, amplified into so-called ampliconic genes—gene families with tens to hundreds of highly similar sequences. Y chromosomes of both Drosophila and mammals have independently acquired and amplified gene families, which turnover rapidly between closely related species (Soh et al., 2014; Bachtrog et al., 2019; Brashear et al., 2018; Ellison and Bachtrog, 2019; Hughes et al., 2010; Mueller et al., 2008). Following Y-linked gene amplification, gene conversion between gene copies may enhance the efficacy of selection on Y-linked genes in the absence of crossing over (Rozen et al., 2003; Connallon and Clark, 2010).

Detailed analyses of old Y chromosomes have been restricted to a few species with reference-quality assemblies, for example, mouse and human. The challenges of cloning and assembling repeat-rich regions of the genome have stymied progress towards a complete understanding of Y chromosome evolution (Carlson and Brutlag, 1977; Lohe and Brutlag, 1987a; Lohe and Brutlag, 1987b). Recent advances in long-read sequencing make it feasible to assemble large parts of Y chromosomes (Peichel et al., 2019; Hall et al., 2016; Chang and Larracuente, 2019; Mahajan et al., 2018) enabling comparative studies of a majority of Y-linked sequences in closely related species.

Drosophila melanogaster and three related species in the D. simulans clade are ideally suited to study Y chromosome evolution. These Y chromosomes are functionally divergent, contribute to hybrid sterility (Araripe et al., 2016; Bayes and Malik, 2009; Johnson et al., 1992; Coyne, 1985), and at least four X-linked meiotic drive systems likely shape Y chromosome evolution in these species (Bozzetti et al., 1995; Courret et al., 2019; Tao et al., 2007; Tao et al., 2001; Helleu et al., 2019; Branco et al., 2013; Montchamp-Moreau et al., 2001; Meiklejohn et al., 2018). Previous genetic and transcriptomic studies suggest that Y chromosome variation can impact male fitness and gene regulation (Reijo et al., 1995; Vogt et al., 1996; Sun et al., 2000; Repping et al., 2003; Morgan and Pardo-Manuel de Villena, 2017; Lemos et al., 2010; Wang et al., 2018; Sackton et al., 2011). Since there is minimal nucleotide variation and divergence in Y-linked protein-coding sequences within and between these Drosophila species (Singh et al., 2014; Larracuente and Clark, 2013; Helleu et al., 2019), structural variation may be responsible for the majority of these effects. For example, 20–40% of D. melanogaster Y-linked regulatory variation (YRV) comes from differences in ribosomal DNA (rDNA) copy numbers (Zhou et al., 2012). The chromatin on Drosophila Y chromosomes has genome-wide effects on expression level and chromatin states (Brown and Bachtrog, 2017), but aside from the rDNA, the molecular basis of Y chromosome divergence and variation in these species remains elusive.

To study the factors and forces shaping the evolution of Y chromosome structure, we assembled the Y chromosomes of the three species in the D. simulans clade to reveal their structure and evolution and compared them to D. melanogaster. We find that the Y chromosomes of the D. simulans clade species have high duplication and gene conversion rates that, along with strong positive selection, shaped the evolution of two new ampliconic protein-coding gene families. We propose that, in addition to positive selection, sexual antagonism, and genetic conflict, differences in the usage of DNA repair pathways may give rise to the unique patterns of Y-linked mutations. Together these effects may drive the convergent evolution of Y chromosome structure across taxa.

Results

Improving Y chromosome assemblies using long-read assembly and fluorescence in situ hybridization (FISH)

Long reads have enabled the assembly of many repetitive genome regions but have had limited success in assembling Y chromosomes (Bachtrog et al., 2019; Peichel et al., 2019; Hall et al., 2016; Chang and Larracuente, 2019). To improve Y chromosome assemblies for comparative genomic analyses, we applied our heterochromatin-sensitive assembly pipeline (Chang and Larracuente, 2019) with long reads that we previously generated (Chakraborty et al., 2021) to de novo reassemble the Y chromosome from the three species in the Drosophila simulans clade. We also resequenced male genomes using PCR-free Illumina libraries to polish these assemblies. Our heterochromatin-enriched methods improve contiguity compared to previous D. simulans clade assemblies. We recovered all known exons of the 11 canonical Y-linked genes conserved across the melanogaster group, including 58 exons missed in previous assemblies (Supplementary file 1; Gepner and Hays, 1993; Bernardo Carvalho et al., 2009). Based on the median male-to-female coverage (Chang and Larracuente, 2019), we assigned 13.7–18.9 Mb of Y-linked sequences per species with N50 ranging from 0.6 to 1.2 Mb. The quality of these new D. simulans clade Y assemblies is comparable to D. melanogaster (Table 1; Chang and Larracuente, 2019). We evaluated our methods by comparing our assignments for every 10 kb window of assembled sequence to its known chromosomal location. Our assignments have 96, 98, and 99% sensitivity and 5, 0, and 3% false-positive rates in D. mauritiana, D. simulans, and D. sechellia, respectively (Supplementary file 2). We have lower confidence in our D. mauritiana assignments, because the male and female Illumina reads are from different library construction methods. Therefore, we applied an additional criterion only in D. mauritiana based on the female-to-male total mapped reads ratio ( < 0.1), which reduces the false-positive rate from 13% to 5% in regions with known chromosomal location (Supplementary file 2; Figure 1—figure supplement 1). We can detect potential misassemblies by looking for discordant assignments between 10 kb windows on the same contigs. Because we do not find any obviously discordant F/M ratios for any contigs, we make chromosome assignments based on median male-to-female coverage and the ratio of female-to-male total mapped reads across whole contigs. Based on these chromosome assignments, we find 40–44% lower PacBio coverage on Y than X chromosomes in all three species (Figure 1—figure supplement 2; see Appendix 1).

Table 1
Contiguity statistics for heterochromatin-enriched assemblies.
Y chromosome assembly# of contigsTotal lengthContigs N50
D. melanogaster*8014,578,684416,887
D. mauritiana5517,880,0691,628,994
D. simulans3813,717,0561,031,383
D. sechellia6314,899,148555,130
  1. *
  2. This paper.

The cytological organization of the D. simulans clade Y chromosomes is not well-described (Lemeunier and Ashburner, 1984; Roy et al., 2005; Berloco et al., 2005). Therefore, we generated new physical maps of the Y chromosomes by combining our assemblies with cytological data. We performed FISH on mitotic chromosomes using probes for 12 Y-linked sequences (Figure 1 and Figure 1—figure supplements 34; Supplementary file 3) to determine Y chromosome organization at the cytological level. We also determined the location of the centromeres using immunostaining with a Cenp-C antibody (Figure 1—figure supplement 4; Erhardt et al., 2008). These cytological data permit us to (1) validate our assemblies and (2) infer the overall organization of the Y chromosome by orienting our scaffolds on cytological maps. Of the 11 Y-linked genes, we successfully ordered 10, 11, and 7 genes on the cytological bands of D. simulans, D. mauritiana, and D. sechellia, respectively (Figure 1 and Figure 1—figure supplement 3). Although we cannot examine the detailed organization as a complete contiguous Y-linked sequence, we did not find any discordance between our scaffolds and cytological data. We find evidence for extensive Y chromosomal structural rearrangements, including changes in satellite distribution, gene order, and centromere position. These rearrangements are dramatic even among the D. simulans clade species, which diverged less than 250 KYA (Figure 1 and Figure 1—figure supplement 3). The Y chromosome centromere position appears to be the same as determined by Berloco et al. for different strains of D. simulans and D. mauritiana, but not for D. sechellia (Berloco et al., 2005). One explanation for this discrepancy could be between-strain variation in D. sechellia Y chromosome centromere location. Together, our new physical maps and assemblies provide both large and fine-scale resolution on Y chromosome organization in the D. simulans clade.

Figure 1 with 5 supplements see all
Y chromosome organization in D. melanogaster and the three D. simulans clade species.

Schematics of the cytogenetic maps note the locations of Y-linked genes in D. melanogaster and D. simulans clade species. The bars show the relative placement of the scaffolds on the cytological bands based on FISH results. The solid black and dotted bars represent the scaffolds with known and unknown orientation information, respectively. The light blue and orange bars represent two new Y-linked gene families, Lhk and CK2ßtes-Y in the D. simulans clade, respectively. The arrows indicate the orientation of the genes (blue- minus strand; red- plus strand). Yellow circles denote centromere locations (cen). The blocks connecting genes between species highlight the structural rearrangements between species (purple for same, and green for inverted, orientation).

Y-linked sequence and copy number divergence across three species

Although the D. simulans clade species diverged only recently, Y chromosome introgression between pairs of species disrupts male fertility and influences patterns of genome-wide gene expression (Araripe et al., 2016; Johnson et al., 1992). One candidate locus that may contribute to functional divergence and possibly hybrid lethality is the Y-linked rDNA (Zhou et al., 2012; Paredes et al., 2011). Y-linked rDNA, specifically 28 S rDNA, were lost in D. simulans and D. sechellia, but not in D. mauritiana (Roy et al., 2005; Lohe and Roberts, 2000; Lohe and Roberts, 1990). However, the intergenic spacer (IGS) repeats between rDNA genes, which are responsible for X-Y pairing in D. melanogaster males (McKee and Karpen, 1990), are retained on both sex chromosomes in all three species (Roy et al., 2005; Lohe and Roberts, 2000; Lohe and Roberts, 1990). Consistent with previous cytological studies (Roy et al., 2005; Lohe and Roberts, 2000; Lohe and Roberts, 1990), we find that D. simulans and D. sechellia lost most Y-linked 18 S and 28 S rDNA sequences (Figure 1—figure supplement 5). Our assemblies indicate that, despite this loss of the rRNA coding sequences, all three species still retain IGS repeats. However, we and others do not detect Y-linked IGS repeats at the cytological level in D. sechellia (Figure 1—figure supplements 34; Roy et al., 2005; Lohe and Roberts, 2000; Lohe and Roberts, 1990), suggesting that their abundance is below the level of detection by FISH in this species.

Structural variation at Y-linked genes may also contribute to functional variation and divergence in the D. simulans clade. Previous studies reported many duplications of canonical Y-linked genes in D. simulans (Helleu et al., 2019; Chakraborty et al., 2021; Kopp et al., 2006). We find that all three species have at least one intact copy of the 11 canonical Y-linked genes, but there is also extensive copy number variation in Y-linked exons across these species (Figure 2 and Figure 2—figure supplements 12; Supplementary file 1; Chakraborty, 2020). Using Illumina reads, we confirm the copy number variation in our assemblies and reveal some duplicated Y-linked exons, particularly in kl-3, WDY, and Ppr-Y, that are not assembled in D. sechellia (Figure 2—figure supplement 1). Some duplicates may be functional because they are expressed and have complete open reading frames, (e.g. ARY, Ppr-Y1, and Ppr-Y2). The D. simulans Y chromosome has four complete copies of ARY, all of which show similar expression levels from RNA-seq data (Figure 2B and Supplementary file 4), but two copies have inverted exons 1 and 2. D. sechellia also contains at least five duplicated copies of ARY, some of which also have the inverted exons 1 and 2, but the absence of RNA-seq data from testes of this species prevents inferences regarding whether all copies of ARY are expressed. However, most duplications include only a subset of exons, and in many cases, the duplicated exons are located on the periphery of the presumed functional gene copy (Figure 2B and Figure 2—figure supplement 2, Supplementary file 4). For example, both D. simulans and D. mauritiana have multiple copies of exons 8–12 located at the 3’ end of kl-2 (Figure 2B). In D. simulans, most of these extra exons have low to no expression, while in D. mauritiana, there appears to be a substantial expression from many of the duplicated terminal exons, as well as an internal duplication of exon 5. Although the duplications of Y-linked genes can influence their expression, especially for genes with short introns (e.g. ARY, Ppr-Y1 and Ppr-Y2), it is unclear what effects these duplicated exons have on the protein sequences of these fertility-essential genes.

Figure 2 with 3 supplements see all
Duplication of canonical Y-linked exons.

(A) Exon copy number is highly variable across the three D. simulans clade species and generally greater than in D. melanogaster. (B) Gene structure of kl-2 and ARY inferred from assemblies and RNA-seq data. Upper bars indicate exons that are colored and numbered, with their height showing average read depth from sequenced testes RNA (D. simulans and D. mauritiana only). Lower bars indicate exon positions on the assembly and position on the Y-axis indicates coding strand. Some of the duplicated exons are expressed. For short genes (e.g., ARY), the duplicates may be functional and influence protein expression level, unlike duplicated exons of long genes (e.g., kl-2).

All exon-intron junctions are conserved within full-length copies of the canonical Y-linked genes, but intron lengths vary between these species (Figure 3). The length of longer introns ( > 100 bp in any species) is more dynamic than that of short introns (Figure 3; Supplementary file 5). The dramatic size differences in most introns cannot be attributed to a single deletion or duplication (see ORY example in Figure 2—figure supplement 3). Some Y-linked genes contain mega-base sized introns (i.e., mega-introns) whose transcription manifests as cytologically visible lampbrush-like loops (Y-loops) in primary spermatocytes (Bonaccorsi et al., 1988; Bonaccorsi et al., 1990). While Y-loops are found across the Drosophila genus (Meyer, 1963; Piergentili, 2007), their potential functions are unknown (Fingerhut et al., 2019; Redhouse et al., 2011; Pisano et al., 1993; Piergentili et al., 2004; Piergentili and Mencarelli, 2008) and the genes/introns that produce Y-loops differs among species (Chang and Larracuente, 2017). D. melanogaster has three Y-loops transcribed from introns of ORY (ks-1 in previous literature), kl-3, and kl-5 (Bonaccorsi et al., 1988). Based on cytological evidence, D. simulans has three Y-loops, whereas D. mauritiana and D. sechellia only have two (Piergentili, 2007). Of all potential loop-producing introns, we find that only the kl-3 mega-intron is conserved in all four species and has the same intron structure and sequences (i.e. (AATAT)n repeats). While both kl-5 and ORY produce Y-loops with (AAGAC)n repeats in D. melanogaster, (AAGAC)n is missing from the genomes of the D. simulans clade species. This observation is supported by our assemblies, the Illumina raw reads (Supplementary file 6), and published FISH results (Jagannathan et al., 2017). In the D. simulans clade, the ORY introns do not carry any long tandem repeats. However, kl-5 has introns with (AATAT)n repeats that may form a Y-loop in the D. simulans clade species. These data suggest that, while mega-introns and Y-loops may be conserved features of spermatogenesis in Drosophila, they turn over at both the sequence and gene levels over short periods of evolutionary time (i.e. ~ 2 My between D. melanogaster and the D. simulans clade).

Evolution of intron lengths in canonical Y-linked genes.

The intron length in canonical Y-linked genes is different between D. melanogaster and the three D. simulans clade species. Orthologous introns are connected by dotted lines. Completely assembled introns are in blue and introns with gaps in the assembly are in red, and are therefore minimum intron lengths.

Consistent with previous studies (Tobler et al., 2017; Chakraborty et al., 2021), we identify high rates of gene duplication to the D. simulans clade Y chromosome from other chromosomes. We find 49 independent duplications to the Y chromosome in our heterochromatin-enriched assemblies (Figure 4; Supplementary file 7), including eight newly discovered duplications (Tobler et al., 2017; Chakraborty et al., 2021). Twenty-eight duplications are DNA-based, 13 are RNA-based, and the rest are unknown due to limited sequence information (Supplementary file 7). The rate of transposition to the Y chromosome is about three to four times higher in the D. simulans clade compared to D. melanogaster (Chang and Larracuente, 2019). We also infer that 17 duplicated genes were independently deleted from D. simulans clade Y chromosomes. Some of these Y-linked duplications, including Fdy, Mst77Y and pirate, are known to be functional and/or under purifying selection (Tobler et al., 2017; Krsticevic et al., 2015; Russell and Kaiser, 1993; Chen et al., 2021). However, based on transcriptomes from D. simulans and D. mauritiana testes, we suspect that more than half of the duplicated genes are likely pseudogenes that either show no expression in testes ( < 3 TPM) or lack open reading frames ( < 100 amino acids; Supplementary file 7). We also detect intrachromosomal duplications of these Y-linked pseudogenes (Supplementary file 7), suggesting a high duplication rate within these Y chromosomes.

Turnover of new duplications to Y chromosomes in D. melanogaster and three species in the D. simulans clade.

Using phylogenetic analyses, we inferred the evolutionary histories of new Y-linked duplications. The blue and green numbers represent the number of independent duplications and deletions observed in each branch, respectively. We also detected four duplications presented in the ancestor of these four species. The deletion events that happened in the ancestor of these four species cannot be inferred without a Y chromosome assembly in the outgroup.

Most new Y-linked duplications in D. melanogaster and the D. simulans clade are from genes with presumed functions in chromatin modification, cell division, and sexual reproduction (Supplementary file 8), consistent with other Drosophila species (Bachtrog et al., 2019; Mahajan and Bachtrog, 2017). While Y-linked duplicates of genes with these functions may be selectively beneficial, a duplication bias could also contribute to this enrichment, as genes expressed in the testes may be more likely to duplicate to the Y chromosome due to its open chromatin structure and transcriptional activity during spermatogenesis (Greil and Ahmad, 2012; Mahadevaraju et al., 2021; Hess and Meyer, 1968).

The evolution of new Y-linked gene families

Ampliconic gene families are found on Y chromosomes in multiple Drosophila species (Ellison and Bachtrog, 2019). We discovered two new gene families that have undergone extensive amplification on D. simulans clade Y chromosomes (Figure 5). Both families appear to encode functional protein-coding genes with complete open reading frames and high expression in mRNA-seq data (Supplementary file 9) and have 36–146 copies in each species’ Y chromosome. We also confirm that >90% of the variants in our assembled Y-linked gene families are represented in Illumina DNA-seq data (Appendix 1).

The history of Y-linked ampliconic genes.

(A) Schematic showing the inferred evolutionary history of SRPK-Y. SRPK duplicated to the ancestral Y chromosome in the D. simulans clade. The Y-linked copy (Lhk) retained an exon with testis-specific expression, which was lost in the parental copy on 2R. The Y-linked copy (Lhk) further duplicated and increased their expression in testes. (B) Schematic showing the inferred evolutionary history of sex-linked Ssl/CK2ßtes paralogs. In the D. melanogaster – D. simulans clade ancestor, the autosomal gene Ssl/CK2ßtes duplicated from chromosome 2R to the sex chromosome and independently amplified into the multi-copy gene families CK2ßtes-like on the X chromosome and CK2ßtes-Y on the Y chromosomes (shaded orange box). The gene structures are maintained in the D. simulans clade species, but not in D. melanogaster. In the D. melanogaster lineage (shaded yellow box), CK2ßtes-Ys became pseudogenes (PCKR) and CK2ßtes-like acquired a promoter from ßNASCtes to create a chimeric gene. Subsequent duplication of the chimeric gene to the X chromosome gave rise to the X-linked Ste loci in D. melanogaster. Duplication of the chimeric gene to the Y chromosome, with a subsequent TE insertion in the promoter and amplification event, gave rise to the Y-linked Su(Ste) loci in D. melanogaster.

The first amplified Y-linked gene family, SR Protein Kinase (SRPK), is derived from an autosome-to-Y duplication of the sequence encoding the testis-specific isoform of the gene SR Protein Kinase (SRPK). After the duplication of SRPK to the Y chromosome, the ancestral autosomal copy subsequently lost its testis-specific exon via a deletion (Figure 5A). The movement of the male-specific isoform inspired us to name the Y-linked SRPK gene family Lo-han-kha (Lhk), which is the Taiwanese term for the male vagabonds that moved from mainland China to Taiwan during the Qing dynasty. In D. melanogaster, SRPK is essential for both male and female reproduction (Loh et al., 2012). We therefore hypothesize that the relocation of the testis-specific isoform to the D. simulans clade Y chromosomes may have resolved intralocus sexual antagonism over these two functions. Our phylogenetic analysis identified two subfamilies of Lhk that we designate Lhk-1 and Lhk-2 (Figure 6A). Both subfamilies are shared by all D. simulans clade species and show a 5.5% protein divergence between species. The two subfamilies are found in different locations in our Y chromosome assemblies; consistent with this observation, we detect two to three Lhk foci on Y chromosomes in the D. simulans clade using FISH (Figure 6A and C and Figure 1—figure supplements 34).

Figure 6 with 3 supplements see all
The rapid evolution and gene conversion of Y-linked ampliconic genes.

(A) The inferred maximum likelihood phylogeny for Lhk. Node labels indicate SH-aLRT and ultrafast bootstrap (e.g. 100/100) or rates of protein evolution from PAML with CodonFreq = 0,1, or 2 (e.g. 1.01/1.02/1.03) (Figure 6—figure supplement 1 and Figure 6—figure supplement 3). Lhk shows evidence for positive selection (branch tests and branch-site tests with ω>1) after the duplication from 2R (SRPK) to the Y chromosome in the D. simulans clade. One Lhk subfamily (Lhk-1) is under recent purifying selection and is located close to the centromere, but the other (Lhk-2) is rapidly evolving across the species of the D. simulans clade. (B) Same as A but for CK2ßtes-Y. Both Y-linked CK2ßtes-Y and X-linked CK2ßtes-like also show positive selection. All ω values shown are statistically significant (LRT tests, P0.05; Supplementary file 12 and Supplementary file 14). (C) Cytological location of Y-linked gene families detected using Immunolabeling with fluorescence in situ hybridization (immunoFISH) for the centromere (CENP-C antibody, red signal). On the Y chromosomes, Lhk FISH signals suggest that this gene family occurs in 2–3 cytological locations (green signal), with one near the centromere. CK2ßtes-Y FISH signals are only located near centromeres. Based on our analysis of sequence information, we suggest that most Lhk-1 copies are located near CK2ßtes-Y and the centromere.

The second amplified gene family comprises both X-linked and Y-linked duplicates of the Ssl gene located on chromosome 2 R; it is unclear whether the X- or Y-linked copies originated first (Figure 5B). The X-linked copies are known as CK2ßtes-like in D. simulans (Kogan et al., 2012). The Y-linked copies are also found in D. melanogaster but are degenerated and have little or no expression (Chang and Larracuente, 2019; Danilevskaya et al., 1991), leading to their designation as pseudogenes. In the D. simulans clade species, however, the Y-linked paralogs have high levels of expression ( > 50 TPM in testes, Supplementary file 9) and complete open reading frames, so we refer to this gene family as CK2ßtes-Y. Both CK2ßtes-like (4–9 copies) and CK2ßtes-Y (36–123 copies based on the assemblies) are amplified on the X and Y chromosome in the D. simulans clade relative to D. melanogaster (Supplementary file 9; Kogan et al., 2012). The Y-linked copies in D. melanogaster, Su(Ste), are known to be a source of piRNAs (Aravin et al., 2004). We did not detect any testis piRNAs from either gene family in two small RNA-seq datasets (SRR7410589 and SRR7410590); however, we do find some short ( < 23 nt) reads (0.003–0.005% of total mapped reads) mapped to these gene families (Supplementary file 10).

We inferred gene conversion rates and the strength of selection on these Y-linked gene families using phylogenetic analyses on coding sequences. We estimated the gene conversion rate in D. simulans clade Y-linked gene families based on four-gamete tests and gene similarity (Rozen et al., 2003; Chang and Larracuente, 2019; Ohta, 1984; Backström et al., 2005). In general, D. simulans clade species show similar gene conversion rates (on the order of 10–4 to 10–6) in both of these families compared to our previous estimates in D. melanogaster (Chang and Larracuente, 2019; Supplementary file 11). These higher gene conversion rates compared to the other chromosomes might be a shared feature of Y chromosomes across taxa (Rozen et al., 2003).

To estimate rates of molecular evolution, we conducted branch-model and branch-site-model tests on the reconstructed ancestral sequences of Lhk-1, Lhk-2, CK2ßtes-Y, and two CK2ßtes-like using PAML (Figure 6A and B; Table 2; Yang, 1997). We used reconstructed ancestral sequences for our analyses to avoid sequencing errors in the assemblies, which appear as singletons. We infer that after the divergence of D. simulans clade species, Lhk-1 evolved under purifying selection, whereas Lhk-2 evolved under positive selection (Figure 6A; Table 2; Figure 6—figure supplement 1; Supplementary file 12). Using transcriptome data, we observe that highly expressed Lhk-1 copies have fewer nonsynonymous mutations than lowly expressed copies in D. simulans, consistent with purifying selection (Chi-square test’s p = 0.01; Figure 6—figure supplement 2 and Supplementary file 13). Both Lhk gene families are expressed two- to seven-fold higher than the ancestral copy on 2R in the same species, and 1.9–64-fold higher than their ortholog, SRPK-RC, in D. melanogaster, suggesting that gene amplification may confer increased expression. In both D. simulans and D. mauritiana, Lhk-1 is shorter due to deletions following its origin and has a higher expression level than Lhk-2. Both Lhk gene families have higher copy numbers in D. simulans than D. mauritiana, which likely contributes to their higher expression level in D. simulans (Supplementary file 9). For both Lhk-1 and Lhk-2, copies from the same species are more similar than copies from other species—a signal of concerted evolution (Dover, 1982).

Table 2
PAML analyses reveal positive selection on Y-linked ampliconic gene families.
LhkBranch test with CodonFreq = 0Branch-site test site class
ω1ω2ω3L2∆lnLLRT’s Pω0ω1ω2aω2b2∆lnLLRT’s PPositively selected sites (BEB > 0.95)*
one ω0.17–3250.74
two ω0.111.05–3218.2664.947.71E-160.0114.874.8713.043.05E-04I4, H11, V32, V75, N99, Y100, D193, D199
three ω0.111.490.43–3216.303.920.05
CK2ßtes
one ω0.35–3295.01
two ω§0.251.05–3272.0046.011.18E-110.0512.212.216.541.06E-02D33, T38, K44, K100, F101, K104, M152, M155
three ω0.200.421.05–3266.3311.357.56E-04
  1. *

    See Supplementary files 12 and 14 for all sites.

  2. Autosomal and Y lineage have protein evolution of ω1 and ω2, respectively.

  3. §

    Autosomal and sex chromosomal (X and Y) have protein evolution of ω1 and ω2, respectively.

The ancestral Ssl gene experienced a slightly increased rate of protein evolution after it duplicated to the X and Y chromosomes (ω = 0.41 vs 0.23; p = 0.03; Figure 6B; Table 2; Figure 6—figure supplement 3; Supplementary file 14). We find that both CK2ßtes-like and CK2ßtes-Y share strong signals of positive selection, based on branch-model and branch-site-model tests (p = 8.8E-9; Figure 6B; Table 2; Figure 6—figure supplement 3; Supplementary file 14). In D. melanogaster, the overexpression of the CK2ßtes-like X-linked homolog, Stellate, can drive in the male germline by killing Y-bearing sperm and generating female-biased offspring (Malone et al., 2015; Palumbo et al., 1994; Meyer et al., 2004). We suspect that CK2ßtes-like and CK2ßtes-Y might have similar functions and may also have a history of conflict. Therefore, the co-amplification of sex-linked genes and positive selection on their coding sequences may be a consequence of an arms race between sex chromosome drivers.

Y chromosome evolution driven by specific mutation patterns

The specific DNA-repair mechanisms used on Y chromosomes might contribute to their high rates of intrachromosomal duplication and structural rearrangements. Because Y chromosomes lack a homolog, they must repair double-strand breaks (DSBs) by non-homologous end joining (NHEJ) or microhomology-mediated end joining (MMEJ), which relies on short homology (usually > 2 bp) to repair DSBs (Chan et al., 2010). Compared to NHEJ, MMEJ is more error-prone and can result in translocations and duplications (McVey and Lee, 2008). Preferential use of MMEJ instead of NHEJ could contribute to the high duplication rate and extensive genome rearrangements that we observe on Y chromosomes. To infer the mechanisms of DSB repair on Y chromosomes, we counted indels between Y-linked duplicates and their parent genes for a set of 21 putative pseudogenes. Both NHEJ and MMEJ can generate indels, but NHEJ usually produces smaller indels (1–3 bp) compared to MMEJ ( > 3 bp) (McVey and Lee, 2008; Chang et al., 2017). We also cataloged short stretches of homology between each duplicate and its parent. To compare Y-linked patterns of DSB repair to other regions of the genome, we measured the size of polymorphic indels in intergenic regions and pseudogenes on the autosomes and X chromosomes from population data in D. melanogaster (DGRP; Huang et al., 2014) and D. simulans (Signor et al., 2018). To the extent that these indels do not experience selection, their sizes should reflect the mutation patterns on each chromosome. We observe proportionally more large deletions on Y chromosomes (25.1% of Y-linked indels are ≥10 bp deletions; Supplementary file 15) compared to other chromosomes in both D. melanogaster (12.8% and 15.2% of indels are ≥10 bp deletions in intergenic regions and pseudogenes) and D. simulans (7.3% of indels are ≥10 bp deletions in intergenic regions; all pairwise chi-square’s p< 1e-6; Figure 4A; Supplementary file 15). The pattern of excess large deletions is shared in the three D. simulans clade species Y chromosomes but is not obvious in D. melanogaster (Figure 7B). However, because most (36/41) D. melanogaster Y-linked indels in our analyses are from copies of a single pseudogene (CR43975), it is difficult to compare to the larger samples in the simulans clade species (duplicates from 17 genes). The differences in deletion sizes between the Y and other chromosomes are unlikely to be driven by heterochromatin or the lack of recombination. The non-recombining and heterochromatic dot chromosome has a deletion size profile more similar to the other autosomes in D. simulans (10.9% of indels are ≥10 bp deletions), consistent with a previous study using TE sequences across different chromatin domains (Blumenstiel et al., 2002). We also found fewer large deletions (2/149 indels are ≥10 bp in 400 kb alignments; Figure 7A) in heterochromatic pseudogenes using 19 long-read (Pacbio or nanopore) assemblies. The enrichment of 1 bp indels (101/149; Figure 7A) in heterochromatic pseudogenes might represent sequencing errors in long-read assemblies (Weirather et al., 2017). These results suggest that Y chromosomes may use MMEJ over NHEJ compared to other chromosomes, particularly in the simulans clade species. We also find that across the genome, larger deletions ( > 7 bp) share a similar length of microhomologies for repairing DSBs (39.5–57% deletions have ≥2 bp microhomology; Chi-square test for microhomology length between Y and other chromosomes, p > 0.24; Supplementary files 15–16), consistent with most being a consequence of MMEJ-mediated repair.

Figure 7 with 3 supplements see all
An excess of large deletions on Y chromosomes compared to population data suggests a preference for MMEJ.

(A) We compared the size of 223 indels on 21 recently duplicated Y-linked genes in D. melanogaster and the D. simulans clade species to the indels polymorphic in the D. melanogaster and D. simulans populations. For the indels in D. melanogaster and D. simulans populations, we separated them based on their location, including autosomes (excluding dot chromosomes), X chromosomes, and dot chromosomes. We excluded the D. melanogaster dot-linked indels due to the small sample size (12). We also surveyed indel polymorphism in pseudogenes in D. melanogaster using population data. (B) We classify Y-linked indels by whether they are shared between species or specific in one species (C) The excess of large deletions (underlined) on the Y chromosomes is consistent with MMEJ between short regions of microhomology (red).

The satellite sequence composition of Y chromosomes differs between species (Jagannathan et al., 2017; Wei et al., 2018; Cechova et al., 2019). A high duplication rate may accelerate the birth and turnover of Y-linked satellite sequences. We discovered five new Y-linked satellites in our assemblies and validated their location using FISH (Figure 1—figure supplements 34 and Supplementary file 6). These satellites only span a few kilobases of sequences (5,515–26,119 bp) and are homogenized. According to its flanking sequence, one new satellite, (AAACAT)n, originated from a DM412B transposable element, which has three tandem copies of AAACAT in its long terminal repeats. The AAACAT repeats expanded to 764 copies on the Y chromosome specifically in D. mauritiana. This is consistent with other reports of novel satellites arising from TEs (Dias et al., 2014). The other four novel satellites are flanked by transposons ( < 50 bp) and may derive from non-repetitive sequences. The MMEJ pathway may contribute to the birth of new repeats, as this mechanism is known to generate tandem duplications via template-switching during repair (McVey and Lee, 2008). Short-tandem repeats can be further amplified via saltatory replication or unequal crossing-over between sister chromatids.

Consistent with findings in other species (Peichel et al., 2019; Chang and Larracuente, 2019), we find an enrichment of LTR retrotransposons on the D. simulans clade Y chromosomes relative to the rest of the genome (Supplementary file 17). Interestingly, we find that the Y-linked LTR retrotransposons also turn over between species (Figure 7—figure supplement 1 and Supplementary file 18). We find a positive correlation between the difference in Y-linked TE abundance between D. melanogaster and each of the D. simulans clade species versus the rest of the genome (rho = 0.45–0.50; Figure 7—figure supplement 2 and Supplementary file 18). This suggests that global changes in transposon activity could explain the differences in Y-linked TEs abundance between species. However, the correlations between species within the D. simulans clade are weaker (rho < 0.23; Figure 7—figure supplement 2 and Supplementary file 18), consistent with the possibility that some TEs may shift their insertion preference between chromosomes. To test this hypothesis, we estimated the ages of LTR retrotransposons by their length. We find that the recent insertions of LTR transposons are differently distributed across chromosomes between species (Figure 7—figure supplement 3), suggesting that insertion preferences towards genomic regions may differ for some TEs. For example, we detect many recent DIVER element insertions on the Y chromosome in D. simulans, but not in D. sechellia (Figure 7—figure supplement 3).

Discussion

Despite their independent origins, the degenerated Y chromosomes of mammals, fish, and insects have convergently evolved structural features of gene acquisition and amplification, accumulation of repetitive sequences, and gene conversion. Here, we consider the mutational processes that contribute to this structure and its consequences for Y chromosome biology. Our assemblies revealed extensive Y chromosome rearrangements between three very closely related Drosophila species (Figure 1). These rearrangements may be the consequence of rejoining telomeres after DSBs, as telomere-specific sequences are embedded in non-telomeric regions of Drosophila Y chromosomes (Berloco et al., 2005; Abad et al., 2004; Agudo et al., 1999). We propose that four pieces of evidence suggest DSBs on Y chromosomes may be preferentially repaired using the MMEJ pathway. First, Y-linked sequences are generally absent from the X chromosome, precluding repair of DSBs by homologous recombination in meiosis. Second, NHEJ on Y chromosomes may be limited because the Ku complex, which is required for NHEJ (Chang et al., 2017), is excluded from HP1a-rich regions of chromosomes (Chiolo et al., 2011). The Ku complex also binds telomeres and might prevent telomere fusions (Melnikova et al., 2005; Samper et al., 2000), suggesting that a low concentration of Ku on Y chromosomes could also cause high rates of telomere rejoining. Third, the highly repetitive nature of Y chromosomes may increase the rate of DSB formation, which may also contribute to a higher rate of MMEJ (McVey and Lee, 2008; Katsura et al., 2007). Fourth, we show that Y chromosomes have high duplication and gene conversion rates, and larger deletion sizes than other genomic regions (Figure 7), consistent with a preference for MMEJ to repair Y-linked DSBs (McVey and Lee, 2008).

The exclusion of the Ku complex from heterochromatin could also contribute to an excess of Y-linked duplications we observe in the D. simulans clade relative to D. melanogaster (Figures 2A and 7). D. simulans clade Y chromosomes might harbor relatively more heterochromatin than the D. melanogaster Y due to the partial loss of their euchromatic rDNA repeats (Roy et al., 2005; Lohe and Roberts, 2000; Lohe and Roberts, 1990), and D. simulans also expresses more heterochromatin-modifying factors, such as Su(var)s and E(var)s (Lee and Karpen, 2017), compared to D. melanogaster. To explore these hypotheses, the distribution of the Ku complex across chromosomes in the testes of these species should be studied.

If MMEJ is preferentially used to fix DSBs on the Y chromosome, we might expect that the mutations in the MMEJ pathway would disproportionately impact Y-bearing sperm. Consistent with this prediction, a previous study showed that male D. melanogaster with a deficient MMEJ pathway (DNApol theta mutants) sire female-biased offspring (McKee et al., 2000). Moreover, sperm without sex chromosomes that result from X-Y non-disjunction events are not as strongly affected by an MMEJ deficiency as Y-bearing sperm (McKee et al., 2000), suggesting that sperm with Y chromosomes are more sensitive to defects in MMEJ.

Drosophila Y chromosomes can act as heterochromatin sinks, sequestering heterochromatin marks from pericentromeric regions and suppressing position-effect variegation (Brown and Bachtrog, 2017; Dimitri and Pisano, 1989; Henikoff, 1996; Gatti and Pimpinelli, 1992). Therefore, retrotransposons located in heterochromatin might have higher activities in males due to the presence of Y-linked heterochromatin (Brown and Bachtrog, 2017; Henikoff, 1996), although the genomic distribution of heterochromatin during spermatogenesis is unknown. We find that, like D. melanogaster (Chang and Larracuente, 2019), D. simulans clade Y chromosomes are enriched in retrotransposons relative to the rest of the genome; however, Y chromosomes from even the closely related D. simulans clade species harbor distinct retrotransposons (Figure 7—figure supplement 1 and Supplementary file 18), indicating that some TEs may have rapidly shifted their insertion preference. This preference might benefit the TEs because Y-linked TEs might be expressed during spermatogenesis (Lawlor et al., 2021). On the other hand, Y chromosomes can be a significant source of small RNAs that silence repetitive elements during spermatogenesis—for example, Su(Ste) piRNAs in D. melanogaster (Quénerch’du et al., 2016; Aravin et al., 2001) —and thus may also contribute to TE suppression. If Y chromosomes contribute to piRNA or siRNA production (e.g. have piRNA clusters Chen et al., 2021; Aravin et al., 2001), then the TE insertion preference for the Y chromosome may sometimes be beneficial for the host, as they could provide immunity against active TEs in males. In this sense, Y chromosomes may even act as “TE traps” that incidentally suppress TE activity in the male germline by producing small RNAs.

Genes may adapt to the Y chromosome after residing there for millions of years (Wakimoto and Hearn, 1990; Hearn et al., 1991). While most genes that move to the Y chromosome quickly degenerate (Tobler et al., 2017; Carvalho et al., 2015), a subset of new Y-linked genes are retained, presumably due to important roles in male fertility or sex chromosome meiotic drive. New Y-linked genes may adapt to this unique genomic environment, evolving structures and regulatory mechanisms that enable optimal expression on the heterochromatic and non-recombining Y chromosome (Dupim et al., 2018). We identified many Y-linked duplicates in the ~15 Mb of Y chromosome that we surveyed in each species. Future improvements in genomic sequence data and assemblies may recover additional Y-linked duplicates among the unassembled satellite-rich sequences. Here, we describe two new Y-linked ampliconic genes specific to the D. simulans clade—Lhk and CK2ßtes-Y–that show evidence of strong positive evolution and concerted evolution, suggesting that high copy numbers and Y-Y gene conversion are often important for the adaptation of new Y-linked genes.

Many ampliconic genes are taxonomically restricted and are not maintained at high copy numbers over long periods of evolutionary time (Soh et al., 2014; Bachtrog et al., 2019; Brashear et al., 2018; Ellison and Bachtrog, 2019; Hughes et al., 2010; Mueller et al., 2008). Some ampliconic gene families are found on both the X and Y chromosomes (Ellison and Bachtrog, 2019; Malone et al., 2015; Cocquet et al., 2012; Kruger et al., 2019; Lahn and Page, 2000). While we do not know the function of most such co-amplified gene families, the murine example of Slx/Slxl1 and Sly appears to be engaged in an ongoing arms race between the sex chromosomes (Cocquet et al., 2012). We propose that Y-linked gene amplification in the D. simulans clade initially occurred due to an arms race and was preserved by gene conversion.

It is intriguing that the CK2ßtes-like/CK2ßtes-Y gene family is homologous to the Ste/Su(Ste) system in D. melanogaster (Kogan et al., 2012), which is also hypothesized to play a role in sex-chromosome meiotic drive (Hurst, 1992). We speculate that in both the D. melanogaster and D. simulans clade lineages these gene amplifications have been driven by conflict between the sex chromosomes over transmission through meiosis, but that the conflict involves different molecular mechanisms. In the CK2ßtes-like/CK2ßtes-Y system, both X and Y-linked genes are protein-coding genes, which is reminiscent of Slx/Slxl1 and Sly which compete for access to the nucleus where they regulate sex-linked gene expression (Cocquet et al., 2012; Kruger et al., 2019). In contrast, the Y-linked Su(Ste) copies in D. melanogaster produce small RNAs that suppress the X-linked Stellate (Aravin et al., 2004). We propose that CK2ßtes-like/CK2ßtes-Y system in the D. simulans clade species may represent the ancestral state because the parental gene Ssl is a protein-coding gene. We speculate that systems arising from antagonisms between the sex chromosomes may shift from protein-coding to RNA-based over time because, with RNAi, suppression is maintained at a minimal translation cost.

Distinct Y-linked mutation patterns are described in many species (Soh et al., 2014; Rozen et al., 2003; Hughes and Page, 2015; Bachtrog et al., 2019; Tobler et al., 2017; Peichel et al., 2019; Brashear et al., 2018; Hall et al., 2016). Our analyses provide a link between Y-linked mutation patterns and Y chromosome evolution. While the lack of recombination and male-limited transmission of the Y chromosome reduces the efficacy of selection, the high gene duplication and gene conversion rates may counter these effects and help acquire and maintain new Y-linked genes. The unique Y-linked mutation patterns might be the direct consequence of the heterochromatic environment on sex chromosomes. Therefore, we predict that W chromosomes and non-recombining sex-limited chromosomes (e.g. some B chromosomes), may share similar mutation patterns with Y chromosomes. Indeed, W chromosomes of birds have ampliconic genes and are rich in tandem repeats (Backström et al., 2005; Komissarov et al., 2018). However, there seem to be fewer ampliconic gene families on bird W chromosomes compared to Y chromosomes in other animals, suggesting that sexual selection and intragenomic conflict in spermatogenesis are important contributors to Y-linked gene family evolution (Bachtrog, 2020; Rogers, 2021).

Materials and methods

Assembling Y chromosomes using Pacbio reads in D. simulans clade

Request a detailed protocol

We applied the heterochromatin-sensitive assembly pipeline from Chang and Larracuente, 2019. We first extracted 229,464 reads with 2.2-Gbp in D. mauritiana, 269,483 reads with 2.3-Gbp in D. simulans, and 257,722 reads with 2.6-Gbp in D. sechellia using assemblies from Chakraborty et al., 2021, respectively. We then assembled these reads using Canu v1.3 and FALCON v0.5.0 combined the parameter tuning method on two error rates, eM and eg, in bogart to optimize the assemblies. We first made the Canu assemblies using the parameters ‘genomeSize = 30 m stopOnReadQuality = false corMinCoverage = 0 corOutCoverage = 100 ovlMerSize = 31’ and ‘genomeSize = 30 m stopOnReadQuality = false’. For FALCON v0.5.0, we used the parameters ‘length_cutoff = –1; seed_coverage = 30 or 40; genome_size = 30000000; length_cutoff_pr = 1000’. We then picked the assemblies with highest contiguity and completeness without detectable misassemblies from each setting (two Canu settings and one Falcon setting).

After picking the three best assemblies for each species, we tentatively reconciled the assemblies using Quickmerge (Chakraborty et al., 2016). We examined and manually curated the merged assemblies. For the D. mauritiana assembly, we merged two Canu and one FALCON assemblies, and for our D. simulans and D. sechellia assemblies, we merged one Canu and one FALCON assemblies independently. We manually curated some conserved Y-linked genes using raw reads and cDNA sequences from NCBI, including kl-3 of D. mauritiana, kl-3, kl-5, and PRY of D. simulans and CCY, PRY, and Ppr-Y of D. sechellia, due to their low coverage and importance for our phylogenetic analyses. We then merged our heterochromatin restricted assemblies with contigs of the major chromosome arms from Chakraborty et al., 2021. We polished the resulting assemblies once with Quiver using PacBio reads (SMRT Analysis v2.3.0; Chin et al., 2013) and ten times with Pilon v1.22 (Walker et al., 2014) using raw Illumina reads with parameters ‘--mindepth 3 --minmq 10 --fix bases’.

We identified misassemblies and found parts of Y-linked sequences in the contigs from major arms using our female/male coverage assays in D. sechellia. We also assembled the total reads (assuming genome size of 180 Mb) and heterochromatin-extracted reads (assuming genome size 40 Mb) using wtdbg v2.4 with parameters ‘-x rs -t24 -X 100 -e 2’ (Ruan and Li, 2020) and Flye v2.4.2 (Kolmogorov et al., 2019) with default parameters separately. We polished the resulting wtdbg assemblies with raw Pacbio reads using Flye v2.4.2. We then manually assembled five introns and fixed two misassemblies using sequences from wtdbg whole-genome assemblies (two introns), Flye whole-genome (two introns), and heterochromatin-enriched assemblies (one intron) in D. sechellia. We assembled one intron using sequences from wtdbg whole-genome assemblies in D. simulans.

We also extracted potential microbial reads (except for Wolbachia) that mapped to the D. sechellia microbial contigs, and assembled these reads into a 4.5 Mb contig, which represents the whole genome of a Providencia species, using Canu v 1.6 (r8426 14,520f819a1e5dd221cc16553cf5b5269227b0a3) with parameters ‘genomeSize = 5 m useGrid = false stopOnReadQuality = false corMinCoverage = 0 corOutCoverage = 100’. To detect other symbiont-derived sequences in our assemblies, we used Blast v2.7.1+ (Altschul et al., 1990) with blobtools (v1.0; Laetsch and Blaxter, 2017) to search the nt database (parameters ‘-task megablast -max_target_seqs 1 -max_hsps 1 -evalue 1e-25’). We estimated the Illumina coverage of each contig in males for D. mauritiana, D. simulans, and D. sechellia, respectively. We designated and removed contigs homologous to bacteria and fungi in subsequent analyses (Supplementary file 19).

Generating DNA-seq from males in the D. simulans clade

Request a detailed protocol

We extracted DNA from 30 virgin 0-day males using DNeasy Blood & Tissue Kit and diluted it in 100 µL ddH2O. The DNA was then treated with 1 µL 10 mg/mL RNaseA (Invitrogen) at 37 °C for 1 hr and was re-diluted in 100 µL ddH2O after ethanol precipitation. The size and concentration of DNA were analyzed by gel electrophoresis, Nanodrop, Qubit and Genomic DNA ScreenTape. Finally, we constructed libraries using PCR-free standard Illumina kit and sequenced 125 bp paired-end reads with a 550 bp insert size from the libraries using Hiseq 2500 in UR Genomics Research Center. We deposited the reads in NCBI’s SRA under BioProject accession number PRJNA748438.

Identifying Y-linked contigs

Request a detailed protocol

To assign contigs to the Y chromosome, we used Illumina reads from male and female PCR-free genomic libraries (except females of D. mauritiana) as described in Chang and Larracuente, 2019. In short, we mapped the male and female reads separately using BWA (v0.7.15; Li and Durbin, 2010) and called the coverage of uniquely mapped reads per site with samtools (v1.7; -Q 10 Li et al., 2009). We further assigned contigs with the median of male-to-female coverage across contigs equal to 0 as Y-linked. We examined the sensitivity and specificity of our methods using all 10 kb regions with known location. Based on our results for 10 kb regions with known location (Supplementary file 2) in D. mauritiana, we set up an additional criterion for this species—‘the average of female-to-male coverage < 0.1’—to reduce the false discovery rate.

Gene and repeat annotations

Request a detailed protocol

We used the same pipeline and data to annotate genomes as a previous study (Chakraborty et al., 2021). We collected transcripts and translated sequences from D. melanogaster (r6.14) and transcript sequences from D. simulans Nouhaud, 2018 using IsoSeq3 (Gordon et al., 2015). We mapped these sequences to each assembly to generate annotations using maker2 (v2.31.9; Holt and Yandell, 2011). We further mapped the transcriptomes using Star 2.7.3 a 2-pass mapping with the maker2 annotation and parameters ‘-outFilterMultimapNmax 200 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 5000000 --alignMatesGapMax 5000000 --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --peOverlapNbasesMin 12 --peOverlapMMp 0.1’. We then generated the consensus annotations using Stringtie 2.0.3 from all transcriptomes (Pertea et al., 2015). We further improved the mitochondria annotation using MITOS2. We assigned predicted transcripts to their homologs in D. melanogaster using BLAST v2.7.1+ (-evalue 1e-10; Altschul et al., 1990).

We used RepeatMasker v4.0.5 (Smit et al., 2013) with our custom library to annotate the assemblies using parameter ‘-s.’ Our custom library is modified from Chakraborty et al., 2021, by adding the consensus sequence of Jockey-3 from D. melanogaster to replace its homologs (G2 in D. melanogaster and Jockey-3 in D. simulans; Chang et al., 2019). We extracted the sequences and copies of TEs and other repeats using scripts modified from Bailly-Bechet et al., 2014. To annotate tandem repeats in assemblies, we used TRFinder (v4.09; Benson, 1999) with parameters ‘2 7 7 80 10 100 2000 -ngs -h’. We also used kseek (Wei et al., 2018) to search for tandem repeats in the male Illumina reads.

Transcriptome analyses

Request a detailed protocol

We mapped the testes transcriptome to the reference genomes of D. melanogaster, D. simulans, and D. mauritiana (Supplementary file 20; no available transcriptome from D. sechellia). We used Stringtie 2.0.3 (Pertea et al., 2015) to estimate the expression level using the annotation. However, we applied a different strategy for estimating expression levels of the Y-linked gene families due to the difficulties in precisely annotating multi-copies genes. We constructed a transcript reference using current gene annotation but replaced all transcripts from Lhk-1, Lhk-2, and CK2ßtes-Y with their species-specific reconstructed ancestral copies. We then mapped the transcriptome reads to this reference using Bowtie2 v 2.3.5.1 (Langmead and Salzberg, 2012) with parameters ‘-very-sensitive -p 24 k 200 X 1000 --no-discordant --no-mixed’. We then estimated the expression level by salmon v 1.0.0 (Patro et al., 2017) with parameters ‘-l A -p 24.’ We also mapped small RNA reads from D. simulans testes to our custom repeat library and reconstructed ancestral Lhk-1, Lhk-2, and CK2ßtes-Y sequences using Bowtie v 1.2.3 (Langmead, 2010) with parameters ‘-v3 -q -a -m 50 --best –strata.’

To assay the specific expression of different copies, we also mapped transcriptomic and male genomic reads to the same reference using BWA (v0.7.15; Li and Durbin, 2010). We used ABRA v2.22 (Mose et al., 2019) to improve the alignments around the indels of these two gene families. We used samtools (v1.7; Li et al., 2009) to pile up reads that mapped to reconstructed ancestral copies and estimated the frequency of derived SNPs in the reads.

Estimating Y-linked exon copy numbers using Illumina reads

Request a detailed protocol

We mapped the Illumina reads from the male individuals of D. melanogaster and the D. simulans clade species to a genome reference with transcripts of 11 conserved Y-linked genes and the sequences of all non-Y chromosomes (r6.14) in D. melanogaster. We called the depth using samtools depth (v1.7; Li et al., 2009), and estimated the copy number of each exon using the mapped depth. We assumed most Y-linked exons are single-copy, so we divided the depth of each site by the majority of depth across all Y-linked transcripts to estimate the copy number. For the comparison, we simulated the 50 X Illumina reads from our assemblies using ART 2.5.8 with the parameter (art_illumina -ss HSXt -m 500 s 200 p -l 150 f 50; Huang et al., 2012). We then mapped the simulated reads to the same reference, called the depth, and divided the depth of each site by 50.

Immunostaining and FISH of mitotic chromosomes

Request a detailed protocol

We conducted FISH in brain cells following the protocol from Larracuente and Ferree, 2015 and immunostaining with FISH (immune-FISH) in brain cells following the protocol from Pimpinelli et al., 2011 and Chang et al., 2019. Briefly, we dissected brains from third instar larva in 1 X PBS and treated them for 1 min in hypotonic solution (0.5% sodium citrate). Then, we fixed brain cells in 1.8% paraformaldehyde, 45% acetic acid for 6 min. We subsequently dehydrated in ethanol for the FISH experiments but not for the immune-FISH.

For immunostaining, we rehydrated the slide using PBS with 0.1% TritonX-100 after removing the coverslip using liquid nitrogen. The slides were blocked with 3% BSA and 1% goat serum/ PBS with 0.1% TritonX-100 for 30 min and hybridized with 1:500 anti-Cenp-C antibody (gift from Dr. Barbara Mellone) overnight at 4 °C. We used 1:500 secondary antibodies (Life Technologies Alexa-488, 546, or 647 conjugated, 1:500) in blocking solution with 45 min room temperature incubation to detect the signals. We fixed the slides in 4% paraformaldehyde in 4XSSC for 6 min before doing FISH.

We added probes and denatured the fixed slides at 95 °C for 5 min and then hybridized slides at 30°C overnight. For PCR amplified probes with DIG or biotin labels, we blocked the slides for 1 hr using 3% BSA/PBS with 0.1% Tween and incubated slides with 1:200 secondary antibodies (Roche) in 3% BSA/4 X SSC with 0.1% Tween and BSA at room temperature for 1 hr. We made Lhk and CK2ßtes-Y probes using PCR Nick Translation kits (Roche) and ordered oligo probes from IDT. We list probe information in Supplementary file 3. We mounted slides in Diamond Antifade Mountant with DAPI (Invitrogen) and visualized them on a Leica DM5500 upright fluorescence microscope, imaged with a Hamamatsu Orca R2 CCD camera and analyzed using Leica’s LAX software. We interpreted the binding patterns of Y chromosomes using the density of DAPI staining solely.

Phylogenetic analyses of Y-linked genes

Request a detailed protocol

We used BLAST v2.7.1+ (Altschul et al., 1990) to extract the sequences of Y-linked duplications and conserved Y-linked genes from the genome. We only used high-quality sequences polished by Pilon (--mindepth 3 --minmq 10) for our phylogenetic analyses. We aligned and manually inspected sequences with reference transcripts from Flybase using Geneious v8.1.6 (Kearse et al., 2012). For most Y-linked duplications, except for the genes homologous to Lhk and CK2ßtes-Y, we constructed neighbor-joining trees using the HKY model with 1000 replicates using Geneious v8.1.6 (Kearse et al., 2012) to infer their phylogenies. We also measured the length and microhomology in 223 indels from 21 Y-linked duplications using these alignments (Supplementary file 15). We also infer the potential mechanisms causing the indels, including tandem duplications and polymerase slippage during DNA replication. We measured the length and microhomology of polymorphic indels in D. melanogaster (DGRP Huang et al., 2014) and D. simulans (Signor et al., 2018) populations from Chakraborty et al., 2021. For Lhk and CK2ßtes-Y, we constructed phylogeny using iqtree 1.6.12 (Nguyen et al., 2015; Hoang et al., 2018) using parameters “-m MFP -nt AUTO -alrt 1000 -bb 1000 -bnni”. The node labels in Figure 5 correspond to SH-aLRT support (%) / ultrafast bootstrap support (%). The nodes with SH-aLRT ≥ 80% and ultrafast bootstrap support ≥ 95% are strongly supported. Protein evolutionary rates (with CodonFreq = 0/1/2 in PAML) of the bold branches were estimated using PAML with branch models on the reconstructed ancestor sequences (Figure 6—figure supplement 1 and Figure 6—figure supplement 3).

Estimating recombination and selection on Y-linked ampliconic genes

Request a detailed protocol

Using the phylogenetic trees from iqtree, we infer the most probable sequences for the internal nodes using MEGA 10.1.5 (Kumar et al., 2018; Stecher et al., 2020) using the maximal likelihood method and G + I model with GTR model. We conducted branch and branch-site models tests in PAML 4.8 using the ancestral sequences of Y-linked and X-linked ampliconic gene families with their homologs on autosomes. We plotted the tree using R package ape 5.3 (Paradis et al., 2004).

We used compute 0.8.4 (Thornton, 2003) to calculate Rmin and population recombination rates based on linkage disequilibrium (Hudson, 1987; Hudson and Kaplan, 1985) and gene similarity. We included sites with indel polymorphisms in these analyses to increase the sample size (558–1544 bp alignments). We also reanalyzed data from Chang and Larracuente, 2019 to include variant information from these sites. The high similarity between Y-linked ampliconic gene copies may lead us to overestimate gene conversion based on gene similarity (Hudson, 1987). We therefore also reported the lower bound on the gene conversion rate using Rmin (Hudson and Kaplan, 1985).

GO term analysis

Request a detailed protocol

We used PANTHER (Released 20190711; Mi et al., 2019) with GO Ontology database (Released 2019-10-08) to perform Biological GO term analysis of new Y-linked duplicated genes using Fisher’s exact tests with FDR correction. We input 70 duplicated genes with any known GO terms and used all genes (13,767) in D. melanogaster as background.

Indel analyses

Request a detailed protocol

We downloaded the SNP calls (vcf files) from population genomic data in North Carolina of D. melanogaster (DGRP Huang et al., 2014) and California of D. simulans (Signor et al., 2018). We then used vcftools (Danecek et al., 2011) to remove the low-quality SNPs using parameters ‘--maf 0.1 --keep-only-indels --min-alleles 2 --max-alleles 2 --recode’. We additionally filtered out the potential mismapped regions with ‘--max-missing-count 20’ in D. melanogaster or ‘--max-missing-count 17’ in D. simulans. Lastly, we analyzed the SNPs in the specific regions using bedtools intersect (Quinlan and Hall, 2010) with gene annotation files (dmel-r5.57 or dsim annotation from maker2 v2.31.9; Holt and Yandell, 2011). For the heterochromatic pseudogenes, we download 18 long-read polished assemblies from NCBI (Supplementary file 20). We then used blastn to get sequences of pseudogenes from the population, aligned, and surveyed their indel lengths. All the alignments for our indel assignment are available in the GitHub repository (https://github.com/LarracuenteLab/simclade_Y; Chang, 2022; copy archived at swh:1:rev:b1939db576cb1616094a59775a38014a7d61eb7f) and the Dryad digital repository (https://doi.org/10.5061/dryad.280gb5mr6).

Appendix 1

Low Pacbio coverage in heterochromatic regions

We find that PacBio coverage is lower than expected on Y chromosomes and in heterochromatic regions generally (Figure 1—figure supplement 2). We found a similar bias in the D. melanogaster genome (Chang and Larracuente, 2019), where the PacBio data were independently generated by a different group (Kim et al., 2014). While a previous paper suggests that CsCl might contribute to this bias (Krsticevic et al., 2015), we used Qiagen’s Blood and Cell culture DNA Midi Kit for DNA extraction. Heterochromatin is underreplicated in the endoreplicated cells that undergo multiple rounds of S phase but with no cell division such as those in larval salivary glands cells (Smith and Orr-Weaver, 1991). Previous studies demonstrated that endoreplicated cells in the adult flies might contribute to lower coverage in Illumina sequencing data (Flynn et al., 2020). Therefore, these endoreplicated cells might also contribute to the bias in Pacbio coverage.

Validation of variants in Y-linked gene families

We mapped Illumina reads from male genomic DNA and testis RNAseq to the reconstructed ancestral transcript sequences of each gene cluster (Lhk-1, Lhk-2, CK2ßtes-Y) to estimate the expression level of the different Y-linked copies. We first asked if the variants in these two gene families found in our assemblies can be consistently detected in Illumina reads from male genomes. We found that the abundance of derived variants in these two gene families in the DNA-seq data are highly correlated to the frequency of variants in our assemblies (R = 0.89 and 0.98 in D. mauritiana and D. simulans, respectively). For 559 variants in the D. simulans assembly, 33 of them (28 appear once and four appear twice) are missing from the DNA-seq data. For 446 variants in the D. mauritiana assembly, 43 of them (32 appear once and six appear twice) are missing from the DNA-seq data. Additionally, nine and eight inconsistent variants are located near ( < 100 bp) the start or end of transcripts in D. simulans and D. mauritiana, respectively. These regions at the edges of transcripts might have fewer Illumina reads coverage than more central regions.

We compared the proportion of synonymous and nonsynonymous changes between copies with high and low expression using transcriptome data to infer selection pressures on different mutations (Figure 6—figure supplement 2; Supplementary file 21).

To reduce the effect of sequencing errors and simplify the phylogenetic analyses on protein evolution rates, we first reconstructed the ancestral sequences of each gene cluster (Lhk-1, Lhk-2, CK2ßtes-Y, and 2 CK2ßtes-like; see Figure 6). The reconstructed ancestral sequences should eliminate misassembled bases, which are typically singletons. We conducted branch-model and branch-site-model tests on the reconstructed ancestral sequence using PAML and inferred that both gene families experienced strong positive selection following their duplication to the Y chromosome (from branch model; Supplementary files 17 and 18, Figure 6). The high rate of protein evolution in the Y-linked ampliconic genes suggests that, in addition to subfunctionalization or degeneration, they may also acquire new functions and adapt to being Y-linked.

Data availability

Genomic DNA sequence reads are in NCBI's SRA under BioProject PRJNA748438. All scripts and pipelines are available in GitHub (https://github.com/LarracuenteLab/simclade_Y; copy archived at swh:1:rev:b1939db576cb1616094a59775a38014a7d61eb7f) and the Dryad digital repository (doi:https://doi.org/10.5061/dryad.280gb5mr6).

The following data sets were generated
    1. Chang C
    2. Gregory L
    3. Gordon K
    4. Meiklejohn C
    5. Larracuente A
    (2021) NCBI BioProject
    ID PRJNA748438. Genome sequencing of males in Drosophila simulans clade.
    1. Chang C
    2. Gregory L
    3. Gordon K
    4. Meiklejohn CD
    5. Larracuente A
    (2021) Dryad Digital Repository
    Unique structure and positive selection promote the rapid divergence of Drosophila Y chromosomes.
    https://doi.org/10.5061/dryad.280gb5mr6
The following previously published data sets were used
    1. Garrigan et al.
    (2012) NCBI BioProject
    ID PRJNA158675. Drosophila mauritiana Genome sequencing.
    1. Modencode S
    (2012) NCBI study
    ID SRP003905. D. melanogaster Dissected Tissue RNASeq.
    1. Gerstein et al.
    (2014) NCBI study
    ID SRP001696. modENCODE D. melanogaster Developmental Total RNA-Seq.
    1. Wei et al.
    (2018) NCBI BioProject
    ID PRJNA423291. D. melanogaster, D. simulans, D. sechellia, D. erecta, D. ananassae, D. pseudoobscura, D. persimilis, D mojavensis, and D. virilis Raw sequence reads.
    1. Laktionov et. al.
    (2018) NCBI BioProject
    ID PRJNA380909. Genome-wide profiling of gene expression and transcription factors binding reveals new insights into the mechanisms of gene regulation during Drosophila spermatogenesis [RNA-Seq].
    1. Lin et al.
    (2018) NCBI BioProject
    ID PRJNA477366. Drosophila simulans Raw sequence reads.
    1. Shah et al.
    (2020) NCBI BioProject
    ID PRJNA618654. Novel quality metrics identify high-quality assemblies of piRNA clusters.
    1. Kim BY
    (2021) NCBI BioProject
    ID PRJNA675888. Nanopore-based assembly of many drosophilid genomes.
    1. Chakraborty et al.
    (2021) NCBI BioProject
    ID PRJNA541548. Transcriptome sequencing of Drosophila simulans clade.

References

    1. Charlesworth B
    2. Charlesworth D
    (2000) The degeneration of Y chromosomes
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 355:1563–1572.
    https://doi.org/10.1098/rstb.2000.0717

Decision letter

  1. Daniel R Matute
    Reviewing Editor; University of North Carolina, Chapel Hill, United States
  2. Patricia J Wittkopp
    Senior Editor; University of Michigan, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

[Editors' note: this paper was reviewed by Review Commons.]

https://doi.org/10.7554/eLife.75795.sa1

Author response

Reviewer #1 (Evidence, reproducibility and clarity (Required)):

Summary:

I found this an exceptionally impressive manuscript. The evolution of Y chromosomes has until recently been nearly impossible, and this research group have pioneered approaches that can yield reliable results in Drosophila. The study used an innovative heterochromatin-sensitive assembly pipeline on three D. simulans clade species, D. simulans, D. mauritiana and D. sechellia, which diverged less than 250 KYA, allowing comparisons with the group's previous results for the D. melanogaster Y.

The study is both technically impressive and extremely interesting (an highly unusual combination). It includes a rich set of interesting results about these genome regions, and furthermore the results are discussed in a well-organised way, relating both to previous observations and to understanding of the genetics and evolution of Y chromosomes, illuminating all these aspects. It is a rare pleasure to read such a study. I believe that this study will inspire and be a model for future work on these chromosomes. It shows how these difficult genome regions can be studied.

Thank you for the positive evaluation of our paper. While we did not make any specific revisions in response to these comments, we did attempt to improve the writing.

Major comments:

The conclusions are convincing. The methods are explained unusually clearly, and the reasoning from the results is convincing. When appropriate, the caveats, the caveats are clearly explained. The material is clearly organised and the questions studied are well related to the results. I had a few minor comments concerning the English. Even the figure (often a major problem to understand) are very clear and helpful, with proper explanations. I have very rarely read such a good manuscript, and almost never (in a long career) found a manuscript that could be published without revision being necessary.

Thank you for pointing out that there were minor concerns with the English. We have carefully gone through the manuscript and fixed some minor issues with the writing.

The analysis found 58 exons missed in previous assemblies (as well as all previously known exons of the 11 canonical Y-linked genes, which are present in at least one copy across the group). FISH on mitotic chromosomes using probes for 12 Y-linked sequences was used to determine the centromere locations, and to determine gene orders and relate them to the cytological chromosome bands, demonstrating changes in satellite distribution, gene order, and centromere positions between their Y chromosomes within the D. simulans clade species. It also confirmed previous results for Y-linked ribosomal DNA,genes, which are responsible for XY pairing in D. melanogaster males. Although 28S rDNA has been lost in D. simulans and D. sechellia (but not in D. mauritiana), the intergenic spacer (IGS) repeats between these repeats are retained on both sex chromosomes in all three species. Only sequencing can reliably reveal this, as their abundance is below the detection level by FISH in D. sechellia. The 11 canonical Y-linked genes' copy numbers vary between the species, and some duplicates are expressed and have complete open reading frames, and may therefore be functional because they, but most include only a subset of exons, often with duplicated exons flanking the presumed functional gene copy. Mega-introns and Y-loops were found, as already seen in Drosophila species, but this new study detects turn overs in the ~2 million years separating D. melanogaster and the D. simulans clade. 49 independent duplications onto the Y chromosome were detected, including 8 not previously detected. At least half show no expression in testes, or lack open reading frames, so they are probably pseudogenes. Testis-expressed genes may be especially likely to duplicate into the Y chromosome due to its open chromatin structure and transcriptional activity during spermatogenesis, and indeed most of the new Y-linked genes in the species studied clade have likely functions in chromatin modification, cell division, and sexual reproduction.

The study discovered two new gene families that have undergone amplification on D. simulans clade Y chromosomes, reaching very high copy numbers (36-146). Both these families appear to encode functional protein-coding genes and show high expression. The paper described intriguing results that illuminate Y chromosome evolution.

First, SRPK, arose by an autosome-to-Y duplication of the sequence encoding the testis-specific isoform of the gene SR Protein Kinase (SRPK), after which the autosomal copy lost its testis-specific exon via a deletion. In D. melanogaster, SRPK is essential for both male and female reproduction, so the relocation of the testis-specific isoform to the Y chromosome in the D. simulans clade suggests that the change may have been advantageous by resolving sexual antagonism. The paper presents convincing evidence that the Y copy evolved under positive selection, and that gene amplification may confer advantageous increased expression in males. The second amplified gene family is also potentially related to an interesting function. Both Xlinked and Y-linked duplicates are found of a gene called Ssl located on chromosome 2R. In D. simulans, the X-linked copies were previously known, and called CK2ßtes-like. In D. melanogaster, degenerated Y-linked copies are also found, with little or no expression, contrasting with complete open reading frames and high expression in the D. simulans clade species in testes, consistent with the possibility of an arms race between sex chromosome meiotic drive factors.

Other interesting analyses document higher gene conversion rates compared to the other chromosomes, and evidence that these Y chromosomes may differ in the DNA-repair mechanisms (preferentially using MMEJ instead of NHEJ), perhaps contributing to their high rates of intrachromosomal duplication and structural rearrangements. The authors relate this to evidence for turnover of Y-linked satellite sequences, with the discovery of five new Y-linked satellites, whose locations were validated using FISH. The study also documented enrichment of LTR retrotransposons on the D. simulans clade Y chromosomes relative to the rest of the genome, together with turnovers between the species.

Reviewer #1 (Significance (Required)):

As described above, the advances are both, technical and conceptual for the field. The manuscript itself does an excellent job of placing the work in the context of the existing literature.

Anyone working on sex chromosomes and other non-recombining genome regions should be interested in the findings reported.

My field of expertise is the evolution of sex chromosomes, and the evolution of genome regions with suppressed recombination. I have experience of genomic analyses. I have less expertise in analyses of gene expression, but I understand enough about such approaches to evaluate the parts of this study that use them.

Reviewer #2 (Evidence, reproducibility and clarity (Required)):

The manuscript describes a thorough investigation of the Y-chromosomes of three very closely related Drosophila species (D. simulans, D. sechellia, and D. mauritiana) which in turn are closely related to D. melanogaster. The D. melanogaster Y was analysed in a previous paper by the same goup. The authors found an astonishing level of structural rearrangements (gene order, copy number, etc.), specially taking into account the short divergence time among the three species (~250 thousand years). They also suggest an explanation for this fast evolution: Y chromosome is haploid, and hence double-strand breaks cannot be repaired by homologous recombination. Instead, it must use the less precise mechanisms of NHEJ and MMEJ. They also provide circumstantial evidence that MMEJ (which is very prone to generate large rearrangements) is the preferred mechanism of repair. As far as I know this hypothesis is new, and fits nicely on the fast structural evolution described by the authors. Finally, the authors describe two intriguing Y-linked gene families in D. simulans (Lhk and CK2ßtes-Y), one of them similar to the Stellate / Suppressor of Stellate system of D. melanogaster, which seems to be evolving as part of a X-Y meiotic drive arms race. Overall, it is a very nice piece of work. I have four criticisms that, in my opinion, should be addressed before acceptance.

Thank you for your positive comments. We respond to your concerns point-by-point below.

The suggestion/conclusion that MMEJ is the preferential repair mechanism (over NHEJ) should be better supported and explained. At line 387, the authors stated "The pattern of excess large deletions is shared in the three D. simulans clade species Y chromosomes, but is not obvious in D. melanogaster (Figure 6B). However, because all D. melanogaster Y-linked indels in our analyses are from copies of a single pseudogene (CR43975), it is difficult to compare to the larger samples in the simulans clade species (duplicates from 16 genes). ". Given that D. melanogaster has many Y-linked pseudogenes (described by the authors and by other researchers, and listed in Table S6), there seems to be no reason to use a sample size of 1 in this species.

We only used pseudogenes with large alignable regions (>300 bp) to prevent the potential bias toward small indels and increase our confidence in indel calling. As a result, we excluded most of the duplicates on the D. melanogaster Y chromosome. We now include 5 additional D. melanogaster Y-linked indels in the manuscript, however, the majority of indels in this species (36/41) are still from the same gene.

Furthermore, given that D. melanogaster is THE model organism, it is the species that most likely will provide information to assess the "preferential MMEJ" hypothesis proposed by the authors.

A previous paper has shown that male flies deficient in MMEJ have a strong bias toward female offspring (McKee et al. 2000), suggesting that MMEJ is necessary for successfully producing Y-bearing sperm, consistent with our hypothesis. We agree with the reviewer that careful genetic and cytological experiments in D. melanogaster could further clarify the role of MMEJ in the repair of Y-linked mutations. Even more revealing would be experiments using the simulans clade species, where we hypothesize the MMEJ bias is even more pronounced on the Y chromosome. We believe, however, that these experiments are beyond the scope of this study and should merit their own papers.

Still on the suggestion/conclusion that MMEJ is the preferential repair mechanism (over NHEJ). Y chromosome in heterochromatic, haploid and non-recombining. In order to ascribe its mutational pattern to the haploid state (and the consequent impossibility of homologous recombination repair), the authors compared it to chromosome IV (the so called "dot chromosome"). This may not be the best choice: while chr IV lacks recombination in wild type flies, it is not typical heterochromatin. E.g., " results from genetic analyses, genomic studies, and biochemical investigations have revealed the dot chromosome to be unique, having a mixture of characteristics of euchromatin and of constitutive heterochromatin". Riddle and Elgin, FlyBook 2018 (https://doi.org/10.1534/genetics.118.301146). Given this, it seems appropriate to also compare the Y-linked pseudogenes with those from typical heterochromatin. In Drosophila, these are the regions around the centromeres ("centric heterochromatin"). There are pseudogenes there; e.g., the gene rolled is known to have partially duplicated exons.

Thank you for the suggestion. We now include the data from pericentric heterochromatin and pseudogenes in supplemental data (see Figure 7). Both data types support our conclusion that indel size is only larger on Y chromosomes, which is consistent with the comparison between the dot chromosome and pericentric heterochromatin reported by Blumenstiel et al. 2002.

In some passages of the manuscript there seems to be a confusion between new genes and pseudogenes, which should be corrected. For example, in line 261: "Most new Y-linked genes in D. melanogaster and the D. simulans clade have presumed functions in chromatin modification, cell division, and sexual reproduction (Table S7)".. Who are these "new genes"? If they are those listed in Table S6 (as other passages of the text suggest), most if not all of them are pseudogenes. If they are pseudogenes, it is not appropriate to refer to them as "new genes". The same ambiguity is present in line 263: "Y-linked duplicates of genes with these functions may be selectively beneficial, but a duplication bias could also contribute to this enrichment (…) " Pseudogenes can be selectively beneficial, but in very special cases (e.g.. gene regulation). If the authors are suggesting this, they must openly state this, and explain why. Pseudogenes are common in nearly all genomes, and should be clearly separated from genes (the later as a shortcut for functional genes). The bar for "genes" is much higher than simple sequence similarity, including expression, evidences of purifying selecion, etc., as the authors themselves applied for the two gene families they identified in D. simulans (Lhk and CK2ßtes-Y).

Thank you for the suggestion. We now state our criteria for calling genes based on the expression and long CDS and correct the sentences that the reviewer refers to. The protein evolution rates of many Y-linked duplicates were surveyed in Tobler et al. 2017, who found that most are not under strong purifying selection. Our study supports this previous report. We think that protein evolution rate alone may not be a good indicator for functionality. Our current study does not focus on the potential function of these genes, and we think further population studies are required to get a solid conclusion. We changed the text to clarify this point: “Most new Y-linked duplications in D. melanogaster and the D. simulans clade are from genes with presumed functions in chromatin modification, cell division, and sexual reproduction (Supplementary file 8, formerly Table S7), consistent with other Drosophila species [17, 77].” (p15 L281-284).

The authors center their analysis on "11 canonical Y-linked genes conserved across the melanogaster group ". Why did they exclude the CG41561 gene, identified by Mahajan and Bachtrog (2017) in D. melanogaster? Given that most D. melanogaster Y-linked genes were acquired before the split from the D. simulans clade (Koerich et al. Nature 2008), the same most likely is true for CG41561 (i.e., it would be Y-linked in the D. simulans clade). Indeed, computational analysis gave a strong signal of Y-linkage in D. yakuba (unpublished; I have not looked in the other species). If CG41561 is Y-linked in the simulans clade, it should be included in the present paper, for the only difference between it and the remaining "canonical genes" was that it was found later. Finally, the proper citation of the "11 canonical Y-linked genes" is Gepner and Hays PNAS 1993 and Carvalho, Koerich and Clark TIG 2009 (or the primary papers), instead of ref #55.

Thank you for the suggestion. CG41561 is indeed a relatively young Y-linked gene because it’s not Y-linked in D. ananassae (Muller’s element E). We already have CG41561 in Supplementary file 7 (formerly Table S6) and we think that it is reasonable to separate a young Y-linked gene from the others. We also fixed the reference as suggested (p5 L116).

Other points/comments/suggestions:

a. Possible reference mistake: line 88 "For example, 20-40% of D. melanogaster Y-linked regulatory variation (YRV) comes from differences in ribosomal DNA (rDNA) copy numbers [52, 53]." reference #53 is a mouse study, not Drosophila.

Thank you for pointing out this error, we fixed the reference (p4 L91).

b. Possible reference mistake: line 208 "and the genes/introns that produce Y-loops differs among species [75]". ref #75 is a paper on the D. pseudoobscura Y. Is it what the authors intended?

Yes, our previous paper (ref 75) found that Y-loops do not originate from the kl-3, kl-5, and ORY genes in D. pseudoobscura because they don’t have large introns in this species.

c. line 113. "We recovered all known exons of the 11 canonical Y-linked genes conserved across the melanogaster group, including 58 exons missed in previous assemblies (Table S1; [55])." Please show in the Table S1 which exons were missing in the previous assemblies. I guess that most if not all of these missing exons are duplicate exons (and many are likely to be pseudogenes). If they indeed are duplicate exons, the authors should made it clear in the main text, e.g., "We recovered all known exons of the 11 canonical Y-linked genes conserved across the melanogaster group, plus 58 duplicated exons missed in previous assemblies."

Thank you for the suggestion. However, the 58 exons did not include the duplicated exons. We are similarly surprised how much we will miss if we don’t assemble the Y chromosome carefully. We now mark these exons in red in Supplementary file 1 (formerly Table S1) to make this point clearer.

d. line 116 "Based on the median male-to-female coverage [22], we assigned 13.7 to 18.9 Mb of Y-linked sequences per species with N50 ranging from 0.6 to 1.2 Mb." The method (or a very similar one) was developed by Hall et al. BMC Genomics 2013, which should be cited in this context. (e) line 118: "We evaluated our methods by comparing our assignments for every 10-kb window of assembled sequences to its known chromosomal location. Our assignments have 96, 98, and 99% sensitivity and 5, 0, and 3% false-positive rates in D. mauritiana, D. simulans, and D. sechellia, respectively (Table S2). The procedure is unclear. Why break the contigs in 10kb intervals, instead of treating each as an unity, assignable to Y, X or A? The later is the usual procedure in computational identification of suspect Y-linked contigs (Carvalho and lark Gen Res 2013; Hall et al. BMC Genomics 2013). The only reason I can think for analyzing the contigs piecewise is a suspicion of misassemblies. If this is the case, I think it is better to explain.

Thank you for the suggestion. We did not break the contigs into 10kb intervals when we assigned the Y-linked contigs. As you suspect, our motivation for evaluating our methods and analyzing the contigs in 10kb intervals was to detect possible misassemblies. We rewrote the sentence to make this point clearer (p6 L129-132).

e. Figure 1. It may be interesting to put a version of Figure 1 in the SI containing only the genes and the lines connecting them among species, so we can better see the inversions etc. (like the cover of Genetics , based on the paper by Schaeffer et al. 2008).

Thank you for the suggestion. We would like to make a figure like that fantastic cover image you refer to, but the repetitive nature of the Y chromosome makes it difficult to illustrate rearrangements based on alignments at the contig-level. We instead opted to update Figure 1 to better highlight the rearrangements, still based on the unique protein-coding genes which are supported by the FISH experiments.

f. Table S6 (Y-linked pseudogenes). Several pseudogenes listed as new have been studied in detail before: vig2, Mocs2, Clbn, Bili (Carvalho et al. PNAS2015) Pka-R1, CG3618, Mst77F (Russel and Kaiser Genetics 1993; Krsticevic et al. G3 2015). Note also that at least two are functional (the vig2 duplication and some Mst77 duplications).

Thank you for the suggestion. We now include a column to indicate the potential function of Y-linked duplicates (see Supplementary file 7, formerly Table S6).

g. line 421: "one new satellite, (AAACAT)n, originated from a DM412B transposable element, which has three tandem copies of AAACAT in its long terminal repeats." The birth of satellites from TEs has been observed before, and should be cited here. Dias et al. GBE 6: 1302-1313, 2014.

Thank you for the suggestion. We now include a sentence to cite this reference (p27 L467-468).

h. Figure S2 shows that the coverage of PacBio reads is smaller than expected on the Y chromosome. Any explanation? This has been noticed before in D. melanogaster, and tentatively attributed to the CsCl gradient used in the DNA purification (Carvalho et al. GenRes 2016). However, it seems that the CsCl DNA purification method was not used in the simulans clade species (is it correct?). Please explain the in the manuscript, or in the SI. The issue is relevant because PacBio sequencing is widely believed to be unbiased in relation to DNA sequence composition (e.g., Ross et al. Genome Biol 2013).

Yes, we used Qiagen's Blood and Cell Culture DNA Midi Kit for DNA extraction. We suspect that the underrepresentation of Y-linked reads is driven by the presence of endoreplicated tissue in adults. Heterochromatin is underreplicated in endoreplicated cells, and thus there may simply be less heterochromatin in these tissues. Consistent with this idea, we find that all heterochromatin seems to be underrepresented in the reads, not just the Y chromosome (see Chakraborty et al. 2021; Flynn et al. 2020). We now include this discussion in the SI of our paper (see supplementary text p75).

i. I may have missed it, but in which public repository have the assemblies been deposited?

We link to the assemblies in Github (https://github.com/LarracuenteLab/simclade_Y) and they will also be in the Dryad Digital Repository (doi forthcoming).

Reviewer #3 (Evidence, reproducibility and clarity (Required)):

Due to suppressed recombination, Y chromosomes have degenerated, undergone extensive structural rearrangements, and accumulated ampliconic gene families across species. The molecular processes and selective pressures guiding dynamic Y chromosome evolution are not well understood. In this study, Chang et al. generate updated Y assemblies of three closely related species in the D. simulans complex using long-read PacBio sequencing in combination with FISH. Despite having diverged only 250,00 years ago, the authors find structural rearrangements, two newly amplified gene families and evidence of positive selection across D. simulans. The authors also suggest the high level of Y duplications and deletions may be mediated by MMEJ biased repair.

Our aim is to discover and understand the many different factors and processes that shape the evolution of Y chromosome organization and function. Because these Y chromosomes were largely unassembled, we needed to first generate the sequence assembly before we could ask specific questions. We prefer not to focus the manuscript solely on one specific topic such as MMEJ repair, as our other observations and analyses may be interesting to a wide range of scientists studying topics other than mutation and DNA repair. We are therefore choosing to present the more comprehensive story about Y chromosome evolution that we included in our original manuscript.

We also respectfully disagree with the comment that our paper is just a descriptive survey of Y chromosomal sequence features. On the contrary, we present thorough evolutionary analyses to test hypotheses about the forces shaping the evolution of Y chromosome organization and Y-linked genes. Specifically, we use molecular evolution and phylogenetic and comparative genomics approaches to show that multi-copy gene families experience rampant gene conversion and positive selection. We posit that one simulans clade-specific Y-linked gene family has undergone subfunctionalization, potentially resolving sexual conflict, and another may be involved in meiotic drive. We also use evolutionary genomic approaches to show that the distribution of Y-linked mutations indeed suggests that Y chromosomes disproportionately use MMEJ and we propose that this unique feature may shape the evolution of Y chromosome structural organization. This is, as far as we know, a novel hypothesis. We think that follow-up studies of either hypothesis merit different papers.

Major concerns:

1. Title: The authors use "unique structure" in the title, which is a vague point. Are not Y chromosomes, or any chromosome, "unique" in some manner? Also are there not more evolutionary processes governing the rapid divergence of the Y's.

Thank you for raising your concern. We believe that we are justified in referring to the Y chromosome as unique among all other chromosomes in its structural properties (e.g. combination of its hemizygosity, abundant tandem repeats, large scale rearrangements, and highly amplified testis-specific genes). Because there are many properties of Y chromosomes that we believe contribute to their rapid divergence, we opted for the general phrase ‘unique structure’ to capture all of these features. Many evolutionary processes likely shape the evolution of that unique structure (e.g. Muller’s Ratchet, background selection, Hill Robertson effects; see Charlesworth and Charlesworth 2000 for a review), and these processes are well-studied, especially on newly evolved sex chromosomes. Here our focus is on evolutionarily old Y chromosomes, which may have comparatively fewer targets of purifying selection and are more likely to be shaped by positive selection (Bachtrog 2008).

2. p.2, line 53-56: The authors claim that sexually antagonistic selection and regulatory evolution are causes of recombination suppression. Couldn't this statement be reversed? Recombination suppression via inversions or other rearrangements enable sexually antagonistic selection. This is a chicken or egg question, so it should be revised to have both possibilities be equal.

Thank you for the suggestion. We think that it is unlikely that recombination suppression itself is beneficial, but for sexually antagonistic selection and regulatory evolution, recombination suppression can have short-term benefits. We rephrased this sentence to be agnostic about the direction (p2 L56).

3. p.5, 118-120: Are the assemblies de novo or have they been guided based upon the D.

melanogaster Y chromosome assembly? Please clarify how the authors evaluate their methods by comparing their Y-sequence assignments to known chromosomal locations.

Thank you for the suggestion. We didn’t use D. melanogaster Y chromosome assembly to guide our assemblies. “All assemblies are generated de novo”, and thus we don’t think there is any potential bias. We first assigned Y-linked sequences using the presence of known Y-linked genes, and used this assignment to evaluate our methods. We now make the sentence clear (p5 L112).

4. While the gene copy number estimates are accurate, the PacBio-based genome assemblies are still not able to accurately assemble large segmental duplications (see Evan Eichler's laboratories recent primate and human genome assemblies). A statement mentioning the concerns about accuracy of the underlying sequence and genomic architecture shown should be included in the main text. FISH provides support for the location of the contigs, but not for the accuracy of the underlying genomic architecture.

Thank you for the suggestion. We can’t validate all Y-linked regions. We did validate the larger structural features of the assembly and only discuss the results that we are confident in. We now include sentences to address this concern (p7 L150-152).

5. The authors assigned Y-linked sequences based on median male-to-female coverage. Is this method feasible for assigning ampliconic sequence to the Y given the N50 of 0.6-1.2Mb? Are the authors potentially excluding novel Y-linked ampliconic sequence?

We validated our methods to assign contigs to a chromosome by comparing 10-kb intervals to the contigs with known chromosomal location, including the Y chromosome. Our assignments have high (96, 98, and 99%) sensitivity and low (5, 0, and 3%) falsepositive rates in D. mauritiana, D. simulans, and D. sechellia, respectively (see Supplementary file 2, formerly Table S2). Based on these results, we think that this method is reasonable for Y-linked contigs with N50 of 0.6-1.2Mb.

We might exclude some novel Y-linked sequences since we only assigned ~15Mb out of a total ~40 Mb Y-linked sequences. We acknowledged this possibility, and now include a sentence to address this concern (p31 L554-556).

6. Where did the rDNA sequences go in D. simulans and D. sechellia? Can they be detected on another chromosome?

Please see Figure S5 for detailed results. We found a few copies of rDNA on the contigs of autosomes. We assembled many copies of rDNA that can’t be confidently assigned to Y chromosomes. It’s possible that they might be located on other chromosomes. Based on our FISH data (Figure S4) and previous papers, most of these non-Y-linked rDNA copies should be on the X chromosome. However, in this study, we did not make a concerted effort to assign X-linked contigs.

7. Figure 2B is hard to follow and it is unclear what additional value it provides to part A. Why is expression level of specific exons important?

Exon duplication may be an important contributor to Y-linked gene evolution: most genes have duplications and our figure shows that at least some of these duplicates are expressed. The patterns we see indicate that duplication may play different roles in genes depending on their length. For example, the duplications involving short genes (e.g., ARY) may be functional and influence protein expression, whereas duplications involving large genes (e.g. kl-2) may not influence the overall protein expression level from this gene, although the expressed duplicated exons may play some other role. We revised a sentence in the main text and added a sentence to the figure 2 legend to make this point clearer.

8. Figure 3 There are many introns that contain gaps, so it is unclear how confident one can be in intron length when there are gaps.

Indeed, we are not confident about the length of introns with gaps. Therefore, we separated these introns and showed them in different colors.

9. Figure 4: What are the authors using as a common ancestor in this figure to infer duplications in the initial branch?

We used phylogenies to infer the origin of Y-linked duplicates. Any duplications that happened earlier than the divergence between four species are listed in the branch. We also edited the legend to make this point clearer.

10. p.15, paragraph 2: The authors describe a newly amplified gene, CK2Btes-Y, in D. simulans. In the first half of the paragraph the authors state that Y-linked copies are also found in D. melanogaster but have "degenerated and have little or no expression" and call them pseudogenes. Later in the paragraph, the authors state that the D. melanogaster Y-linked copies are Su(Ste), a source of piRNAs that are in conflict with X-linked Stellate. Lastly in the paragraph, the authors discuss Su(ste) as a D. melanogaster homolog of CK2Btes-Y. The logic of defining CK2Btes-Y origins is confusing. Was CK2Btes-Y independently amplified on the D. simulans Y, or were CK2BtesY and Su(Ste) amplified in a common ancestor but independently diverged?

The amplification of CK2Btes-Y and CK2Btes-like happened in the ancestor of D.

melanogaster and D. simulans (Figure S11). However, both CK2Btes-Y and CK2Btes-like became pseudogenes (D. melanogaster CK2Btes-Y is named PCKR in a previous study) in D. melanogaster. On the other hand, Ste and Su(Ste) are only limited to D.

melanogaster based on phylogenetic analyses (Figure 5A) and are a chimera of CK2Bteslike and NACBtes. The evolutionary history of this gene family has been detailed in other papers, except for the presence of CK2Btes-Y in the D. simulans complex, which we describe for the first time in this study. We now include a new figure (Figure 5B) a schematic of the inferred evolutionary history of sex-linked Ssl/CK2ßtes paralogs

11. Figure 5: Is each FISH signal a different gene copy?

Yes, based on our assemblies, Lhk-1 and Lhk-2 are mostly located on different contigs. Unfortunately, we are not able to design probes that can separate Lhk-1 from Lhk-2.

12. The authors suggest DNA-repair on the Y chromosome is biased towards MMEJ based on indel size and microhomologies. Is there any evidence MMEJ is responsible for variable intron length in the canonical Y-linked genes or the amplification of new gene families? Since MMEJ is error-prone, it's a more tolerable repair mechanism in pseudogenes, so their findings might be biased. Rather than comparing pseudogenes to their parent genes, they should compare chrY pseudogenes to autosomal pseudogenes. Even more would be to track MMEJ on the dot chromosome which is known not recombine and is highly heterchromatic like the Y chromosome.

We did compare chrY pseudogenes to autosomal pseudogenes in our study. We also add new analyses to address other issues from reviewer 2, which are similar to your concern. We now include data from pericentric heterochromatin and pseudogenes (see Figure 7). Both data types support our conclusion that indel size is only larger on Y chromosomes. This is consistent with a report that the dot chromosome and pericentric heterochromatin have similar indel size distributions (Blumenstiel et al. 2002).

Reviewer #3 (Significance (Required)):

While it is a benefit to have much improved Y chromosome assemblies from the three D. simulans clade species, the gap in knowledge this manuscript is trying to address is unclear.

The manuscript is almost entirely descriptive and the figures are difficult to follow.

As stated above, we respectfully disagree with the comment that the manuscript is entirely descriptive, as we present thorough evolutionary analyses to test hypotheses about the forces shaping the evolution of Y chromosome organization and Y-linked genes. We have two guiding hypotheses about the importance of sexual antagonism and DNA repair pathways for Y chromosome evolution, and we conduct sequence analyses that support these hypotheses that sexual antagonism and MMEJ affect Y chromosome evolution.

References:

Bachtrog D. The temporal dynamics of processes underlying Y chromosome degeneration. Genetics. 2008 Jul;179(3):1513-25. doi: 10.1534/genetics.107.084012. Epub 2008 Jun 18. PMID: 18562655; PMCID: PMC2475751.

Blumenstiel, J.P., Hartl, D.L, Lozovsky, E.R.. Patterns of Insertion and Deletion in

Contrasting Chromatin Domains, Molecular Biology and Evolution, Volume 19, Issue 12,

December 2002, Pages 2211–2225, https://doi.org/10.1093/oxfordjournals.molbev.a004045

Chakraborty M, Chang CH, Khost DE, Vedanayagam J, Adrion JR, Liao Y, Montooth KL, Meiklejohn CD, Larracuente AM, Emerson JJ. Evolution of genome structure in the Drosophila simulans species complex. Genome Res. 2021 Mar;31(3):380-396. doi:

10.1101/gr.263442.120. Epub 2021 Feb 9. PMID: 33563718; PMCID: PMC7919458.

Charlesworth B, Charlesworth D. The degeneration of Y chromosomes. Philos Trans R Soc Lond B Biol Sci. 2000 Nov 29;355(1403):1563-72. doi: 10.1098/rstb.2000.0717. PMID: 11127901; PMCID: PMC1692900.

Flynn,J, Long, M, Wing, RA, A.G Clark, Evolutionary Dynamics of Abundant 7-bp

Satellites in the Genome of Drosophila virilis, Molecular Biology and Evolution, Volume 37, Issue 5, May 2020, Pages 1362–1375, https://doi.org/10.1093/molbev/msaa010

McKee, Bruce D. et al. “On the Roles of Heterochromatin and Euchromatin in Meiosis in Drosophila: Mapping Chromosomal Pairing Sites and Testing Candidate Mutations for Effects on X–Y Nondisjunction and Meiotic Drive in Male Meiosis.” Genetica 109 (2004): 77-93.

Tobler R, Nolte V, Schlötterer C. High rate of translocation-based gene birth on the Drosophila Y chromosome. Proc Natl Acad Sci U S A. 2017 Oct 31;114(44):11721-11726. doi: 10.1073/pnas.1706502114. Epub 2017 Oct 19. PMI

https://doi.org/10.7554/eLife.75795.sa2

Article and author information

Author details

  1. Ching-Ho Chang

    Department of Biology, University of Rochester, Rochester, United States
    Present address
    Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft, Writing – review and editing
    For correspondence
    cchang2@fredhutch.org
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9361-1190
  2. Lauren E Gregory

    Department of Biology, University of Rochester, Rochester, United States
    Contribution
    Investigation, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Kathleen E Gordon

    School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, United States
    Present address
    Department of Molecular Biology and Genetics, Field of Genetics, Genomics and Development, Cornell University, Ithaca, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Colin D Meiklejohn

    School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, United States
    Contribution
    Data curation, Formal analysis, Funding acquisition, Investigation, Validation, Visualization, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-2708-8316
  5. Amanda M Larracuente

    Department of Biology, University of Rochester, Rochester, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review and editing
    For correspondence
    alarracu@UR.Rochester.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5944-5686

Funding

National Institute of General Medical Sciences (R35GM119515)

  • Amanda M Larracuente

National Institute of General Medical Sciences (R01GM123194)

  • Colin D Meiklejohn

National Science Foundation (MCB 1844693)

  • Amanda M Larracuente

Damon Runyon Cancer Research Foundation (DRG: 2438-21)

  • Ching-Ho Chang

College of Arts and Sciences, University of Nebraska-Lincoln

  • Colin D Meiklejohn

University of Rochester

  • Amanda M Larracuente
  • Ching-Ho Chang

Ministry of Education, Taiwan

  • Ching-Ho Chang

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was funded by the National Institutes of Health (NIH) (R35GM119515 to AML and R01GM123194 to CDM), National Science Foundation (NSF MCB 1844693) to AML and funding from the University of Nebraska-Lincoln to CDM. AML was supported by a Stephen Biggar and Elisabeth Asaro fellowship in Data Science. C-HC was supported by the Messersmith Fellowship from the U of Rochester, the Government Scholarship to Study Abroad from Taiwan, and the Damon Runyon fellowship (DRG: 2438–21). We thank our collaborators, Drs. JJ Emerson and Mahul Chakraborty, for generating PacBio reads in the D. simulans clade, Dr. Barbara Mellone for the antibodies, and Drs. Casey Bergman, Grace YC Lee, Kevin Wei and Anthony Geneva and Larracuente lab members for helpful discussion. We also thank the U of Rochester CIRC for access to computing cluster resources and UR Genomics Research Center for the library construction and sequencing.

Senior Editor

  1. Patricia J Wittkopp, University of Michigan, United States

Reviewing Editor

  1. Daniel R Matute, University of North Carolina, Chapel Hill, United States

Publication history

  1. Received: November 24, 2021
  2. Accepted: December 18, 2021
  3. Accepted Manuscript published: January 6, 2022 (version 1)
  4. Version of Record published: January 27, 2022 (version 2)

Copyright

© 2022, Chang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,163
    Page views
  • 212
    Downloads
  • 3
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Ching-Ho Chang
  2. Lauren E Gregory
  3. Kathleen E Gordon
  4. Colin D Meiklejohn
  5. Amanda M Larracuente
(2022)
Unique structure and positive selection promote the rapid divergence of Drosophila Y chromosomes
eLife 11:e75795.
https://doi.org/10.7554/eLife.75795

Further reading

    1. Evolutionary Biology
    2. Microbiology and Infectious Disease
    Pramod K Jangir et al.
    Research Article

    Bacterial pathogens show high levels of chromosomal genetic diversity, but the influence of this diversity on the evolution of antibiotic resistance by plasmid acquisition remains unclear. Here, we address this problem in the context of colistin, a ‘last line of defence’ antibiotic. Using experimental evolution, we show that a plasmid carrying the MCR-1 colistin resistance gene dramatically increases the ability of Escherichia coli to evolve high-level colistin resistance by acquiring mutations in lpxC, an essential chromosomal gene involved in lipopolysaccharide biosynthesis. Crucially, lpxC mutations increase colistin resistance in the presence of the MCR-1 gene, but decrease the resistance of wild-type cells, revealing positive sign epistasis for antibiotic resistance between the chromosomal mutations and a mobile resistance gene. Analysis of public genomic datasets shows that lpxC polymorphisms are common in pathogenic E. coli, including those carrying MCR-1, highlighting the clinical relevance of this interaction. Importantly, lpxC diversity is high in pathogenic E. coli from regions with no history of MCR-1 acquisition, suggesting that pre-existing lpxC polymorphisms potentiated the evolution of high-level colistin resistance by MCR-1 acquisition. More broadly, these findings highlight the importance of standing genetic variation and plasmid/chromosomal interactions in the evolutionary dynamics of antibiotic resistance.

    1. Evolutionary Biology
    Milo S Johnson, Michael M Desai
    Research Article Updated

    As an adapting population traverses the fitness landscape, its local neighborhood (i.e., the collection of fitness effects of single-step mutations) can change shape because of interactions with mutations acquired during evolution. These changes to the distribution of fitness effects can affect both the rate of adaptation and the accumulation of deleterious mutations. However, while numerous models of fitness landscapes have been proposed in the literature, empirical data on how this distribution changes during evolution remains limited. In this study, we directly measure how the fitness landscape neighborhood changes during laboratory adaptation. Using a barcode-based mutagenesis system, we measure the fitness effects of 91 specific gene disruption mutations in genetic backgrounds spanning 8000–10,000 generations of evolution in two constant environments. We find that the mean of the distribution of fitness effects decreases in one environment, indicating a reduction in mutational robustness, but does not change in the other. We show that these distribution-level patterns result from differences in the relative frequency of certain patterns of epistasis at the level of individual mutations, including fitness-correlated and idiosyncratic epistasis.