Expansion and loss of sperm nuclear basic protein genes in Drosophila correspond with genetic conflicts between sex chromosomes

  1. Ching-Ho Chang  Is a corresponding author
  2. Isabel Mejia Natividad
  3. Harmit S Malik
  1. Division of Basic Sciences, Fred Hutchinson Cancer Center, United States
  2. Howard Hughes Medical Institute, Fred Hutchinson Cancer Center, United States
7 figures, 2 tables and 15 additional files


Figure 1 with 4 supplements
Origins and evolution of Drosophila sperm nuclear basic protein (SNBP) genes.

(A) Phylogenomic analysis of 13–15 SNBP genes from D. melanogaster organized into three groups (dotted lines): required for male fertility, not required for male fertility, or untested in previous analyses. We identified homologs of these genes in 14 other Drosophila species and an outgroup species, S. lebanonensis, whose phylogenetic relationships and divergence times are indicated on the left (Kumar et al., 2017). Genes retained in autosomal syntenic locations are indicated by black squares, whereas paralogs located in non-syntenic autosomal locations, or X-chromosomes, or Y-chromosomes are indicated in gray, blue and red squares, respectively. Numbers within the squares show the copy number, if >1, of different genes, e.g., D. melanogaster has two paralogs each of both Prot and tHMG genes. An empty square with a line across it indicates that only a pseudogene can be found in the shared syntenic location, whereas an ‘X’ indicates that no ortholog is found, even though one is expected based on the phylogenomic inference of SNBP age. Based on this analysis, we infer that eight SNBP genes are at least 50 million years old, but only three genes are strictly retained in all 16 species (CG30056, CG31010, and Prot). Indeed, none of the SNBP genes required for male fertility in D. melanogaster are strictly conserved in other Drosophila species, either arising more recently (Mst77F, Prtl99C) or having been lost in at least one species after birth (ddbt). We also marked the montium group species, D. kikkawai, in red, because it has unusually lost six SNBP genes. (B, C) We compared dN/dS (B) or dN (C) values for all orthologous SNBP genes (red dots) in D. melanogaster compared to a histogram of the same values for the genome-wide distribution (gray bars) obtained from an analysis using six species by the 12 Drosophila genomes project (Clark et al., 2007). 
Our analyses reveal that most SNBP genes are at or beyond the 95th or 99th percentile for dN/dS or dN values (blue dashed lines). The values of CG34269 are calculated using only five species because it is lost in one of the surveyed species, D. ananassae; therefore, we do not show its dN, as it is not comparable to other genes.

Figure 1—figure supplement 1
Expression patterns of sperm nuclear basic protein (SNBP) genes in D. melanogaster spermatogenesis.

Using single-cell expression data from Witt et al., 2021, we estimated SNBP gene expression in each cell type using the NormalizeData function of Seurat (Hao et al., 2021), with a scale factor of 10000. The cell type is assigned by the expression of stage-specific genes in the previous study (Witt et al., 2021).
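The normalization step can be sketched as follows. This is a minimal illustration, assuming the default LogNormalize method of Seurat's NormalizeData (counts divided by the cell total, multiplied by the scale factor, then log1p-transformed); the gene counts below are hypothetical and are not data from the study.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Return log1p(count / cell_total * scale_factor) for each gene in one cell."""
    total = sum(counts.values())
    if total == 0:
        # A cell with no counts has no expression to normalize.
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}

# Hypothetical UMI counts for one cell; "other" lumps all remaining genes.
cell = {"Mst77F": 30, "ProtA": 50, "other": 920}
normalized = log_normalize(cell)
```

Because every cell is rescaled to the same total before the log transform, values are comparable across cells with different sequencing depths.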

Figure 1—figure supplement 2
Number and location of high mobility group (HMG) boxes in sperm nuclear basic protein (SNBP) proteins.

We plotted the location of HMG boxes in 15 SNBP proteins encoded by the D. melanogaster genome. Among them, 11 have only one HMG box, whereas 4 of them have two HMG boxes. The location of HMG boxes varies between SNBP proteins. A scale bar for protein size is at the bottom of the figure.

Figure 1—figure supplement 3
Expression patterns of sperm nuclear basic protein (SNBP) genes in Drosophila and Scaptodrosophila species.

We estimated the expression of SNBP orthologs (A) and paralogs (B) using publicly available transcriptome datasets. We used colors to represent expression levels in each sample. Our analyses reveal that almost all SNBP genes are expressed only in testes. The raw values are shown in Supplementary file 2.

Figure 1—figure supplement 4
Sperm nuclear basic protein (SNBP) expression level in testes is correlated across Drosophila species.

We estimated the expression level of each SNBP gene in testes across seven species (D. melanogaster, D. simulans, D. yakuba, D. ananassae, D. pseudoobscura, D. virilis, and Scaptodrosophila lebanonensis) and compared the relative expression level of orthologs to each other. The numbers below the diagonal are Spearman's rho coefficients. Our data suggest a moderate to high correlation between Sophophora species. The raw values are shown in Supplementary file 2.
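For readers unfamiliar with the statistic, Spearman's rho is the Pearson correlation of rank-transformed values. The sketch below uses only the standard library; the TPM values are hypothetical placeholders, not values from Supplementary file 2.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical testis TPM values for five orthologs in two species.
species_a = [935.0, 120.0, 15.0, 300.0, 48.0]
species_b = [800.0, 90.0, 30.0, 20.0, 60.0]
rho = spearman_rho(species_a, species_b)
```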

Figure 2
The strictly retained, highly conserved sperm nuclear basic protein (SNBP) gene, CG30056, is dispensable for male fertility in D. melanogaster.

(A) The SNBP gene, CG30056, is encoded co-directionally in an intron of the essential frazzled gene. Using guide RNAs designed to match sites flanking CG30056, and a healing construct encoding eye-specific DsRed, we created a knockout allele replacing CG30056 with DsRed. The knockout was verified using PCR and primers flanking the CG30056 locus (right). Note that balancer lines encode a wildtype copy of CG30056. (B) We performed fertility assays comparing CG30056 homozygous knockout flies with heterozygous controls, either KO/Balancer or KO/wt (gray ovals). Each dot represents a single replicate, and the average and 95% confidence interval based on standard errors are shown in the figures. Fertility assays were performed either for a few days or to sperm exhaustion (gray ovals). We also assayed fertility of knockout strains for the fertility-essential Mst77F gene, and the fertility-nonessential Tpl94D gene. We also documented the sex ratios of the resulting progeny in (C). Consistent with previous findings, we found that Mst77F knockout males are essentially sterile and Tpl94D knockout males were indistinguishable from their heterozygous controls. We found either no or weak evidence of fertility impairments in two different crosses with homozygous CG30056 knockout males compared to KO/Balancer controls. However, we found no evidence of CG30056 requirement for male fertility in more stringent ‘sperm exhaustion’ fertility experiments compared to KO/wildtype controls (gray ovals). (C) We observed no significant evidence of sex-ratio distortion that would suggest an X-versus-Y meiotic drive in progeny resulting from either CG30056, Mst77F, or Tpl94D knockout males. Although there is suggestive evidence of sex-ratio distortion in progeny of one of the Mst77F genotypes, this is inconsistent between the two crosses and most likely due to stochastic effects of having very few resulting progeny. The raw data of (B) and (C) are shown in Supplementary file 8.
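The summary statistics plotted in (B) can be sketched as below. This assumes a normal approximation (mean ± 1.96 × standard error); the caption does not state whether a normal or t-based interval was used, and the progeny counts are hypothetical.

```python
import math
import statistics

def mean_ci95(values):
    """Return (mean, lower, upper) for a 95% CI based on the standard error."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    half_width = 1.96 * se  # assumption: normal approximation
    return mean, mean - half_width, mean + half_width

# Hypothetical progeny counts from six replicate crosses.
progeny = [52, 61, 47, 58, 55, 63]
mean, lower, upper = mean_ci95(progeny)
```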

Figure 3 with 2 supplements
Recurrent amplifications of Drosophila sperm nuclear basic protein (SNBP) genes are biased for sex-chromosomal linkage.

(A) Using reciprocal BLAST (see 'Materials and methods'), we searched for homologs of each D. melanogaster SNBP gene in 78 distinct Drosophila species and two outgroup species (shown with dotted lines). We depict our findings using the circular phylogram representation for SNBP gene CG31010. The innermost circle is a circular phylogeny of the species (Kim et al., 2021). The next ring indicates autosomal copies, with colors to indicate copy number (scale bar, top left; note that scales are different for each gene). Thus, CG31010 is present in one autosomal copy in all but one Drosophila species (gray bar). The third circle indicates sex-chromosomal copies. Red and blue frames in the middle ring indicate X- or Y-linkage if that can be reliably assigned. Dotted frames indicate copies that might not be real orthologs based on phylogeny, whereas solid frames indicate five or more copies. For example, CG31010 is present in five copies on the X-chromosome of D. obscura. The outermost circle shows copies with ambiguous chromosomal location: there are no such copies for CG31010. (B) Using the same representation scheme, we indicate gene retention and amplification for seven other SNBP genes for which we find robust evidence of amplification, from a copy number of five (CG14835) to nearly 50 (tHMG). We also marked the montium group species that lost many SNBP genes with yellow lines. We note that assemblies of Lordiphosa species have lower quality, and the data need to be interpreted carefully. (C) SNBP gene amplifications (five or more copies) are heavily biased for sex chromosomal linkage. Given the relative size of sex chromosomes and autosomes, this pattern is highly non-random (test of proportions, p=2.3e-5).

Figure 3—figure supplement 1
Six sperm nuclear basic protein (SNBP) genes did not undergo significant gene amplification events in Drosophila species.

We searched for homologs of each D. melanogaster SNBP gene in 78 distinct Drosophila species using reciprocal BLAST. We represent our findings using the same circular representation as in Figure 3: the innermost ring indicates autosomal genes, the middle ring indicates sex-linked genes, and the outer ring shows genes with an ambiguous location. In contrast to the significant gene amplification of eight SNBP genes shown in Figure 3, the five SNBP genes represented here only underwent a relatively modest copy number change of twofold or threefold.

Figure 3—figure supplement 2
Concerted evolution of sperm nuclear basic protein (SNBP) gene amplifications.

Phylogenetic analyses of the eight SNBP genes that underwent gene amplifications reveal that most of these amplifications are evolutionarily young. The phylogeny also suggests concerted evolution among the amplified copies of CG14835 in the D. arawakana clade and Prtl99C in the D. suzukii clade, similar to the amplified tHMG-hetX copies on D. mauritiana and D. simulans (Figure 4). The phylogenies from amplified copies of tHMG and ProtA/B in Lordiphosa species are not shown here because of their low-quality sequences.

Figure 4
Tracing the duplication and amplification of tHMG genes in D. simulans and close relatives.

(A) Using a combination of genome assemblies and phylogenetic analyses, we traced the evolutionary origins and steps that led to the massive amplification of tHMG genes on the D. simulans X chromosome. The first step in this process was the duplication of the ancestral tHMG gene (flanked by CCT1 and Octb1R) on the 3R chromosomal arm to a new location on 3R (tHMG-3R#2 now flanked by CG31468 and Gba1a) and to a location on the X chromosome euchromatin, where tHMG-euX is flanked by CG12691 and CG15572. We infer that this CG12691-tHMG-euX locus then duplicated to another locus in X-heterochromatin, between Atbp and the flamenco locus, and further amplified. These resulting copies experienced different fates in D. simulans and its sibling species. For example, in D. sechellia, tHMG-3R#2, tHMG-euX, and tHMG-hetX were all lost but a degenerated copy of tHMG-3R#2 and flanking genes can be found on its Y chromosome. In contrast, in D. mauritiana, tHMG-3R#2 pseudogenized on 3R, tHMG-euX was retained while tHMG-hetX underwent an amplification to a copy number of 15 tandemly arrayed genes in the X heterochromatin. Finally, in D. simulans, tHMG-3R#2 was completely lost, tHMG-euX was pseudogenized, and tHMG-hetX amplified to a copy number of 15 on the X heterochromatin. We note that the amplification unit sizes are different between D. simulans and D. mauritiana, suggesting that these were independent amplifications. Moreover, we detected different copy numbers (all more than 30) of tHMG-hetX across three sequenced strains of D. simulans we surveyed. This difference is likely due to both incomplete assemblies of this region and strain-specific differences. In addition to this X chromosomal expansion, we also found a few degenerated copies of tHMG on the 3R heterochromatic region and the Y chromosome. (B) The alignment shows the divergence between different tHMG copies in the D. simulans clade and D. melanogaster. 
Surprisingly, we found that X-linked tHMG duplicates diverged more from parental genes on autosomes, indicating that they experienced different evolutionary forces than the parental copies. Among 243 aligned nucleotide sites, we found 19 non-synonymous changes and only 3 synonymous changes shared in all X-linked copies after they diverged from the parental copy. Similarly, four non-synonymous changes and no synonymous changes occurred on the parental copy in the ancestral species of the simulans clade. Most non-synonymous changes are in the DNA-binding HMG box. As a result, parental copies and new X-linked copies in D. simulans and D. mauritiana only share ~70% protein identity, which is very low given the <3 MY divergence. Our branch test using PAML further shows that both branches have significantly higher protein evolution rates (ω = 1.6, likelihood ratio test, p=0.007; Supplementary file 11). However, we did not find evidence of positive selection using a branch-site test (likelihood ratio test, p=0.23; Supplementary file 11). (C) Phylogenetic analyses of the various tHMG genes confirm the chronology of events outlined in (A) and find strong evidence of concerted evolution among the amplified tHMG-hetX copies on D. mauritiana and D. simulans, in which copies from the X-linked heterochromatic region are highly homogeneous within species, but diverged between species. For comparison, we show the species tree on the left; the phylogeny of the three D. simulans clade species is not resolved due to lineage sorting and gene flow. To simplify the analysis, we only used sequences that are annotated in NCBI databases.

Figure 5
Evolutionary retention, degeneration, or translocation of sperm nuclear basic protein (SNBP) genes following chromosomal fusions.

SNBP genes are ancestrally encoded on autosomes. Following chromosome fusions during Drosophila evolution, we found eight cases in which three SNBP genes (CG14835, CG34269, and ddbt) became linked to sex chromosomes. In 1/8 cases, SNBP genes translocated back to an autosome. In 2/8 cases, the sex chromosome-linked SNBP genes degenerated despite being otherwise widely conserved in non-montium Drosophila species. In 5/8 cases, SNBP genes were retained on neo-sex chromosomes. Among these, we observed one amplification event: ddbt amplified to six copies in D. repletoides. In contrast to sex chromosomal linkage, SNBP genes that remained linked to autosomes despite chromosomal fusions were strictly retained in 16/16 cases. These retention patterns differ significantly between sex chromosomes and autosomes (Fisher’s exact test, p=0.03).
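The reported test can be reproduced with a short sketch. It assumes the 2×2 table is retained versus not retained (degenerated or translocated) for sex-linked (5 vs 3) and autosomal (16 vs 0) cases, and it uses the common two-sided definition that sums all tables no more probable than the observed one; the caption does not state which two-sided convention was used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table at least as extreme (no more probable) as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))

# Rows: sex-linked, autosomal. Columns: retained, not retained.
p = fisher_exact_two_sided(5, 3, 16, 0)
```

Under these assumptions the p-value is 56/2024, about 0.028, which agrees with the reported p=0.03 after rounding.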

Figure 6 with 1 supplement
A dramatic loss of sperm nuclear basic protein (SNBP) genes coincided with a fusion of X and Y chromosomes in the montium group species.

(A) Using a phylogeny of species from the montium group, we traced the retention or loss of SNBP genes that are otherwise primarily conserved across other Drosophila species. Genes retained in autosomal syntenic locations are indicated in black squares, whereas pseudogenes are indicated by an empty square with a diagonal line. We traced a total of 11 independent pseudogenization events. Three of these pseudogenization events occurred early such that all species from this group have lost CG14835, Mst33A, and tHMG. Three other SNBP genes were lost later (in some cases on multiple occasions) and are, therefore, missing only in a subset of species. For example, we infer that CG34269 was lost on at least five independent occasions (and also in outgroup species D. ananassae). We correlated this dramatic loss of otherwise-conserved SNBP genes with the X-chromosome linkage of genes that are ancestrally Y-linked in other Drosophila species, shown on the right. For example, of 12 Y-chromosomal genes in most related species, including D. melanogaster and D. ananassae, most are now X-linked in montium group species (e.g., 11/12 in D. triauraria, 9/11 in D. jambulina, and 7/10 in D. bocqueti and D. kikkawai). We note these species still harbor a Y chromosome; however, this Y-chromosome lacks most ancestrally Y-linked genes. (B) We traced the chromosomal arrangement and linkage of ancestrally Y-linked genes in D. triauraria using a new genome assembly (NCBI accession: GCA_014170315.2) and genetic crosses in (C). We were able to show that the D. triauraria X chromosome represents a fusion of the X chromosome (e.g., from D. melanogaster) and chromosomal segments containing 11 protein-coding genes that are typically found on the Y chromosome (e.g., from D. melanogaster). Genetic crosses confirmed the X-linkage of 9 of these previously Y-linked genes. The lack of allelic differences in D. triauraria prevented us from confirming this for the other two genes: CCY and WDY. 
(C) An example of the genetic cross used to verify X-linkage. Using genetic crosses between different D. triauraria strains with allelic variation in ancestral Y-linked genes, we evaluated whether male flies inherit these genes maternally, paternally, or from both parents. We observed only maternal inheritance, confirming the X-chromosomal linkage of these genes.

Figure 6—figure supplement 1
Phylogenetic analyses help distinguish between two models of relocation of ancestrally Y-linked genes.

Two hypotheses have been proposed for the relocation of ancestrally Y-linked genes in the montium species group. The first hypothesis, proposed by Dupim et al., 2018, posits that the Y-chromosomal genes duplicated onto another chromosome, following which either the Y-linked or non-Y-linked genes were retained. We favor an alternate hypothesis in which all Y-linked genes fused to the X-chromosome, following which some Y-linked genes relocated back to the Y chromosome in some but not all montium group species. We find strong evidence for the second hypothesis regarding the PRY and Ppr-Y genes, which are both located on the same contig in D. triauraria and D. kikkawai even though they are X-linked in D. triauraria and Y-linked in D. kikkawai. Phylogenetic analyses of PRY suggest that it relocated back to the Y chromosome from the X chromosome in the D. kikkawai lineage. Similarly, the WDY and kl-2 genes are also co-located on the D. triauraria X chromosome and D. kikkawai Y chromosome. However, in this case, the phylogeny is ambiguous enough to prevent us from distinguishing between the two hypotheses for the WDY and kl-2 genes in D. jambulina, D. bocqueti, and D. kikkawai.

Figure 7
Genetic conflict between sex chromosomes may explain the rapid turnover of sperm nuclear basic protein (SNBP) genes in Drosophila species.

SNBP genes are ancestrally encoded on autosomes where we hypothesize that some of them act to suppress meiotic drive between sex chromosomes (e.g., ProtA/B). However, in some cases, paralogs of these SNBP genes duplicate onto sex chromosomes where they undergo dramatic amplification. We propose that this amplification creates an opportunity for them to act as meiotic drive elements themselves (e.g., Dox), imbuing sex chromosomes that inherit them with transmission advantages. A fusion of the sex chromosomes (e.g., D. montium species group) leads to a loss of meiotic competition between sex chromosomes, which will subsequently lead to the loss or degeneration of the suppressing SNBP genes on autosomes since their drive suppression functions are rendered superfluous.


Table 1
McDonald–Kreitman tests for positive selection on sperm nuclear basic protein (SNBP) genes in two Drosophila species.
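| Name | Location (Mb) | Length | pI* | # of HMG | D. melanogaster Alpha | D. melanogaster χ2 p-value | D. serrata Alpha | D. serrata χ2 p-value | Expression stage | Phenotype | Citations |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CG30056 | 2R:12.6 | 137 | 11.05 (10.70) | 1 | –5 | 0.09 | 0.75 | 0.208 | Undefined | Undefined | a |
| CG30356 | 2R:8.7 | 149 | 10.65 (10.89) | 1 | 0.785 | 0.011 | 0.44 | 0.226 | Pre-individualization | Undefined | b,c |
| CG31010 | 3R:30.7 | 254 | 4.77 (8.10) | 1 | 0.535 | 0.034 | 0.61 | 0.001 | Undefined | Undefined | a |
| CG34269 | 3L:0.5 | 191 | 10.7 (10.34) | 1 | –0.263 | 0.613 | 0.736 | 0.001 | Undefined | Undefined | a |
| CG42355 | 3L:2.0 | 161 | 11.29 (10.89) | 2 | 0.682 | 0.012 | 0.682 | 0.036 | Undefined | Undefined | b,c |
| Mst33A | 2L:11.6 | 359 | 10.61 (10.14) | 2 | 0.208 | 0.391 | NA | NA | Undefined | Undefined | c |
| ddbt | 3L:0.3 | 117 | 12.3 (11.76) | 1 | –0.316 | 0.642 | –0.333 | 0.647 | Mature sperm | Sterile | d |
| Mst77F | 3L:20.8 | 215 | 10.34 (9.95) | 1 | –0.308 | 0.628 | NA | NA | Mature sperm | Sterile | e |
| Prtl99C | 2R:29.8 | 201 | 11.25 (10.57) | 2 | 0.242 | 0.483 | NA | NA | Mature sperm | Sterile | f |
| Tpl94D | 2R:23.0 | 164 | 11.3 (10.11) | 2 | 0.571 | 0.023 | 0.52 | 0.074 | Pre-individualization | Fertile | g |
| CG14835 | 3L:7.4 | 152 | 10.43 (4.81) | 1 | 0.332 | 0.416 | NA | NA | Mature sperm | Fertile | a |
| ProtA§ | 2R:14.9 | 146 | 11.12 (11.52) | 1 | 0.027 | 0.941 | 0.667 | 0.012 | Mature sperm | Low fertility | e |
| ProtB§ | 2R:14.9 | 144 | 10.8 (11.60) | 1 | –0.029 | 0.945 | 0.952 | 0 | Mature sperm | Low fertility | e |
| tHMG-1 | 3R:22.5 | 126 | 7.67 (6.11) | 1 | 0.659 | 0.02 | NA | NA | Pre-individualization | Fertile | b |
| tHMG-2 | 3R:22.5 | 133 | 8.94 (7.19) | 1 | 0.443 | 0.199 | NA | NA | Pre-individualization | Fertile | b |
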
  1. We only show results from unpolarized MK tests using all (including rare) SNPs. Other variations of these results (e.g., polarized, excluding rare SNPs) are shown in Supplementary file 5.

  2. Genes with any evidence of positive selection have p-values in bold.

  3. * Isoelectric point of either the whole protein or the HMG domains only (in parentheses).

  4. Post-meiotic protein expression.

  5. A significant signature of positive selection is obtained after removing low-frequency SNPs (<5%) and/or after polarizing changes (see Supplementary file 5).

  6. § Independent duplications in two species.

Table 2
Summary of evolution events of Drosophila sperm nuclear basic protein (SNBP) genes.
| Name | Phenotype | Age (My) | Expression level (TPM)* | Evolutionary rate (dN/dS) | Positive selection (MK test) | Amplification event(s)§ | Number of loss events in 80 species | Number of loss events in the montium group (X-Y fusion) |
|---|---|---|---|---|---|---|---|---|
| tHMG | Fertile | >65 | 935, 1194 | 0.39 | + | 2X; 1U | 5 | 1 |

  1. * Gene expression level in D. melanogaster testes (Supplementary file 2).

  2. dN/dS in D. melanogaster subgroup species (Supplementary file 4).

  3. Positive selection based on McDonald–Kreitman tests in D. melanogaster and/or D. serrata.

  4. § Any specific location with five or more copies of any one SNBP gene. A represents all autosomes combined, X represents the X chromosome, Y represents the Y chromosome, X/Y represents either the X or Y chromosome, and U represents regions with unknown chromosome locations.

  5. Number of potential loss events inferred by the phylogeny (Figure 3 and Figure 3—figure supplement 1). Some of these may represent false negatives due to incomplete genome assemblies.

Additional files

Supplementary file 1

Probability that rapid evolution has obscured homolog detection of young SNBP genes in Drosophila species.

We conducted abSENSE analyses (Weisman et al., 2020) using the BLAST scores in related species with detected orthologs and inferred the likely BLAST scores of the orthologs in more distantly related species given the divergence of species. Then we estimated the probability of failing to detect a homolog (if it were present) in species of various divergence levels (using E-value = 1).

Supplementary file 2

Expression levels of SNBP genes in Drosophila and Scaptodrosophila species.

We estimated the expression levels (using TPM) of SNBP genes using publicly available transcriptome datasets of different tissues (Supplementary file 12). The data are also illustrated in Figure 1—figure supplement 3.

Supplementary file 3

Sequence information of SNBPs in Drosophila species.

We collected the sequences of SNBPs and their homologs from the NCBI database. We calculated the isoelectric point and length of each protein using Geneious 2022.1.1 (https://www.geneious.com).

Supplementary file 4

Evolutionary rates of SNBP genes in D. melanogaster subgroup species.

We used PAML to estimate evolutionary rates of SNBP genes using the same parameters and the same six Drosophila species used in the 12 Drosophila genomes project (Clark et al., 2007). For comparison, we used the evolutionary rates of other genes from the 12 Drosophila genomes project (Clark et al., 2007).

Supplementary file 5

McDonald–Kreitman test results for SNBP genes in D. melanogaster and D. serrata.

We looked for positive selection in two lineages, D. melanogaster and D. serrata, using McDonald–Kreitman tests to compare within-species polymorphism to between-species divergence (McDonald and Kreitman, 1991). We used D. simulans as the closely related species for the D. melanogaster analysis and D. bunnanda for the D. serrata analysis.
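The quantities reported in Table 1 can be illustrated with a minimal sketch of an unpolarized McDonald–Kreitman test. It assumes the usual estimator alpha = 1 − (Ds·Pn)/(Dn·Ps) and a Pearson χ2 test on the 2×2 table without continuity correction (the source does not say whether a correction was applied); the counts are hypothetical.

```python
import math

def mk_test(dn, ds, pn, ps):
    """Return (alpha, chi2, p) from fixed (D) and polymorphic (P) site counts.

    dn/ds: non-synonymous/synonymous fixed differences between species.
    pn/ps: non-synonymous/synonymous polymorphisms within species.
    """
    # Proportion of non-synonymous substitutions attributed to positive selection.
    alpha = 1 - (ds * pn) / (dn * ps)
    n = dn + ds + pn + ps
    # Pearson chi-square for the table [[dn, ds], [pn, ps]], 1 degree of freedom.
    chi2 = n * (dn * ps - ds * pn) ** 2 / (
        (dn + ds) * (pn + ps) * (dn + pn) * (ds + ps)
    )
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return alpha, chi2, p

# Hypothetical counts, not taken from Supplementary file 5.
alpha, chi2, p = mk_test(dn=20, ds=10, pn=5, ps=15)
```

A positive alpha with a small p-value indicates an excess of non-synonymous divergence relative to polymorphism, whereas a negative alpha indicates an excess of non-synonymous polymorphism.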

Supplementary file 6

No evidence for positive selection on SNBP genes using the site model in PAML in D. melanogaster subgroup species.

We aligned 9–17 unambiguous orthologs from species in the D. melanogaster group to test whether a subset of sites evolves under positive selection. We compared NSsites models M1a to M2a, and M7 or M8a to M8 using likelihood ratio tests. We ran each model using several codon parameter choices (CodonFreq = 0, 2, 3) to check whether the results were robust. For example, CG30056 shows a signal of different selection strengths across sites using CodonFreq = 2 (p=0.0003), but not CodonFreq = 0 or 3 (p=1 and 0.16, respectively).
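A likelihood ratio test between two nested models compares twice the difference in log-likelihood to a χ2 distribution. The sketch below uses closed-form χ2 survival functions for 1 and 2 degrees of freedom; the log-likelihood values are hypothetical, and the degrees of freedom are supplied by the caller rather than taken from the study.

```python
import math

def lrt_pvalue(lnl_null, lnl_alt, df):
    """p-value of a likelihood ratio test for nested models (df of 1 or 2)."""
    stat = max(0.0, 2 * (lnl_alt - lnl_null))  # test statistic 2 * delta lnL
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))  # chi-square survival, 1 df
    if df == 2:
        return math.exp(-stat / 2)             # chi-square survival, 2 df
    raise ValueError("this sketch only handles df = 1 or df = 2")

# Hypothetical log-likelihoods from a null and an alternative model.
p_two_df = lrt_pvalue(-1000.0, -997.0, df=2)
p_one_df = lrt_pvalue(-1000.0, -998.0, df=1)
```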

Supplementary file 7

Low frequency of inactivating polymorphisms in SNBP genes from D. melanogaster populations.

We extracted population data using an available dataset of >1000 D. melanogaster strains (Hervas et al., 2017; Lack et al., 2016) and long-read assemblies, and documented inactivating mutations in SNBP genes. We found that loss-of-function variants of SNBP genes segregate at very low frequencies (<1%) among D. melanogaster strains. The only exceptions are CG14835 (1.5% frequency of frameshift mutation in worldwide populations), tHMG1 (5.4% frequency of deletion based on long-read assemblies), and ddbt (1.2% frequency of loss of start codon in non-African populations). However, the ddbt mutation is likely to be benign owing to an alternate start codon just a few codons downstream of the canonical start site. In contrast, variants that are not likely to impair function (small in-frame indels) can segregate at higher frequency, e.g., a 15 bp insertion variant of tHMG2 is present at 70% frequency in worldwide D. melanogaster populations. This suggests nearly strict retention of all SNBP genes, whether they were shown to be essential for male fertility in laboratory experiments or not, in all sequenced strains of D. melanogaster.

Supplementary file 8

The fertility assays of SNBP knockout and mutated flies.

We performed fertility assays comparing CG30056 homozygous knockout flies with heterozygous controls. We also assayed fertility of knockout strains for the fertility-essential Mst77F gene, and the fertility-nonessential Tpl94D gene, together. We also documented the sex-ratios of the resulting progeny in Figure 2. Consistent with previous findings, we found that Mst77F knockout males are essentially sterile and Tpl94D knockout males were indistinguishable from their heterozygous controls. We found either no or weak evidence of fertility impairments in three different crosses with homozygous CG30056 knockout males. We observe no significant evidence of sex-ratio distortion that would suggest an X-versus-Y meiotic drive in progeny resulting from either CG30056, Mst77F, or Tpl94D knockout males.

Supplementary file 9

Chromosomal assignments for each contig containing SNBP genes.

We assigned the location of SNBP-containing contigs using synteny (Muller elements) and coverage analysis. We used BUSCO genes on these contigs to assign their most likely location on Muller elements. We also mapped available male Nanopore or Illumina reads to the assemblies and estimated coverage on the contigs compared to autosomal contigs. If the normalized read coverage is significantly less than 1, we assign the contigs to either X or Y chromosome.
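The coverage criterion can be sketched as follows. The source describes coverage that is "significantly less than 1" without naming the test, so this illustration substitutes a fixed cutoff; the 0.75 threshold, the function names, and the depth values are all hypothetical.

```python
import statistics

def normalized_coverage(contig_depths, autosomal_depths):
    """Median male-read depth on a contig relative to the autosomal median."""
    return statistics.median(contig_depths) / statistics.median(autosomal_depths)

def classify_contig(contig_depths, autosomal_depths, threshold=0.75):
    """Label a contig by its normalized coverage (threshold is an assumption).

    In males, X- and Y-linked sequence is hemizygous, so the expected
    normalized coverage is about 0.5 rather than 1.
    """
    ratio = normalized_coverage(contig_depths, autosomal_depths)
    return "sex-linked (X or Y)" if ratio < threshold else "autosomal"

# Hypothetical per-window read depths from male reads.
autosomes = [30, 31, 29, 32]
label_low = classify_contig([14, 16, 15, 17], autosomes)
label_even = classify_contig([29, 30, 33, 31], autosomes)
```

Male coverage alone cannot separate X from Y, which is why the category here is "X or Y"; distinguishing the two requires additional evidence such as female reads.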

Supplementary file 10

The copy number and chromosome location of SNBP homologs using BLAST in each species.

We summarized the data from Supplementary file 8 and also manually curated data from some amplified SNBP genes using extra assemblies or Illumina reads (shown in red). To determine the chromosomal location of some amplified SNBP genes, we mapped male and female Illumina reads from different resources to the assemblies of 10 species (Supplementary file 13). This allowed us to assign contigs to the X or Y chromosome unambiguously. For D. melanogaster and D. simulans, we used assemblies with better contiguities (GCA_000778455.1 [Krsticevic et al., 2015] and GCA_004382185.1 [Chakraborty et al., 2021]).

Supplementary file 11

PAML analyses reveal different selection forces in tHMG duplicates of D. simulans clade species.

We analyzed tHMG copies from D. simulans clade species to infer their selective pressures (Figure 4). We compared branches with different protein evolution rates using likelihood ratio tests (CodonFreq = 2). Our models include the null model (same protein evolution rate across branches), a model where all X-chromosome branches share a rate that is different from the rate on all other branches ('all X'), and a model where the early duplication branches on both the X-linked copies and the parental copy share a rate that is different from all other branches ('Duplication'). We compared two models with all sites that share the same protein evolution rate (fix_omega = 1) and various evolution rates (fix_omega = 0), and did not find evidence of positive selection. The duplication model fits best across all models, so we also used this model to conduct a branch-site test. No evidence for positive selection was found using the branch-site test.

Supplementary file 12

Location and degeneration of SNBP genes in species with neo-sex chromosomes.

We report the chromosomal locations of each SNBP gene in species with neo-sex chromosomes illustrated in Figure 5.

Supplementary file 13

Sequence data resources and information used in this study.

Supplementary file 14

Primer sequences used in this study.

MDAR checklist

Cite this article

  1. Ching-Ho Chang
  2. Isabel Mejia Natividad
  3. Harmit S Malik
Expansion and loss of sperm nuclear basic protein genes in Drosophila correspond with genetic conflicts between sex chromosomes
eLife 12:e85249.