Theoretical expectations for complementary sex determination (CSD).

(A) Cartoon depicting a mating between a diploid female with two different alleles at a sex determination locus and a haploid male bearing one of these two alleles. Half of the sexually produced diploid offspring are expected to develop as males. (B) Table illustrating multi-locus CSD in diploids, in which heterozygosity of at least one sex determination locus is required for female development. By contrast, only homozygosity at all loci results in diploid male development. (C) Cartoon depicting how CSD might work in asexual species such as the clonal raider ant. If diploid males arise from losses of heterozygosity at sex loci, then homozygosity for either of the two alleles should trigger male development. Offspring proportions reflect empirical observations in the clonal raider ant (Kronauer et al. 2012; Oxley et al. 2014).
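
The genotype-to-sex rule described in panels A and B can be written as a short function. The following is a minimal sketch, not code from the study: the function names and the tuple-based genotype encoding are hypothetical, chosen only for illustration.

```python
def csd_sex(genotype):
    """Expected sex under (multi-locus) CSD.

    `genotype` holds one entry per sex determination locus (hypothetical
    encoding): a 1-tuple of alleles for a haploid, a 2-tuple for a diploid.
    """
    if all(len(locus) == 1 for locus in genotype):
        return "haploid male"
    # Diploid: heterozygosity at any one locus is sufficient for female development.
    if any(a != b for a, b in genotype):
        return "female"
    # Diploid and homozygous at every locus.
    return "diploid male"


def matched_mating_offspring(mother, father_allele):
    """Equally likely diploid offspring for single-locus CSD (panel A)."""
    return [csd_sex([(egg_allele, father_allele)]) for egg_allele in mother]
```

For a heterozygous mother `("A", "B")` and a father carrying `"A"`, `matched_mating_offspring` returns one diploid male and one female, matching the expected 1:1 ratio among diploid offspring.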

Whole genome sequencing reveals a candidate sex determination locus on chromosome 4.

(A) Karyoplot depicting the mean CSD index p-value in 50kb windows with a 15kb sliding interval. The CSD index peak is shown as a green dotted line. The significance threshold (p=0.05) and FDR-corrected significance threshold (p=0.002) are indicated by gold and brown lines, respectively. The grey histogram shows the number of ancestrally heterozygous SNPs in 300kb windows. (B) Stacked bar plots for all diploids used for mapping, with one horizontal line for each ordered putatively ancestrally heterozygous SNP on chromosome 4. For each sample, SNPs that retain ancestral heterozygosity are drawn as grey lines, whereas SNPs that have lost ancestral heterozygosity are drawn as black lines.
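
The windowed averaging in panel A (50kb windows advanced in 15kb steps) can be sketched as below. This is an illustration under stated assumptions rather than the authors' pipeline: the function name, the 1-based inclusive window coordinates, and the choice to skip windows without SNPs are all assumptions.

```python
def sliding_window_means(positions, values, window=50_000, step=15_000):
    """Mean of `values` in windows of `window` bp, advanced by `step` bp.

    `positions` are 1-based SNP coordinates on a single chromosome and
    `values` the per-SNP statistic (e.g. a p-value). Windows containing
    no SNPs are skipped (an assumption of this sketch).
    Returns a list of (window_start, window_end, mean) tuples.
    """
    if not positions:
        return []
    results = []
    last = max(positions)
    start = 1
    while start <= last:
        end = start + window - 1
        in_window = [v for p, v in zip(positions, values) if start <= p <= end]
        if in_window:
            results.append((start, end, sum(in_window) / len(in_window)))
        start += step
    return results
```

Because the step is smaller than the window, consecutive windows overlap and a single SNP contributes to several window means, which smooths the plotted curve.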

Number of samples heterozygous and homozygous for each allele in the 46kb region on chromosome 4.

The putative sex determination locus identified in O. biroi is conserved across formicoid ants.

(A) Karyoplot depicting the 14 chromosomes in the O. biroi genome, chromosome 4 from the L. humile genome (Pan et al. 2024), and one contig from linkage group 14 of the V. emeryi reference genome (Miyakawa and Mikheyev 2015). Note that the plots from the different species are not drawn to scale. Homology to protein-coding genes identified within V.emeryiCsdQTL2 is drawn in green. The location of the O. biroi CSD index peak is indicated. (B) A phylogeny of the ant subfamilies (adapted from Borowiec et al. 2019), with the presence of a sex determination locus homologous to the peak of our CSD index, or the absence of data shown. The yellow shaded background denotes the formicoid clade. (C) Karyoplot for O. biroi chromosome 4, depicting homology to V.emeryiCsdQTL2 and ancestral heterozygosity for six different clonal lines (A, B, C, D, I, and M). Grey histograms depict the number of ancestrally heterozygous SNPs in 100kb windows.

The CSD index peak is characterized by high genetic diversity in a non-coding region.

(A & B) Nucleotide diversity across the length of O. biroi chromosome 4 in 5kb windows (step size=1kb) (A) and across the vicinity of the CSD index peak in 100bp windows (step size=20bp) (B). (C) The number of differences per 100bp window between each alternate de novo assembled allele and the reference genome allele. (D) Annotated genes in the vicinity of the CSD index peak. Black boxes depict exons and thin lines depict introns. The lncRNA ANTSR is indicated in bold. Arrows indicate the names of genes in close proximity. The CSD index peak is shown as a green dotted line in A, and with green shading in B-D.

The haplotypes present at the CSD index peak for each studied O. biroi clonal line.

Because no haploid males were sequenced from clonal line C, and only one haploid male was sequenced from clonal line I, the haplotypes for these two lines remain incompletely known. We denote the unidentified haplotypes with question marks.

Number of haploid and diploid males sampled from different O. biroi clonal lines.

Whether a second, tra-containing CSD QTL from V. emeryi is conserved across ants remains ambiguous.

(A) Karyoplot depicting the 14 chromosomes in the O. biroi genome and one contig from linkage group 13 of the V. emeryi reference genome. Homology to protein-coding genes identified within V.emeryiCsdQTL1 is drawn in purple. The locations of the O. biroi CSD index peak and the homolog of transformer are indicated. (B) A phylogeny of the ant subfamilies (adapted from Borowiec et al. 2019), with the presence or absence of this second putative sex determination locus, or the absence of data shown. The yellow shaded background denotes the formicoid clade. (C) Karyoplot for O. biroi chromosome 2, depicting homology to V.emeryiCsdQTL1 and ancestral heterozygosity for each clonal line. Grey histograms depict the number of ancestrally heterozygous SNPs in 100kb windows. The region of homology that is ancestrally homozygous in clonal lines A and C is labeled.

Genome-wide heterozygosity levels.

Scatterplot depicting the proportion of base pairs in the ∼220Mb O. biroi genome that are heterozygous in each individual for which we conducted whole genome sequencing. Haploid males have no legitimate heterozygosity—the small elevation above zero is due to heterozygous genotype calls at erroneously assembled regions of the reference genome.

Homozygosity levels differ between males and females at the putative CSD locus.

Manhattan plot depicting FDR-corrected p-values from Fisher’s Exact Tests on whether homozygosity levels differ between males and females at each SNP. The red line indicates the significance threshold (p=0.05), and the vertical green line marks the peak of the CSD index.
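
The per-SNP test behind this plot can be sketched with the standard library alone. The sketch assumes a 2x2 table of homozygous versus heterozygous counts in males and females at each SNP, and it assumes the Benjamini-Hochberg procedure for the FDR correction, which the caption does not specify. Function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one table per SNP would be passed to `fisher_exact_two_sided`, and the resulting list of p-values to `benjamini_hochberg` before plotting.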

Diploid males result from copy-neutral losses of heterozygosity for either allele.

(A) Karyoplot depicting losses (green) or retention (gray) of heterozygosity across the vicinity of the peak of the CSD index in diploid males and females. Codes at the left (C16, C17, etc.) indicate the stock colonies from which each ant was sampled. The magenta box surrounds the CSD index peak. Dark green indicates that the sample has the same allele/haplotype as the reference haploid male whereas light green indicates that the sample has the alternative allele. Switches between alleles in homozygous regions within samples (i.e., changes from light green to dark green or vice versa) represent recombination events that occurred in generations prior to the meiosis that produced each diploid male. Because the reference haploid male was from colony C16, fewer generations have elapsed since the common ancestor of the reference haploid male and the diploid males from that colony, and thus fewer recombination events have accumulated. By contrast, more generations have elapsed since the common ancestor of the reference haploid male and the diploid males from colonies C17, C18, and C1, which is reflected in the higher number of recombination events that accumulated across generations. (B) Normalized read depth across either all SNPs that passed filtering on chromosome 4 (gray, for samples without losses of heterozygosity on chromosome 4) or all SNPs that passed filtering within that sample’s loss of heterozygosity region (green). Box plots show median (center line), interquartile range (IQR) (box limits) and 1.5 × IQR (whiskers). Data points that fall outside 1.5 × IQR are shown individually. Violin plots show the kernel probability density, meaning that the proportion of the data located at an x-axis value is represented by the width of the outlined area.

Genome-wide nucleotide diversity.

Nucleotide diversity across the genome in 5kb windows with a 1kb sliding interval. In the genome-wide plot (top), the red line depicts the peak of the CSD index from genetic mapping. In the zoomed-in plots for chromosomes 4 (middle) and 2 (bottom), the green and purple vertical shading indicates homology to V. emeryi CSD QTL 2 and 1, respectively.

Sex-specific gene expression.

(A) Heatmap of genes that were significantly differentially expressed between sexes (adjusted p-value < 0.05). Each row represents a gene, and each column represents a sample. The trees at the left and the top of the heatmap depict hierarchical clustering of genes and samples, respectively. Z-scores of log-transformed expression levels are shown as a color gradient. (B) Volcano plot depicting differential gene expression between males and females. Each gene is represented by a dot, which is colored gray if not differentially expressed, green if the expression difference is at least 2-fold (|log2 fold change| ≥ 1), or blue if the expression difference is statistically significant (adjusted p-value < 0.05). Red indicates genes that are significantly differentially expressed and have at least a 2-fold difference in gene expression. (C) Expression levels of genes in the vicinity of the CSD index peak. Normalized counts are shown for each male and female sample, with error bars depicting the mean and SEM for each sex. The genomic coordinates, name, and ID are given for each gene. Genes with significantly different expression between sexes (adjusted p-value < 0.05) are bolded in the table and marked with an asterisk on the plot. The CSD index peak is shown in green, and the dashed box denotes the peak of nucleotide diversity.
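
The four-way color coding in panel B can be expressed as a small classifier. This is a sketch only: the function name is hypothetical, and it assumes the 2-fold cutoff applies to the absolute log2 fold change, so that genes with higher expression in either sex are treated symmetrically.

```python
def volcano_color(log2_fold_change, adjusted_p, lfc_cutoff=1.0, alpha=0.05):
    """Color category for one gene in the volcano plot."""
    large_effect = abs(log2_fold_change) >= lfc_cutoff  # at least 2-fold
    significant = adjusted_p < alpha
    if large_effect and significant:
        return "red"
    if significant:
        return "blue"
    if large_effect:
        return "green"
    return "gray"
```

A gene with a log2 fold change of -1.5 and an adjusted p-value of 0.01 would be drawn in red, whereas the same fold change with an adjusted p-value of 0.2 would be drawn in green.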

RNAseq reads aligned to the unannotated lncRNA flanking the putative CSD locus.

RNAseq read depth in the vicinity of the unannotated lncRNA in nine different libraries. The peak of the CSD index is indicated in light green shading, but note that this region extends further downstream. The nucleotide diversity peak is indicated by a horizontal green bar.

The O. biroi transformer homolog is sex-specifically spliced.

The C-terminal region of tra exon four is spliced differently in females (pink) and males (blue), producing a truncated protein in males and a functional, full-length TRA protein in females. (A) Diagram of transformer splicing depicting full-length (functional) and truncated (non-functional) transcripts. The black arrow indicates the transcription start site, the pink line indicates the female-specific splice junction that allows the production of functional protein, and the black line indicates the splice junction that leads to a truncated protein. (B) The number of reads spanning the female splice junction required to produce functional protein from three female and three male short-read RNAseq libraries. (C) The number of full-length and truncated tra transcripts in six long-read RNA sequencing (IsoSeq) libraries from different tissues and sexes.

Primers for ploidy assessment via heterozygosity.

5.4 refers to the reference genome version (GCA_003672135.1).

Metadata for all DNA whole-genome shotgun sequencing libraries included in this study.

Question marks indicate male samples for which the clonal line is known, but the stock colony of origin was not recorded.

O. biroi orthologs of genes located in the two V. emeryi CSD QTLs.

Improvements over previous genome annotations, with the new (RU) annotation featuring genes and transcripts not found in previous annotation versions.

Differential gene expression near the CSD index peak.

The log2 fold change in expression levels between male and female samples, p-values (Wald test), and adjusted p-values (Benjamini-Hochberg adjustment) for 17 genes in the vicinity of the CSD index peak. The p-values are “NA” for LOC105283850 because one of the samples is an extreme outlier detected by Cook’s distance.

Differential gene expression near the CSD index peak.

The log2 fold change in expression levels between male and female samples, p-values (likelihood ratio test), and adjusted p-values (Benjamini-Hochberg adjustment) for each exon of 17 genes in the vicinity of the CSD index peak. The p-values are “NA” for one of the exons of LOC105285603 because there is negligible expression of that exon (basemean=0).

Versions of software used for genomics analyses.