Derived male reproductive characteristics in gorillas.

A. Correlation between testis weight and body weight in primates. Inset bar chart shows testis mass as a percentage of body mass for gorilla, common chimpanzee, and bonobo. Adapted from (Harcourt et al., 1981).

B. Correlation between erect penis length and body weight in primates. Adapted from (Dixson, n.d.)

C. Diagram of testis histology in gorilla and other primates. Note that gorillas have fewer seminiferous tubules (T) and more interstitial tissue (I) than other primates. Adapted from (Fujii-Hanamoto et al., 2011b).

D. Correlation between the sperm midpiece volume and residual testis size (relative testis size accounting for body mass). Adapted from (Anderson and Dixson, 2002).

E. Boxplots showing the distribution of sperm swimming speeds (VCL, µms−1) and escape forces (Fesc, pN). Adapted from (Nascimento et al., 2008).

F. Proportion of morphologically abnormal sperm in Hominids. Adapted from (Seuanez et al., 1977b).

Episodic adaptive evolution and relaxed selection intensity on protein-coding genes in gorillas.

A. Consensus phylogeny of 261 mammals from which protein-coding gene alignments were generated. The ape (Hominoidea) part of the tree is expanded, and the ancestral Homininae node used to reconstruct ancestral sequences is labeled. Branch lengths are scaled to the number of amino acid substitutions in each branch (see inset branch scale).

B. The ABSREL model detects episodic adaptive evolution. Under the null model of no positive selection, a gene can have one or more classes of sites with dN/dS ≤ 1. When a gene experiences an episode of positive selection, most sites still have dN/dS ≤ 1 but also includes an additional rate class with dN/dS > 1; positive selection is inferred with a gene includes a class of sites with dN/dS > 1 with a likelihood ratio test (LRT) P≤0.05.

C. Volcano plot showing the dN/dS value (Log2) and likelihood ratio test (LRT) P-value (–Log10) for the ABSREL test in the gorilla lineage. 96 genes were identified with 2 rate classes where one had dN/dS > 1 (LRT P≤0.05), 55 genes were identified with 2 rate classes where one had dN/dS < 1 (LRT P≤0.05), the remaining 13,336 genes had one rate class with dN/dS<1.

D. Stacked bar chart showing the proportion of genes with dN/dS >1 (LRT P≤0.05) in the gorilla lineage. Pie charts show the proportion of genes with dN/dS >1 (LRT P≤0.05) in gorilla that are also dN/dS >1 (LRT P≤0.05) in at least one other lineage; 26 genes have evidence for positive selection only in the gorilla lineage.

E. The RELAX model detects intensified and relaxed selection. When selection is relaxed, dN/dS rates move toward 1 and/or the proportion of sites increases in rate classes with dN/dS values close to 1. In contrast, intensified selection drives dN/dS rates away from neutrality to more extreme values in the foreground than background lineages. Each branch in the RELAX model includes a selection intensity parameter (K), which scales the distribution of dN/dS rate categories (ω1k, ω2k, ω3k) such that genes with K<1 are inferred to evolve under relaxed selection intensity, and genes with K>1 are inferred to evolve under enhanced selection intensity.

F. Volcano plot showing the K-value (Log2) and likelihood ratio test (LRT) P-value (–Log10) for the RELAX test in the gorilla lineage. 578 genes were identified with K<1 and 144 with K>1 (LRT P≤0.05); the remaining 12,769 genes had K≈1. The 25 genes with the most extreme K<1 and P-values and the two genes with the most extreme K>1 values are labeled.

G. Stacked bar chart showing the proportion of genes with K<1, K>1, and K≈1 in the gorilla lineage. Pie charts show the proportion of genes with K>1 in gorilla that are also K≠1 in at least one other lineage (upper) or K<1 in gorilla that are also K≠1 in at least one other lineage (lower). Blue, the proportion of genes with K<1 in at least one other lineage. Red, proportion of genes with K>1 in at least one other lineage. Purple, the proportion of genes with K<1 and K>1 in other lineages.

Figure 1 – source data 1. Genomes used in selection tests.

Figure 1 – source data 2. List of positively selected genes from ABSREL.

Figure 1 – source data 3. List of K<1 genes from RELAX.

Figure 1 – source data 4. List of K>1 genes from RELAX.

Amino acid substitutions in gorilla relaxed genes are likely deleterious.

A. Volcano plot showing the dN/dS (Log2) and likelihood ratio test (LRT) P-value (–Log10) for codons with gorilla-specific amino acid changes in genes with K≠1 in gorilla. dN/dS rates and P-values were estimated with the fixed effect likelihood (FEL) model, the P-value is for the test of dN/dS ≠ 1 at each codon. Sites with dN/dS ≠ 1 at P≤0.05 in K<1 genes are shown in blue and K>1 genes in red.

B. Bubble plot showing the percent of each amino acid substitution in the gorilla lineage, colored according to the BLOSUM 250 score for that transition. Note that, unlike amino acid substitution matrices, this matrix is not symmetric because the direction of amino acid substitution is inferred from the ancestral reconstruction.

C. Stripchart showing the distribution of PolyPhen-2 scores for gorilla-specific amino acid changes in genes with K≈1 (12,769 genes), K<1 (578 genes), and K>1 (144 genes). PolyPhen-2 scores for each amino acid change are shown as jittered dots, a summary of the data for each gene set is shown as boxplot with the mean score indicated, and a violin plot reflecting the distribution of PolyPhen-2 scores. The number of amino acid substitutions is given in parenthesis for each gene set. The range of scores corresponding to benign, possibly damaging, and probably damaging mutations is highlighted. Multiple hypotheses (Holm) corrected p values (pHolm-adj.) are shown for comparisons of K<1 and K>1 to K≈1, and K>1 to K>1.

Gorilla relaxed genes are preferentially expressed from meiosis onwards and are enriched in sperm-related functions.

A. Word cloud showing gene ontology (GO) biology process terms for which the gorilla K<1 genes are enriched. Note that only terms related to reproduction, meiosis, or sperm biology are shown. Inset size scale shows word size for an enrichment P-value of 0.01, words are colored according to their enrichment ratios.

B. Bar chart showing GO cellular component terms in which genes with K<1 in gorilla are enriched. Only terms related to the cytoskeleton, cilia, centrosomes, and septin ring complexes are shown. Terms are colored according to their enrichment ratios (scale shown in panel A).

C. Enrichment of genes with K<1 (blue) and K>1 (red) in the prostate, based on human scRNA-seq datasets (WebCSEA).

D. Testis cell types in which the gorilla K≠1 genes are enriched. Left, UMAP plot of single-cell RNA-Seq data generated from gorilla testis. Major cell types are colored and named, the arrow indicates direction of sperm differentiation. Right, volcano plot showing the enrichment ratio (Log2) and hypergeometric P-value (–Log10) for the expression of genes with K≠1 in gorilla testis cell types. Cell types with statistically significant (FDR q-value≤0.10) enrichment or depletion of the K≠1 gene set are labeled (blue for enrichment of genes with K<1 and red for enrichment of genes with K>1). Inset diagram shows enrichment ratios for the expression of K<1 genes in each cell type, ordered by direction of germ cell differentiation (bottom to top). Bar widths are scaled according to the enrichment ratio in the gorilla lineage; the background gray box indicates a ratio of 1. Boxes and cell-type labels are colored according to cell population labels in the UMAP plot. Labels are in bold if the expression of K<1 genes is statistically enriched or depleted in that cell type (FDR q-value≤0.10).

E. Enrichment ratio of proteins encoded by the K<1 (blue) and the K>1 (red) genes in different fractions of the sperm cell proteome (FDR q-value≤0.10).

Figure 3 – source data 1. Enriched GO terms.

Gorilla relaxed genes with probably damaging amino acid substitutions are associated with multiple sperm abnormalities.

A. Word cloud showing GO biological process and molecular function terms in which genes with K<1 and probably damaging substitutions are enriched. Note that only terms related to reproduction, meiosis, or sperm biology are shown. Inset size scale shows word size for an enrichment P-value of 0.01, phenotypes are colored according to their enrichment ratios.

B. Word cloud showing the top 40 mouse knockout phenotypes in which genes with K<1 and probably damaging substitutions are enriched. Inset size scale shows word size for an enrichment P-value of 0.01, phenotypes are colored according to their enrichment ratios.

C. Testis cell types in which the expression of gorilla K<1 genes with probably damaging amino acid substitutions are enriched or depleted. Bar chart shows the enrichment ratio for each cell-type, ordered by direction of germ cell differentiation (bottom to top). Bar widths are scaled according to the enrichment ratio; the background gray box indicates a ratio of 1. Boxes and cell-type labels are colored according to cell population labels in the UMAP plot in Figure 2 panel D, labels are bold if the expression of genes with K<1 is statistically enriched or depleted in that cell-type at FDR q-value≤0.10.

D. Enrichment ratio of proteins encoded by the K<1 genes with probably damaging amino acid changes in different fractions of the sperm cell proteome (FDR q-value≤0.10).

Relaxed section intensity identifies new spermatogenesis genes.

A. Drosophila in vivo germ cell-specific RNAi screen uncovers new spermatogenic functions for orthologs of the gorilla K<1 genes. Silencing of 156 Drosophila orthologs expressed in the testis with no previous association with male fertility/spermatogenesis was induced at the onset of the mitosis-to-meiosis transition using the bam-GAL4 driver. Results reflect a total of four independent experiments (mean±standard deviation). Threshold for impaired reproductive fitness (red horizontal line) corresponds to a 75% fertility rate or >2 standard deviations of the mean observed in negative (-ve) controls (in black). The red data point corresponds to the positive (+ve) control (ribosomal protein L3). Testicular phenotypes of the 43 hits were defined by phase-contrast microscopy and assigned to four color-coded classes based on the earliest manifestation of the phenotype (see B and C).

B. Representative images of the four phenotypical classes of spermatogenic impairment defined in the Drosophila RNAi screen (see A). Phase-contrast microscopy of testis segments or of mature male gametes. Pre-meiotic phenotypes were characterized by a smaller testis with severely compromised germ cell growth and differentiation. Meiotic phenotypes were defined by an accumulation of spermatocytes (arrowheads) as well as by the lack of post-meiotic stages (arrows indicate elongating spermatid bundles). Post-meiotic (I) phenotypes corresponded to severe spermiogenesis defects, leading to irregular or altogether collapsed spermatid bundles (asterisk). Post-meiotic (II) phenotypes were characterized by a substantial reduction in the amount of mature male gametes observed, without any overt defects in spermiogenesis. For simplicity, the Drosophila gene names refer to the human ortholog (dIARS1 is CG11471, dGPAA1 is CG3033, dCDK12 is CG7597, and dAGK is CG31873). Scale bars: 20 μm.

C. Half (61/116) of all gorilla K<1 genes involved in male fertility are functionally required for the post-meiotic stages of spermatogenesis (spermatid development and/or mature gamete function). Data include results from the Drosophila RNAi screen (n= 41; see A) and published evidence on human, mice and fruit fly orthologs (n= 75; see Methods). Phenotype classes follow the same color code as in A, with the addition of a class (“Other”) corresponding to non-spermatogenic phenotypes reported in the literature (either unspecified or of somatic nature).

Gorilla relaxed genes are associated with human male infertility.

A. Burden of putative loss-of-function (LoF) and missense variants for human orthologs of gorilla K<1 and K≈1 genes in the MERGE cohort of infertile men. Data are shown as Fisher’s exact test (FET) P-values for enrichment of LoF and missense variants in the MERGE cohort compared to gnomAD as proxy for a cohort of fertile men.

B. QQ-plot of expected versus observed LoF FET P-values in the MERGE cohort.

C. Gene-level constraint metrics for human orthologs of gorilla K<1 and K≈1 genes, including the probability of loss-of-function intolerance (pLI), and missense and synonymous Z-scores from gnomAD data. pLI scores closer to one indicate more intolerance to protein-truncating variation, whereas higher (more positive) Z-scores indicate more intolerance to variation. Statistics for each comparison are given above each panel.

Gorilla relaxed genes associated with abnormal sperm biology and infertility in humans.

PolyPhen-2 predictions of gorilla amino acid substitutions (probably damaging | possibly damaging | benign, or NA for unable to determine), K and P-values from the RELAX test, and associated human phenotypes are shown for each gene.

Patterns of selection acting on human and ape orthologs of gorilla K<1 genes. A.

ABC-MK test results from 1000 Genomes Project. The rate of adaptation (alpha) as a function of derived allele frequency for human orthologs of K<1 (blue line) and K≈1 genes (gray liner with 95% CI).

B. UpSet plot showing the intersection between gorilla K<1 and K>1 genes and genes inferred to be under positive and balancing selection for the ELS, HKA, FWH, standard MK, and Z tests. The inset tree shows the Hominoid phylogeny and number of individuals used for tests from (Cagan et al., 2016). The total number of genes in each set, overlap with K<1 and K>1, expected overlap, enrichment ratio, hypergeometric P-value, and FDR q-value for enrichment of K<1 genes in each gene set are shown in a table.

Genomic features of genes with K<1 (blue) compared to all genes tested for relaxed selection intensity (red). P-values derived from a Chi-square test.