Repeated losses of PRDM9-directed recombination despite the conservation of PRDM9 across vertebrates

  1. Zachary Baker (corresponding author)
  2. Molly Schumer
  3. Yuki Haba
  4. Lisa Bashkirova
  5. Chris Holland
  6. Gil G Rosenthal
  7. Molly Przeworski (corresponding author)
  1. Columbia University, United States
  2. Harvard University, United States
  3. Centro de Investigaciones Científicas de las Huastecas 'Aguazarca', Mexico
  4. Texas A&M University, United States
5 figures, 1 table and 5 additional files

Figures

Figure 1 with 5 supplements
Phylogenetic distribution and evolution of PRDM9 orthologs in vertebrates.

Shown are the four domains: KRAB domain (in tan), SSXRD (in white), PR/SET (in light green) and ZF (in gray/dark green; the approximate structure of identified ZFs is also shown). The number of unique species included from each taxon is shown in parentheses. Complete losses are indicated on the phylogeny by red lightning bolts and partial losses by gray lightning bolts. Lightning bolts are shaded dark when all species in the indicated lineage have experienced the entire loss or the same partial loss. Lightning bolts are shaded light when this is true of only a subset of species in the taxon. ZF arrays in dark green denote those taxa in which the ZF shows evidence of rapid evolution. White rectangles indicate cases where we could not determine whether the ZF was present because of the quality of the genome assembly. For select taxa, we present the most complete PRDM9 gene found in two exemplar species. Within teleost fish, we additionally show a PRDM9 paralog that likely arose before the common ancestor of this taxon; in this case, the number of species observed to have each paralog is in parentheses. Although the Monotremata ZF is shaded gray, it was not included in our analysis of rapid evolution because of its small number of ZFs.

https://doi.org/10.7554/eLife.24133.003
Figure 1—figure supplement 1
Phylogenetic approach to identifying PRDM9 orthologs and related gene families.

A maximum likelihood phylogeny built with RAxML, using an alignment of SET domains, distinguishes between genes that cluster with mammalian PRDM9 and PRDM11 with 100% bootstrap support. Genes shown in black, which are orthologous to both PRDM9 and PRDM11, are only found in jawless fish.

https://doi.org/10.7554/eLife.24133.004
Figure 1—figure supplement 2
Neighbor-joining (NJ) guide tree based on the SET domain.

A NJ guide tree analysis on SET domains identified in our RefSeq, whole genome assembly, and transcriptome datasets was used as an initial step to identify sequences clustering with human PRDM9/7 or PRDM11. These sequences (in red) were selected for phylogenetic analysis with RAxML; they included all RefSeq genes in our dataset that have been previously annotated as PRDM9/7 or PRDM11 (in yellow). Genes more closely related to known PRDM genes other than PRDM9 or PRDM11 (in black) were excluded from further analysis.

https://doi.org/10.7554/eLife.24133.005
Figure 1—figure supplement 3
Expression levels of genes with a known role in meiotic recombination in testes of three exemplar species: human, swordtail fish and bearded dragon (a lizard).

For three swordtails (X. malinche) and one bearded dragon, the FPKM per individual is plotted for each transcript. For humans, the point represents the average expression of 122 individuals from the gene expression atlas (see Materials and methods). For bearded dragons, PRDM9 and RAD50 were represented by multiple transcripts (two and three respectively), and the average expression level is shown. Dashed lines show the point estimate or average expression level of PRDM9 to highlight that several genes in each species have expression levels comparable to or lower than PRDM9 in testes.

https://doi.org/10.7554/eLife.24133.006
Figure 1—figure supplement 4
Amino acid diversity as a function of amino acid position in the ZF alignment for six exemplar species.

Each plot shows the 95% range of diversity levels at that site for all C2H2 ZFs from a species of that taxon (gray); the values at PRDM9 are shown in red or blue. Turtles, snakes and coelacanth show a pattern of diversity that is similar to that in mammalian species with a complete PRDM9 ortholog, with higher diversity at DNA-binding sites (residues 11, 12, 15 and 18) and reduced diversity at most other sites. In bony fish, this pattern is not observed in PRDM9β genes (blue) or in partial PRDM9α genes (shown for A. mexicanus), where PRDM9 ZF diversity is more typical of other C2H2 ZFs.

https://doi.org/10.7554/eLife.24133.007
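
The per-site diversity summary described in the caption above can be illustrated with a minimal Python sketch. It assumes that diversity at a site is the fraction of ZF pairs within an array that differ at that site; this definition, and the function names `site_diversity` and `proportion_at_binding_sites`, are illustrative assumptions rather than the authors' implementation. Only the DNA-binding positions (11, 12, 15 and 18) are taken from the caption.

```python
from itertools import combinations

# 1-based alignment positions of the DNA-binding residues named in the caption.
DNA_BINDING_SITES = (11, 12, 15, 18)


def site_diversity(zfs):
    """Per-site diversity across the aligned ZFs of one array.

    Assumption: diversity at a site is the fraction of ZF pairs that
    carry different amino acids at that site.
    """
    if len(zfs) < 2:
        raise ValueError("need at least two ZFs to measure diversity")
    length = len(zfs[0])
    if any(len(zf) != length for zf in zfs):
        raise ValueError("ZFs must be aligned to the same length")
    pairs = list(combinations(zfs, 2))
    return [sum(a[i] != b[i] for a, b in pairs) / len(pairs)
            for i in range(length)]


def proportion_at_binding_sites(zfs, sites=DNA_BINDING_SITES):
    """Share of total diversity that falls at the DNA-binding sites.

    Returns None when the array shows no diversity at all.
    """
    diversity = site_diversity(zfs)
    total = sum(diversity)
    if total == 0:
        return None
    return sum(diversity[s - 1] for s in sites) / total
```

With hypothetical 28-residue fingers that differ only at position 11, the proportion is 1.0; adding a finger that differs only at position 1 lowers it to 0.5.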
Figure 1—figure supplement 5
Examples of differences in computationally predicted PRDM9 binding motifs for species from three taxa.

Shown are two mice from the same species (Mus musculus subspecies; Genbank: AB844114.1; FJ899852.1), two pythons from the same species (Python bivittatus; the genome sequence and a Sanger resequenced individual; see Materials and methods), and two species of swordtail fish (X. birchmanni and X. malinche; genome sequences). The position weight matrix was obtained using C2H2 prediction tools available at http://zf.princeton.edu.

https://doi.org/10.7554/eLife.24133.008
Figure 2 with 2 supplements
Phylogenetic distribution and functional domains of PRDM9α orthologs in teleost fish and in holostean fish that are outgroups to the PRDM9α/PRDM9β duplication event.

Shown are the four domains: KRAB domain (in tan), SSXRD (in white), PR/SET (in light green) and ZF (in gray/dark green; the approximate structure of identified ZFs is also shown). The number of unique species included from each taxon is shown in parentheses. Complete losses are indicated on the phylogeny by red lightning bolts and partial losses by gray lightning bolts. Lightning bolts are shaded dark when all species in the indicated lineage have experienced the loss. Lightning bolts are shaded light when this is true of only a subset of species in the taxon. ZF arrays in dark green denote those taxa in which the ZF shows evidence of rapid evolution. White rectangles indicate cases where we could not determine whether the ZF was present because of the quality of the genome assembly. While many taxa shown have more than one PRDM9α ortholog, the genes identified from each species generally have similar domain architectures. Exceptions include Clupeiformes, Esociformes, and Holostean fish, for which two alternative forms of PRDM9α paralogs are shown. Based on this distribution, we infer that the common ancestor of ray-finned fish likely had a rapidly evolving and complete PRDM9α ortholog.

https://doi.org/10.7554/eLife.24133.009
Figure 2—figure supplement 1
Section of maximum-likelihood phylogeny of the SET domain showing bony fish PRDM9 orthologs α and β.

The reciprocal monophyly of PRDM9 orthologs α and β is reasonably well supported; in particular, bootstrap support for the monophyly of PRDM9α genes is 75%. The ZF domains for representative PRDM9 orthologs of each type are shown to the right, with each gray pentagon indicating the location of a ZF. In swordtail fish, the complete ZF array is found within a single exon, and the last tandem array of six ZFs forms a minisatellite structure.

https://doi.org/10.7554/eLife.24133.010
Figure 2—figure supplement 2
Analysis of ZF evolution in PRDM9β.

Red lines show the median (solid) and first and third quartiles (dashed lines) for all 48 complete PRDM9 orthologs identified in vertebrates that have four or more ZFs. Blue lines show the median (solid) and first and third quartiles (dashed lines) for all other C2H2 ZF genes from X. maculatus (157 genes). Results about the rate of ZF evolution in the PRDM9β gene from X. maculatus are qualitatively similar regardless of which cluster of individual ZF domains we include in the analysis, indicating that our ability to detect evidence of positive selection at DNA-binding residues in these arrays, or lack thereof, is unlikely to be influenced by this choice.

https://doi.org/10.7554/eLife.24133.011
Figure 3
Substitutions at SET domain catalytic residues in bony fish PRDM9 genes.

(a) Lineages within bony fish carrying substitutions at each of three tyrosine residues involved in H3K4me3 catalysis in human PRDM9 are shown in blue, yellow and red. (b) Lineages carrying substitutions at one, two or three of these residues are shown in red, pink and blue, respectively. All PRDM9β genes as well as a partial PRDM9 ortholog from holostean fish carry one or more substitutions at these residues. The PRDM9β gene from Xiphophorus is indicated by an asterisk.

https://doi.org/10.7554/eLife.24133.013
Figure 4 with 5 supplements
Patterns of recombination and PRDM9 evolution in swordtail fish.

(a) The ZF array of PRDM9 appears to be evolving slowly in Xiphophorus, with few changes over 1 million years of divergence (Cui et al., 2013; Jones et al., 2013). (b) PRDM9 is upregulated in the germline relative to the liver in X. birchmanni (circles) and X. malinche (squares; panel shows three biological replicates for each species). (c) The computationally predicted PRDM9 binding sites are not unusually associated with H3K4me3 peaks in testes. (d) Crossover rates increase near H3K4me3 peaks in testis. (e) Crossover rates increase near CGIs. (f) Crossover rates do not increase near computationally predicted PRDM9 binding sites (see Figure 4—figure supplement 3 for comparison). Crossover rates were estimated from ancestry switchpoints in naturally occurring hybrids between X. birchmanni and X. malinche (see Materials and methods).

https://doi.org/10.7554/eLife.24133.014
Figure 4—figure supplement 1
Expression levels of meiosis-related genes in swordtail fish tissues.

In general, the seven meiosis-related genes surveyed had higher expression in tissues containing germline cells than in liver tissue, but this pattern was much more pronounced in testis tissue than in ovary tissue. As a result, we focused our analysis of meiosis-related genes on RNAseq data generated from testis. Results shown are based on analysis of three male and female biological replicates from each swordtail species (X. birchmanni and X. malinche).

https://doi.org/10.7554/eLife.24133.015
Figure 4—figure supplement 2
Recombination frequency in swordtails as a function of distance to the TSS.

Partial correlation analyses suggest that the association between the TSS and recombination rate in swordtails is explained by H3K4me3 peaks and CGIs.

https://doi.org/10.7554/eLife.24133.016
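
As a reminder of what a partial correlation measures, the following Python sketch computes the standard first-order partial correlation between two variables after controlling for a third. It is a generic illustration: the caption above involves two controlling variables (H3K4me3 peaks and CGIs), and the authors' exact procedure is not described here, so the function names and the single-covariate form are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

If two variables are correlated only because both track a third, the partial correlation is close to zero even when the raw correlation is high.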
Figure 4—figure supplement 3
Recombination rates show a peak near the computationally predicted PRDM9A binding motif in humans and gor-1 allele in gorillas.

Most work investigating relationships between PRDM9 motifs and recombination rates has focused on the PRDM9 motif empirically inferred from recombination hotspots, but the empirical motif is unknown for many species. To generate results comparable to those we present for swordtails in Figure 4F, we therefore determined recombination rate (using the map based on LD patterns in the CEU; Frazer et al., 2007) as a function of distance to computationally predicted binding sites for the PRDM9A motif in humans and as a function of distance to computationally predicted binding sites for the gor-1 PRDM9 allele (Schwartz et al., 2014) in gorillas (using the LD-based map from Great Ape Genome Project et al., 2016).

https://doi.org/10.7554/eLife.24133.017
Figure 4—figure supplement 4
Higher observed recombination rate at testis-specific H3K4me3 peaks than liver-specific H3K4me3 peaks.

H3K4me3 peaks found only in the testis and not in the liver of X. birchmanni have higher observed recombination rates in X. birchmanni – X. malinche hybrids. This pattern supports the conclusion that H3K4me3 peaks are associated with recombination in swordtails.

https://doi.org/10.7554/eLife.24133.018
Figure 4—figure supplement 5
MEME prediction of sequences enriched in testis-H3K4me3 peaks relative to liver-specific H3K4me3 peaks.

Results shown in A-E are from five replicate runs of 2000 testis-specific sequences using liver-specific sequences as a background comparison set. The swordtail computationally-predicted PRDM9 binding motif is shown for comparison.

https://doi.org/10.7554/eLife.24133.019
Figure 5 with 1 supplement
Patterns of recombination near TSSs and CGIs in species with and without complete PRDM9 orthologs.

For each species, recombination rates were binned in 10 kb windows along the genome; curves were fit using Gaussian loess smoothing. The fold change in recombination rates shown on the y-axis is relative to the recombination rate at the last point shown. Species shown in the top row have complete PRDM9 orthologs (mouse, human, gorilla and sheep), whereas species in the bottom row have no PRDM9 ortholog (dog, zebra finch, long-tailed finch) or a partial PRDM9 ortholog (swordtail fish).

https://doi.org/10.7554/eLife.24133.020
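
The fold-change scaling described in the caption above can be sketched in Python as follows. The sketch assumes the input is a list of distances to the nearest feature (TSS or CGI) with a recombination rate for each, bins them by distance in 10 kb steps, and divides each bin mean by the mean of the most distant bin. The function name `binned_fold_change` and the input layout are hypothetical, and the loess smoothing step is omitted.

```python
from collections import defaultdict


def binned_fold_change(distances_bp, rates, bin_size=10_000):
    """Mean rate per distance bin, scaled by the most distant bin.

    distances_bp: distance of each window to the nearest feature, in bp.
    rates: recombination rate of each window (any consistent unit).
    Returns a list of (bin_start_bp, fold_change) sorted by distance.
    """
    if len(distances_bp) != len(rates) or not rates:
        raise ValueError("need matching, non-empty distances and rates")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, rate in zip(distances_bp, rates):
        index = int(abs(distance) // bin_size)
        sums[index] += rate
        counts[index] += 1
    bins = sorted(sums)
    means = {b: sums[b] / counts[b] for b in bins}
    reference = means[bins[-1]]  # the last point shown on the x-axis
    if reference == 0:
        raise ValueError("reference bin has zero rate")
    return [(b * bin_size, means[b] / reference) for b in bins]
```

By construction the last bin has a fold change of 1, so a peak near the feature appears as values above 1 at small distances.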
Figure 5—figure supplement 1
Dependence of patterns of recombination near TSSs and CGIs in dog and human on the type of genetic map.

(a) Recombination rates near the TSS and CGI in dogs are shown using recombination maps inferred either from LD patterns or pedigree data. The magnitude of the peak near these features is lower in the map with lower resolution. This observation raises the possibility that a higher resolution map in swordtail fish would result in a higher peak near these features. (b) Recombination rates near the TSS and CGI in humans are shown using recombination maps inferred either from LD patterns or ancestry switches in African-American samples. Recombination rates near the TSS and CGI in human do not seem to be strongly influenced by the choice of genetic map, though peaks at these features are slightly reduced in admixture- and pedigree-based methods.

https://doi.org/10.7554/eLife.24133.021

Tables

Table 1

Evolution of the ZF in PRDM9 orthologs with different domain architectures. PRDM9 orthologs for which an empirical comparison dataset is available are ordered by their domain structures: from the top, we present cases of complete PRDM9 orthologs with KRAB-SSXRD-SET domains; partial orthologs putatively lacking KRAB or SSXRD domains or partial orthologs lacking both; then those containing only the SET domain. A row is shaded green if the ZF is among the top 5% of most rapidly evolving C2H2 ZFs in the species, as summarized by the proportion of amino acid diversity at DNA-binding sites, and is blue if it is ranked first. A complete PRDM9 ortholog from the minke whale (Balaenoptera acutorostrata scammoni) is shaded in gray because there is no amino acid diversity between ZFs of the tandem array. The empirical rank is also shown, as is the number of PRDM9 orthologs identified in the species. Asterisks indicate PRDM9 orthologs known to play a role in directing recombination. For PRDM9 genes from teleost fish, under major group, we additionally indicate whether the gene is a PRDM9α or PRDM9β gene.

https://doi.org/10.7554/eLife.24133.012
| Organism | Major group | PRDM9 structure | Proportion AA diversity at DNA-binding sites | Rank | Number of PRDM9 genes from species | Number of ZF genes evaluated from species |
| --- | --- | --- | --- | --- | --- | --- |
| Balaenoptera acutorostrata scammoni | placental | KRAB-SSXRD-SET | NA | NA | 1 | 272 |
| Bison bison bison | placental | KRAB-SSXRD-SET | 0.667 | 1 | 1 | 285 |
| Bos taurus* (chr1) | placental | KRAB-SSXRD-SET | 0.684 | 1 | 3 | 313 |
| Bos taurus (chrX) | placental | KRAB-SSXRD-SET | 0.414 | 6 | 3 | 313 |
| Bos taurus* (chrX) | placental | KRAB-SSXRD-SET | 0.414 | 7 | 3 | 313 |
| Bubalus bubalis | placental | KRAB-SSXRD-SET | 0.667 | 1 | 1 | 268 |
| Chelonia mydas | turtle | KRAB-SSXRD-SET | 0.414 | 11 | 1 | 235 |
| Chlorocebus sabaeus | placental | KRAB-SSXRD-SET | 0.500 | 1 | 1 | 344 |
| Chrysemys picta bellii | turtle | KRAB-SSXRD-SET | 0.478 | 1 | 1 | 308 |
| Cricetulus griseus | placental | KRAB-SSXRD-SET | 0.781 | 3 | 1 | 259 |
| Dasypus novemcinctus | placental | KRAB-SSXRD-SET | 0.614 | 1 | 1 | 289 |
| Dipodomys ordii | placental | KRAB-SSXRD-SET | 0.567 | 1 | 1 | 194 |
| Esox lucius | teleost fish (α) | KRAB-SSXRD-SET | 0.455 | 1 | 4 | 234 |
| Fukomys damarensis | placental | KRAB-SSXRD-SET | 0.430 | 3 | 1 | 227 |
| Homo sapiens* | placental | KRAB-SSXRD-SET | 0.687 | 1 | 1 | 357 |
| Latimeria chalumnae | coelacanth | KRAB-SSXRD-SET | 0.545 | 2 | 1 | 227 |
| Loxodonta africana | placental | KRAB-SSXRD-SET | 0.617 | 1 | 1 | 381 |
| Macaca fascicularis | placental | KRAB-SSXRD-SET | 0.680 | 1 | 1 | 364 |
| Macaca mulatta | placental | KRAB-SSXRD-SET | 0.645 | 1 | 1 | 366 |
| Marmota marmota marmota | placental | KRAB-SSXRD-SET | 0.483 | 1 | 1 | 277 |
| Microcebus murinus | placental | KRAB-SSXRD-SET | 1.000 | 1 | 1 | 326 |
| Mus musculus* | placental | KRAB-SSXRD-SET | 0.910 | 1 | 1 | 224 |
| Nannospalax galili | placental | KRAB-SSXRD-SET | 1.000 | 1 | 1 | 307 |
| Octodon degus | placental | KRAB-SSXRD-SET | 0.333 | 5 | 3 | 227 |
| Octodon degus | placental | KRAB-SSXRD-SET | 0.331 | 6 | 3 | 227 |
| Ovis aries | placental | KRAB-SSXRD-SET | 0.615 | 1 | 2 | 252 |
| Ovis aries | placental | KRAB-SSXRD-SET | 0.398 | 4 | 2 | 252 |
| Ovis aries musimon | placental | KRAB-SSXRD-SET | 0.353 | 12 | 1 | 285 |
| Papio anubis | placental | KRAB-SSXRD-SET | 0.585 | 1 | 1 | 404 |
| Pelodiscus sinensis | turtle | KRAB-SSXRD-SET | 0.692 | 1 | 1 | 221 |
| Peromyscus maniculatus bairdii | placental | KRAB-SSXRD-SET | 1.000 | 1 | 1 | 243 |
| Protobothrops mucrosquamatus | squamata | KRAB-SSXRD-SET | 0.462 | 5 | 1 | 195 |
| Python bivittatus | squamata | KRAB-SSXRD-SET | 0.571 | 1 | 1 | 206 |
| Rattus norvegicus | placental | KRAB-SSXRD-SET | 0.570 | 1 | 1 | 255 |
| Rousettus aegyptiacus | placental | KRAB-SSXRD-SET | 0.742 | 1 | 1 | 258 |
| Salmo salar | teleost fish (α) | KRAB-SSXRD-SET | 0.538 | 9 | 4 | 510 |
| Salmo salar | teleost fish (α) | KRAB-SSXRD-SET | 0.500 | 11 | 4 | 510 |
| Sus scrofa | placental | KRAB-SSXRD-SET | 0.542 | 1 | 1 | 248 |
| Thamnophis sirtalis | squamata | KRAB-SSXRD-SET | 0.459 | 3 | 1 | 179 |
| Tupaia chinensis | placental | KRAB-SSXRD-SET | 1.000 | 1 | 1 | 249 |
| Tursiops truncatus | placental | KRAB-SSXRD-SET | 0.939 | 1 | 1 | 233 |
| Myotis lucifugus | placental | SSXRD-SET | 0.524 | 1 | 2 | 308 |
| Myotis lucifugus | placental | SSXRD-SET | 0.310 | 68 | 2 | 308 |
| Octodon degus | placental | SSXRD-SET | 0.282 | 46 | 3 | 227 |
| Sarcophilus harrisii | marsupial | SSXRD-SET | 0.224 | 277 | 2 | 344 |
| Callorhinchus millii | cartilaginous fish | KRAB-SET | 0.314 | 6 | 1 | 63 |
| Astyanax mexicanus | teleost fish (α) | SET | 0.258 | 60 | 2 | 158 |
| Astyanax mexicanus | teleost fish (β) | SET | 0.167 | 152 | 2 | 158 |
| Clupea harengus | teleost fish (α) | SET | 0.279 | 6 | 4 | 118 |
| Clupea harengus | teleost fish (α) | SET | 0.278 | 7 | 4 | 118 |
| Clupea harengus | teleost fish (α) | SET | 0.274 | 10 | 4 | 118 |
| Clupea harengus | teleost fish (β) | SET | 0.158 | 114 | 4 | 118 |
| Cynoglossus semilaevis | teleost fish (β) | SET | 0.182 | 80 | 1 | 107 |
| Danio rerio | teleost fish (β) | SET | 0.179 | 345 | 1 | 367 |
| Esox lucius | teleost fish (α) | SET | 0.295 | 32 | 4 | 234 |
| Esox lucius | teleost fish (β) | SET | 0.192 | 176 | 4 | 234 |
| Esox lucius | teleost fish (β) | SET | 0.192 | 177 | 4 | 234 |
| Fundulus heteroclitus | teleost fish (β) | SET | 0.189 | 158 | 1 | 206 |
| Haplochromis burtoni | teleost fish (β) | SET | 0.180 | 148 | 1 | 168 |
| Ictalurus punctatus | teleost fish (α) | SET | 0.320 | 14 | 8 | 140 |
| Ictalurus punctatus | teleost fish (α) | SET | 0.319 | 15 | 8 | 140 |
| Ictalurus punctatus | teleost fish (α) | SET | 0.306 | 24 | 8 | 140 |
| Ictalurus punctatus | teleost fish (α) | SET | 0.303 | 25 | 8 | 140 |
| Ictalurus punctatus | teleost fish (α) | SET | 0.286 | 33 | 8 | 140 |
| Ictalurus punctatus | teleost fish (α) | SET | 0.276 | 39 | 8 | 140 |
| Ictalurus punctatus | teleost fish (α) | SET | 0.253 | 55 | 8 | 140 |
| Ictalurus punctatus | teleost fish (β) | SET | 0.179 | 127 | 8 | 140 |
| Larimichthys crocea | teleost fish (β) | SET | 0.192 | 70 | 1 | 115 |
| Lepisosteus oculatus | holostei fish | SET | 0.223 | 48 | 1 | 106 |
| Maylandia zebra | teleost fish (β) | SET | 0.173 | 161 | 1 | 176 |
| Neolamprologus brichardi | teleost fish (β) | SET | 0.173 | 141 | 1 | 152 |
| Nothobranchius furzeri | teleost fish (β) | SET | 0.180 | 245 | 1 | 266 |
| Notothenia coriiceps | teleost fish (β) | SET | 0.167 | 83 | 1 | 87 |
| Oreochromis niloticus | teleost fish (β) | SET | 0.173 | 173 | 1 | 190 |
| Oryzias latipes | teleost fish (β) | SET | 0.213 | 104 | 1 | 191 |
| Otolemur garnettii | placental | SET | 0.266 | 121 | 1 | 285 |
| Poecilia formosa | teleost fish (β) | SET | 0.191 | 184 | 1 | 242 |
| Poecilia latipinna | teleost fish (β) | SET | 0.191 | 175 | 1 | 235 |
| Poecilia mexicana | teleost fish (β) | SET | 0.191 | 187 | 1 | 244 |
| Poecilia reticulata | teleost fish (β) | SET | 0.191 | 162 | 1 | 212 |
| Pundamilia nyererei | teleost fish (β) | SET | 0.173 | 134 | 1 | 147 |
| Pygocentrus nattereri | teleost fish (α) | SET | 0.331 | 12 | 2 | 142 |
| Pygocentrus nattereri | teleost fish (β) | SET | 0.179 | 124 | 2 | 142 |
| Salmo salar | teleost fish (β) | SET | 0.188 | 411 | 4 | 510 |
| Salmo salar | teleost fish (β) | SET | 0.180 | 454 | 4 | 510 |
| Sinocyclocheilus anshuiensis | teleost fish (β) | SET | 0.185 | 224 | 2 | 284 |
| Sinocyclocheilus anshuiensis | teleost fish (β) | SET | 0.185 | 225 | 2 | 284 |
| Sinocyclocheilus grahami | teleost fish (β) | SET | 0.185 | 211 | 1 | 271 |
| Sinocyclocheilus rhinocerous | teleost fish (β) | SET | 0.185 | 208 | 2 | 269 |
| Sinocyclocheilus rhinocerous | teleost fish (β) | SET | 0.185 | 209 | 2 | 269 |
| Takifugu rubripes | teleost fish (β) | SET | 0.188 | 66 | 1 | 98 |
| Xiphophorus maculatus | teleost fish (β) | SET | 0.191 | 117 | 1 | 158 |

Additional files

Supplementary file 1

(A) PRDM9 orthologs identified in RefSeq and whole genome databases.

Includes which amino acids align to each of three catalytic tyrosine residues of the human PRDM9 SET domain for each PRDM9 ortholog. (B) Genomes targeted for the PRDM9 search. Major groups or individual species lacking PRDM9 in RefSeq were targeted for further analysis of their whole genome sequences, with the exception of previously reported bird and crocodilian losses. Species included and results of this search are reported here.

https://doi.org/10.7554/eLife.24133.022
Supplementary file 2

(A) Accession numbers and assembly descriptions of publicly available testes RNAseq samples used for de novo assembly and assessment of PRDM9 expression.

N50 describes the shortest contig length in which 50% of the assembled transcriptome is contained. (B) Summary of expression results of PRDM9 in the testis in representative species from major taxa. Only species that passed the core recombination protein quality test (see Materials and methods, Supplementary file 2D) are included in this table, with the exception of cases, indicated with asterisks, in which PRDM9 was detected but one or more conserved recombination proteins were not. (C) Results of an rpsblast search of assembled transcriptomes and a reciprocal best blast test to PRDM9. Domain structures found in transcripts that blasted to PRDM9 for each species are also listed. (D) Results of the core recombination protein test for each species for which a transcriptome was assembled. Blue shading indicates that a reciprocal best blast test did not identify the gene in the transcriptome.

https://doi.org/10.7554/eLife.24133.023
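
For readers unfamiliar with the N50 statistic mentioned above, here is a minimal Python sketch under the conventional definition: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly. The function name `n50` is illustrative and not taken from the supplementary scripts.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs of this length or longer together contain at least half
    of the total assembled sequence.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
```

For example, contigs of lengths 2, 2, 2, 3, 3, 4, 8 and 8 total 32 bases; the two longest contigs already cover 16 bases, so the N50 is 8.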
Supplementary file 3

(A) Rates of amino acid evolution in SET domains of representative PRDM9 orthologs lacking other functional domains.

To determine whether PRDM9 orthologs lacking functional domains are non-functional, we compared rates of evolution between each PRDM9 ortholog missing a domain and another sequence (listed here) with the complete domain structure. The number of aligned bases and the results of a likelihood ratio test of non-neutral versus neutral evolution are also shown. See Materials and methods for details. (B) Amino acid diversity levels of PRDM9 ZF arrays and the proportion localized to known DNA-binding residues. Columns labeled V1-V28 indicate the amount of amino acid diversity observed at each amino acid in the ZF array. For each gene, we also report the ranking of this proportion relative to all other C2H2 ZF genes from the same species, when such a ranking was feasible. This table additionally includes the average percent DNA identity between ZFs used in our analysis of rapid evolution. (C) Results of the likelihood ratio test of non-neutral versus neutral evolution along the SET domain of mammalian PRDM9 orthologs lacking a KRAB or SSXRD domain, as annotated in RefSeq (see Materials and methods). We also indicate whether another annotated ortholog exists with a KRAB domain.

https://doi.org/10.7554/eLife.24133.024
Supplementary file 4

R script to convert GenPept/GenBank files for RefSeq genes into table format.

https://doi.org/10.7554/eLife.24133.025
Supplementary file 5

Shell script to perform reciprocal best blast search of transcripts from de novo assembly of testis transcriptomes.

https://doi.org/10.7554/eLife.24133.026

Cite this article

  1. Zachary Baker
  2. Molly Schumer
  3. Yuki Haba
  4. Lisa Bashkirova
  5. Chris Holland
  6. Gil G Rosenthal
  7. Molly Przeworski
(2017)
Repeated losses of PRDM9-directed recombination despite the conservation of PRDM9 across vertebrates
eLife 6:e24133.
https://doi.org/10.7554/eLife.24133