Research Article

Developmental Biology

A widely employed germ cell marker is an ancient disordered protein with reproductive functions in diverse eukaryotes

Whitehead Institute, United States
University of Massachusetts Medical School, United States
Howard Hughes Medical Institute, United States
Children's Hospital Oakland, United States
Massachusetts Institute of Technology, United States
University of Kansas Medical Center, United States
University of South Florida, United States

Oct 8, 2016

https://doi.org/10.7554/eLife.19993

Open access
Copyright information

Abstract
eLife digest
Introduction
Results
Discussion
Materials and methods
References
Article and author information
Metrics

Abstract

The advent of sexual reproduction and the evolution of a dedicated germline in multicellular organisms are critical landmarks in eukaryotic evolution. We report an ancient family of GCNA (germ cell nuclear antigen) proteins that arose in the earliest eukaryotes, and feature a rapidly evolving intrinsically disordered region (IDR). Phylogenetic analysis reveals that GCNA proteins emerged before the major eukaryotic lineages diverged; GCNA predates the origin of a dedicated germline by a billion years. Gcna gene expression is enriched in reproductive cells across eukarya – either just prior to or during meiosis in single-celled eukaryotes, and in stem cells and germ cells of diverse multicellular animals. Studies of Gcna-mutant C. elegans and mice indicate that GCNA has functioned in reproduction for at least 600 million years. Homology to IDR-containing proteins implicated in DNA damage repair suggests that GCNA proteins may protect the genomic integrity of cells carrying a heritable genome.

https://doi.org/10.7554/eLife.19993.001

eLife digest

Sex allows two individuals of a species to combine their genetic information and generate new offspring. As such, and unlike most of the cells that make up an organism, reproductive cells carry DNA that will be passed to the next generation. To prepare for sexual reproduction, reproductive cells undergo a process whereby they give up half of their DNA. Moreover, in many species, reproductive cells undergo dramatic changes in size and shape to become specialized as eggs or sperm.

For two decades, research scientists have made use of an antibody that specifically targets and labels reproductive cells in mice and rats, long before they become recognizable as sperm or eggs. However, nothing was known about the protein that this antibody recognizes.

Now, Carmell et al. have found that the antibody recognizes a protein called GCNA. This protein was previously thought to be limited to mice and rats. However, GCNA is in fact a member of a far-flung family of proteins that had gone unnoticed by researchers due to their unusual composition and large unstructured regions. Carmell et al. show that this protein family extends beyond rodents and even mammals, and actually exists in all branches of eukaryotic life, including plants and single-celled organisms. In most species examined, the genes encoding these proteins are most active in reproductive cells.

The most important next step is to figure out precisely what GCNA proteins are doing in reproductive cells. This will also hopefully explain why these proteins appear to have been conserved for more than 2 billion years, throughout the entire history of eukaryotes.

https://doi.org/10.7554/eLife.19993.002

Introduction

Sexual reproduction in eukaryotes is accomplished through meiosis, a specialized cell cycle that generates genetically diverse derivatives. Meiosis is believed to have evolved once, in the common ancestor of extant eukaryotes, about two billion years ago (Ramesh et al., 2005; Villeneuve and Hillers, 2001). At the same time, special protection for heritable genomes evolved in the form of an RNAi pathway that likely functioned in defense against transposons and viruses (Cerutti and Casas-Mollano, 2006; Shabalina and Koonin, 2008). Sexual reproduction via meiosis and utilization of specialized genome defense mechanisms are characteristics of many unicellular eukaryotes, in which every cell contributes to the next generation. These properties became germline specific when the distinction between germ cells and somatic cells was made in multicellular organisms, with the germline being tasked with guarding the heritable genome. Accordingly, certain critical genes encoding proteins associated with the sexual cycle, as well as those required for integrity of the heritable genome, share a deep, common evolutionary origin (Uanschou et al., 2007; Kumar et al., 2010 ; Cerutti and Casas-Mollano, 2006). In metazoans, this reproductive assemblage expanded to encompass proteins not directly involved in meiosis but nonetheless expressed exclusively in germ cells; these include the RNA binding proteins DAZL/BOULE, VASA, and NANOS (Ewen-Campen et al., 2010; Juliano et al., 2010).

Mouse is the most widely used model to study the germline in mammals. Across 22 years of research, and >425 publications in mouse reproductive biology, investigators have employed antibodies to two antigens – GCNA1 and TRA98 – to distinguish germ cells from somatic cells. Nonetheless, the identity of the antigens themselves has remained unknown. The striking qualities of these markers, together with their importance to the research community, led us to attempt to discover the underlying antigens. Here, we identify the antigen recognized by both antibodies as a single, unannotated protein in the mouse. We name this protein GCNA and describe its ancient and ubiquitous association with sexual reproduction.

Results

GCNA1 and TRA98 antibodies recognize the same previously unannotated protein

The mouse germline is first distinguished from the surrounding soma as a population of primordial germ cells that migrates toward the developing genital ridge. Upon arrival, germ cells undergo a critical transition and commit to sexual differentiation, eventually giving rise to oocytes or spermatozoa. The handful of proteins that are expressed concomitant with entry into the genital ridge include the classical germ cell factors DAZL and DDX4 (Mouse Vasa Homolog; MVH) and the antigens recognized by GCNA1 and TRA98 antibodies (Figure 1A) (Hu et al., 2015; Enders and May, 1994; Tanaka et al., 2000).

Figure 1 with 2 supplements see all

Download asset Open asset

Identification of the antigen recognized by GCNA1 and TRA98 antibodies.

(A) Schematic illustration depicting isolation of a mouse embryonic day 11.5 gonad. GCNA (white, pink in merge) is specifically expressed by germ cells (green) only after they enter the somatic gonad (G), which is marked by its expression of GATA4 (blue). Dashed line indicates boundary of gonad. Oct4-EGFP transgene labels germ cells. (B) Mass spectrometry indicates that GCNA1 and TRA98 antibodies immunoprecipitate the same protein. (C) GCNA coats condensed chromosomes during meiotic prophase. One plane through a single nucleus of a male spermatocyte in the zygotene phase of Meiosis I is shown. (D) Mouse GCNA encodes an acidic and repetitive protein that is predicted to be disordered. Intrinsically disordered regions (IDRs) are depicted in green; repeating units of the indicated sequences are shown in dark green and non-repetitive disordered sequences are light green. Human GCNA shares a region with the same acidic and repetitive character (green) and contains additional, ordered domains – a protease domain (blue), and a C2C2 zinc finger (purple). SUMO-interacting motifs (SIMs) are indicated by black bars. (E) Bacterially expressed recombinant HIS-tagged mouse GCNA is recognized by both GCNA1 and TRA98 antibodies. The antigen is located within a protein fragment containing the murine-specific GE(P/M/S)E(S/T)EAK repeat. (F) Western blot of mouse XY embryonic stem cell lysate with a disruption in the identified locus demonstrates depletion of both GCNA1 and TRA98 antigens.

https://doi.org/10.7554/eLife.19993.003

Figure 1—source data 1 Mass spectrometry data.: https://doi.org/10.7554/eLife.19993.004
Download elife-19993-fig1-data1-v3.xlsx

The GCNA1 and TRA98 monoclonal antibodies, generated independently from rats immunized with cell lysates from adult mouse testis, are robust markers of mouse germ cell nuclei and show no reactivity to somatic cells (Enders and May, 1994; Tanaka et al., 2000). To clone the GCNA1 antigen, we carried out immunoprecipitation from an adult mouse testis lysate, followed by mass spectrometry. We detected 26 unique peptides representing 51% coverage of an unannotated protein specifically in the immunoprecipitate, enabling us to confidently identify it as GCNA (Figure 1B, Figure 1—source data 1). Mouse GCNA contains four distinct repeat classes that comprise the majority of the protein, and its theoretical isoelectric point of 4.17 makes it more acidic than 98.9% of all mouse proteins (Figure 1D, Figure 1—figure supplement 1) (Bjellqvist et al., 1993).

The developmental timing and cell type specificity of labeling with GCNA1 resembles that of TRA98, a second antibody with an unknown antigen (Tanaka et al., 2000; Inoue et al., 2011). The subcellular localization of GCNA1 and TRA98 also show striking similarities; we find that GCNA forms a distinctive coating around condensed chromosomes in meiotic prophase (Figure 1C), and TRA98 has been noted to have a similar reticular or netlike localization in the nucleus (Inoue et al., 2011). Due to these parallels, we hypothesized that the TRA98 antibody recognized the same antigen as GCNA1. Indeed, immunoprecipitation using TRA98 yielded 24% coverage of the GCNA protein (Figure 1B, Figure 1—source data 1). By expressing portions of mouse GCNA in bacteria, we determined that both antibodies recognize a fragment containing a murine-specific 8-amino-acid tandem GE(P/M/S)E(S/T)EAK repeat that occurs 25 times in the protein (Figure 1D,E). Additionally, we disrupted the gene encoding GCNA in mouse embryonic stem (ES) cells (Figure 1—figure supplement 2) and found that antigens recognized by both antibodies were depleted, confirming that GCNA1 and TRA98 antibodies recognize the same protein (Figure 1F).

Mouse GCNA is predicted to be entirely disordered

The repetitive structure and biased amino acid composition of mouse GCNA is characteristic of intrinsically disordered protein regions (IDRs). IDRs display conformational flexibility and have no single, well-defined equilibrium structure, and yet carry out numerous biological activities (van der Lee et al., 2014). IDRs have high absolute net charge due to enrichment for disorder-promoting (charged and polar) amino acids, and low net hydrophobicity due to depletion of hydrophobic and order-promoting amino acids, features that make it possible to predict disordered regions from primary amino acid sequence alone (Uversky et al., 2000; He et al., 2009). Based on its extreme negative charge and atypical amino acid composition, mouse GCNA is predicted to be entirely disordered (Figure 1—figure supplement 1, Figure 2A).

Figure 2

Download asset Open asset

GCNA proteins across eukarya are predicted to have large intrinsically disordered regions.

(A) Disorder tendency of GCNA proteins from mouse (navy), human (medium blue), *Drosophila melanogaster* (light blue), and *Dictyostelium discoideum* (gray). Residues above the dotted line are predicted to be disordered. Folded domains are indicated by gray rectangle. (B) Charge-hydropathy analysis (mean scaled hydropathy, <H>, against absolute mean net charge, <R>) predicts GCNA IDRs to be disordered by comparison to a set of known disordered proteins (orange squares) and ordered proteins (blue triangles). IDRs of 114 GCNA proteins across eukarya (black circles) are plotted. The boundary between unfolded and folded space is empirically defined by the equation <H>b = (<R> + 1.151)/2.785 (Uversky et al., 2000). (C) Amino acid composition of GCNA IDRs. Enrichment or depletion is expressed as (C_x − C_order)/C_order, representing the normalized excess of a given residue’s content in a query dataset (C_x) relative to the corresponding value in the dataset of ordered proteins (C_order). Error bars represent fractional differences of the standard deviations of observed relative frequencies of bootstrapped samples of the datasets.

https://doi.org/10.7554/eLife.19993.007

Figure 2—source data 1 Charge/hydropathy analysis.: https://doi.org/10.7554/eLife.19993.008
Download elife-19993-fig2-data1-v3.xlsx

GCNA orthologs containing an IDR and a unique combination of conserved structured domains are present in every eukaryotic superkingdom

Alignment-based searches using the amino acid sequence of mouse GCNA as bait failed to detect significant similarity to any protein in any species, suggesting that GCNA might be unique to mice. Alternatively, GCNA could be rapidly evolving, as is common for disordered proteins due to their low level of structural constraint (Brown et al., 2010; Huntley and Golding, 2000; Tompa, 2003). To distinguish between these scenarios, we sought to identify GCNA orthologs in other species.

Using a combination of methods that accommodate rapid evolution, we were able to identify GCNA orthologs in a broad swath of organisms from human to the most primitive single-celled eukaryotes. Because rapid evolution often renders IDRs un-alignable even among closely related species (Brown et al., 2010), primary sequence cannot be used to establish orthology. Therefore, starting with mouse GCNA, which is encoded by a gene on the X chromosome, we identified orthologs in human and other vertebrates by synteny (Figure 3). We discovered that, whereas mouse GCNA is predicted to be entirely disordered, many GCNA orthologs, including the human ortholog, have well-conserved structured domains in addition to an IDR (Figure 1D, 2A). Specifically, we deduced that three domains had been lost more than 25 million years ago in the rodent lineage (Figure 3—figure supplement 1). Non-transcribed pseudo-exons that previously encoded these domains are found in the mouse genome, allowing us to confidently place mouse GCNA into this larger family (Figure 3—figure supplement 1).

Figure 3 with 2 supplements see all

Download asset Open asset

Synteny mapping of *Gcna* orthologs across vertebrates.

Comparison of the genomic region containing *Gcna* in mouse (GRCm38/mm10) with syntenic regions in human (GRCh38/hg38), rabbit (Broad/oryCun2), opossum (Broad/monDom5), chicken (ICGSC Gallus_gallus-4.0/galGal4), zebrafish (Zv9/danRer7), and elephant shark (Callorinchus_milii-6.1.3/calMil1). *Gcna* orthologs are indicated in green.

https://doi.org/10.7554/eLife.19993.009

To explore the phylogenetic reach of the GCNA family, we used the conserved regions in a core set of vertebrate GCNA proteins to identify more distant orthologs by sequence similarity. We created statistical profiles (hidden Markov models or HMMs) of the sequence of each conserved region (Figure 4—figure supplement 1). We found that each region has a unique signature found only in GCNA orthologs, allowing us to unambiguously identify such orthologs in a diverse array of eukaryotic species, more than two hundred in all (Figure 4—source data 1).

Interestingly, even though we discovered GCNA orthologs using alignable structured domains, our amino acid charge/hydropathy analysis reveals that, almost invariably, the proteins have a large non-alignable IDR at their N-terminus (Figure 2A,B). Despite a lack of primary sequence conservation, the overall amino acid composition of IDRs is often conserved in orthologous disordered protein families, as it is not the exact sequence of amino acids, but the overall character of the IDR – such as an abundance of residues that may be post-translationally modified, net charge, and amino acid enrichments and depletions – that confers function (van der Lee et al., 2014). Indeed, a comparison of mouse and human GCNA proteins showed that, while they were not readily aligned using conventional approaches, they do share highly repetitive and acidic IDRs (Figure 1D) (Nolte et al., 2001). A survey of the IDR composition of over 100 GCNA proteins across eukarya revealed that GCNA IDRs have characteristic amino acid enrichments and depletions – some shared with proteins in a disordered dataset (DisProt) and others specific to the GCNA family (Figure 2C). In keeping with another common IDR paradigm, GCNA proteins across species, including mouse and human, share short, linear protein-protein interaction motifs embedded within a larger disordered context (Figure 1D, Figure 3—figure supplement 2) (van der Lee et al., 2014; Davey et al., 2015).

GCNA orthologs are present in every eukaryotic superkingdom (Figure 4A) (Simpson and Roger, 2004). Although our gene discovery is enriched for animal and fungal orthologs due to the availability of sequenced genomes and transcriptomes, GCNA orthologs are present in the deepest known branches of eukaryotes. With certain exceptions including plants and protostomes such as drosophilids, each species contains one Gcna gene. Our analyses of these orthologs allowed us to deduce that a typical GCNA protein has four components: a large IDR, a zinc metalloprotease domain, a C2C2 zinc finger, and a non-canonical two-helix HMG box (Figure 4B, Figure 4—figure supplement 1). This four-domain architecture is generally conserved in GCNA proteins, although in some species, including mouse, one or more domains have been lost through modular protein evolution, bringing remarkable variation to the GCNA family (Figure 4B, Figure 4—figure supplements 2 and 3) (Moore et al., 2013). Taken together, our analyses lead us to infer that GCNA proteins were present in the common ancestor of eukaryotes, more than 2 billion years ago.

Figure 4 with 3 supplements see all

Download asset Open asset

GCNA proteins have origins in the earliest eukaryotes and are associated with reproduction.

(A) Phylogenetic tree of eukaryotes (topology of the tree is drawn as in [He et al., 2014], with estimated divergence times obtained from the TimeTree database [Hedges et al., 2006]). GCNA proteins are present in all major groups of eukaryotes regardless of the nature of their sexual life cycle – whether haplontic (haploid with a diploid phase that undergoes meiosis), diplontic (a diploid that undergoes meiosis to produce gametes), or haplodiplontic (alternates between haploid and diploid phases of the life cycle). (B) Domain composition of GCNA orthologs. Of note, *S. pombe* does not have an IDR, but a related fission yeast, *S. japonicus,* does (not shown). (C) Black circles indicate a given GCNA ortholog has enriched expression in reproductive cells or tissues or is upregulated during the sexual cycle. Organisms for which expression data do not exist are indicated by UNK (unknown).

https://doi.org/10.7554/eLife.19993.012

Figure 4—source data 1 List of GCNA family members.: https://doi.org/10.7554/eLife.19993.013
Download elife-19993-fig4-data1-v3.xlsx
Figure 4—source data 2 Reproductive expression of GCNA family members.: https://doi.org/10.7554/eLife.19993.014
Download elife-19993-fig4-data2-v3.xlsx

Of particular note are the GCNA proteins found in higher plants, where the protease domain has been lost. The YABBY proteins, a higher-plant-specific family whose evolutionary origins have long been sought (Bartholmes et al., 2012), share with other GCNA proteins the combination of a C2C2 zinc finger, an IDR, and a characteristic two-helix HMG box (Figure 4—figure supplement 2) (Bowman and Smyth, 1999). Based on our phylogenetic analyses of HMG boxes (Figure 4—figure supplement 3), we propose that YABBY proteins are GCNA family members that have diverged in the 600 million years since the last common ancestor of algae (where canonical four-domain GCNA proteins are still found) and higher plants.

Across eukarya, GCNA is enriched in cells carrying the heritable genome

Since GCNA is a highly specific marker of both premeiotic and meiotic germ cells in the mouse, we sought to determine whether GCNA is associated with cells carrying the heritable genome either before or during the sexual cycle in other species by examining expression of GCNA homologs across eukarya, including fungi, plants, and the most basal single-celled eukaryotes (Figure 4—source data 2).

Gcna is expressed during the sexual cycle in single-celled haploid fungi and green algae, organisms that are more than a billion years removed from each other but share a similar life cycle. In S. pombe, transcription of Gcna is 17-fold upregulated during meiosis relative to vegetative cells (Mata et al., 2002). Likewise, in Chlamydomonas, a single-celled green alga, Gcna is upregulated 17-fold in activated gametes relative to vegetative cells (Ning et al., 2013).

In animals that generate germ cells from adult multipotent cells, Gcna is expressed more broadly in two cell types that have gametogenic potential and thus carry the heritable genome – the multipotent stem cells and the germ cells derived from them. In the cnidarian Hydra, germ cells are derived from interstitial stem cells, where Gcna expression is upregulated compared with two exclusively somatic stem cell lineages (Littlefield, 1985; Bosch et al., 2010; Hemmrich et al., 2012). In the planarian flatworm Schmidtea mediterranea, and the acoel Hofstenia miamia, Gcna is upregulated in neoblasts, the totipotent adult stem cell populations from which germ cells arise (van Wolfswinkel et al., 2014; Srivastava et al., 2014). Transcripts of this gene are also enriched in the testes in a sexual strain of Schmidtea (Figure 5A).

Figure 5 with 1 supplement see all

Download asset Open asset

GCNA RNA and protein are enriched in germ cells of a variety of model organisms.

(A) Germ cells residing in testes (circled) of a sexual strain of the planarian flatworm *Schmidtea mediterranea* are enriched for *Gcna* transcript (red), as well as that of germ cell marker *Nanos* (green). (B) Germ cells in pachytene stage of meiosis in nematode *C. elegans* express GCNA protein (red). Nuclei stained with DAPI (blue). (C) Pole cells, the earliest germ cells in a *Drosophila melanogaster* embryo, are enriched for transcript of *Gcna* ortholog CG14814. Image reproduced with permission (Lécuyer et al., 2007). GCNA protein is expressed in germ cells in cross sections of mouse (D) and human (E) seminiferous tubules, the site of meiosis and sperm production. (D) Pre-meiotic germ cells in testis of 1-day old mouse are labeled using antibodies recognizing GCNA (red) and DDX4/MVH (green). (E) Antibody recognizing human GCNA labels nuclei of germ cells (brown). Red squares indicate origin of structures and/or cells depicted below each diagram.

https://doi.org/10.7554/eLife.19993.019

We also found enrichment of Gcna in germ cells of animals that specify their germline during embryonic development (Extavour and Akam, 2003). Gcna is enriched in the germlines of animals such as C. elegans, Drosophila, zebrafish, and frog, which segregate their germlines in the first cell division of the embryo due to maternally inherited factors (Figure 5B,C, Figure 5—figure supplement 1, Figure 4—source data 2) (Tomancak et al., 2002; Graveley et al., 2011; Reinke et al., 2004; Robinson et al., 2013; Kohara and Shin-i, 1998). Likewise, mouse, which specifies germ cells later in the embryo through inductive signals, shows a 13-fold upregulation of Gcna transcript in the testis compared to somatic tissues (Figure 4—source data 2). Transcripts in somatic tissues are apparently not translated, as mouse GCNA protein is germ-cell specific (Maatouk et al., 2006). An array of other mammals shows similar upregulation in the testis (Figure 5—figure supplement 1). Like mouse GCNA protein, the human ortholog is localized to the nucleus of germ cells (Figure 5D,E).

In summary, we found that expression of Gcna genes is enriched in cells carrying a heritable genome – either just prior to or during meiosis in single-celled eukaryotes, in stem cells and germ cells of animals without a dedicated germline, and in germ cells of organisms with a dedicated germline (Figure 4C, Figure 4—source data 2). Gcna expression across all of these modalities indicates that Gcna was present, and was likely enriched in cells carrying a heritable genome, in the last common ancestor of all extant eukaryotes.

C. elegans gcna-1 mutants present reproductive phenotypes

Given the extraordinary association of Gcna expression with reproductive cells and tissues across eukarya, we set out to test whether this expression translates to conservation of reproductive function. The only GCNA family member whose function had been probed previously is the C. elegans gene ZK328.4, which we have named gcna-1. C. elegans gcna-1 had been depleted as part of an RNAi screen of germline-enriched genes (Colaiácovo et al., 2002). gcna-1 was reported to have a low penetrance HIM (High Incidence of Males) phenotype, suggestive of a defect in meiotic chromosome segregation, as XO male progeny result from nondisjunction in XX hermaphrodites. We decided to further investigate gcna-1 function in C. elegans by creating two independent mutant alleles (Figure 6—figure supplement 1). Worms carrying either of the gcna-1 mutant alleles have small but significant reductions in brood size at 25 degrees Celsius (Figure 6A). Additionally, both alleles exhibit a significant HIM phenotype, substantiating a reproductive function for GCNA in C. elegans (Figure 6B).

Figure 6 with 1 supplement see all

Download asset Open asset

*C. elegans gcna-1* mutants present reproductive phenotypes.

(A) Two alleles of *gcna-1* in *C. elegans* exhibit significant reduction in brood size at 25 degrees Celsius. Twenty-four broods were counted for each allele at each temperature. Each symbol represents the number of progeny derived from a single hermaphrodite. Outliers are shown and no data was excluded. Bars indicate mean +/- standard deviation. P-values were calculated with two-tailed non-parametric Mann-Whitney test. (B) *gcna-1* mutants also exhibit a high incidence of male progeny at both 20 and 25 degrees Celsius. Twenty-four broods were counted and scored for each allele at each temperature. P-values were calculated using a chi-squared test with correction for multiple testing.

https://doi.org/10.7554/eLife.19993.021

C. elegans gcna-1 encodes a canonical GCNA protein with an IDR, protease domain, zinc finger, and HMG box. Even in the face of other domains being lost over evolutionary time, the large IDR is retained in GCNA proteins almost without exception. To understand the persistence of this rapidly evolving domain across billions of years, we sought to probe the functional role of the IDR in GCNA proteins. Mouse GCNA presented us with the unique opportunity to probe the function of the IDR in isolation, as it has an IDR that has existed for 25 million years in the absence of structured domains.

The entirely disordered mouse GCNA is required for male fertility

To this end, we created a targeted conditional allele of Gcna in the mouse (Figure 1—figure supplement 2). We found that mutating Gcna dramatically impairs male fertility, a testament to the importance of the IDR; 10 of 11 Gcna-mutant males tested sired no offspring, while wild-type male controls sired 5 to 24 offspring (Figure 7A). The epididymal duct, which contains copious amounts of maturing sperm in wildtype mice, was nearly devoid of sperm in the mutant (Figure 7B). Gcna-mutant female mice were fertile (data not shown). Sex-specific infertility is a common phenomenon in mice, and is thought to be due to different levels of checkpoint control in males and females (Matzuk and Lamb, 2002; Morelli and Cohen, 2005; Hunt and Hassold, 2002).

Figure 7

Download asset Open asset

*Gcna* is required for male fertility in mice.

(A) *Gcna*-mutant male mice exhibit marked subfertility, and most are sterile. Each datapoint represents the number of litters (A) and pups (B) sired by a single male (n=15 for wildtype, n=11 for *Gcna*^DeltaEx4/Y). P-values were calculated using a two-sided Wilcoxon rank sum test. (B) Epididymal ducts in *Gcna*-mutant males are largely devoid of sperm in the lumen when compared to those of wildtype mice.

https://doi.org/10.7554/eLife.19993.023

GCNA is homologous to two protein families involved in replication-associated DNA repair

To gain insight into potential molecular functions of GCNA that might underlie these phenotypes, we extended our molecular evolution analyses beyond the GCNA family to other proteins with similar domain composition. Through phylogenetic analysis as well as structural modeling, we determined that GCNA proteins are members of a small family of IDR-containing metalloproteases that includes only two other members, Wss1 and Spartan, both of which have essential functions in DNA repair coupled to DNA replication (Mosbech et al., 2012; Centore et al., 2012; Davis et al., 2012; Juhasz et al., 2012; Machida et al., 2012; Kim et al., 2013; Stingele et al., 2014; Balakirev et al., 2015). GCNA, Wss1, and Spartan proteins share three common elements: minigluzincin protease domains, large intrinsically disordered regions (IDRs), and motifs for binding ubiquitin family proteins (particularly SUMO interacting motifs, or SIMs).

Wss1 and Spartan proteins have recently been recognized as members of the same protease family (Stingele et al., 2015). Our phylogenetic analyses based on amino acid sequence alignments of Wss1, Spartan, and GCNA protease domains show that GCNA proteases are also part of this small and distinctive group (Figure 8A). Secondary structure prediction and 3D modeling place all three proteins within the minigluzincin protease family (Figure 8B) (Balakirev et al., 2015; López-Pelegrín et al., 2013). We conducted HMM to HMM comparison across more than 14,800 PfamA curated protein families, including over 200 protease families, using alignment and secondary structure scoring. We found that GCNA, Spartan, and Wss1 proteases are more closely related to each other than they are to any other protein family (Figure 8—figure supplement 1).

Figure 8 with 1 supplement see all

Download asset Open asset

GCNA, Wss1, and Spartan proteins comprise a family of proteases.

(A) Maximum likelihood phylogenetic tree showing relationships among eukaryotic GCNA, Wss1, Spartan, and bacterial SprT protease domains. GCNA, Wss1, and Spartan are present across eukarya, including the most primitive eukaryotes, except that Wss1 proteins have been lost in animals. Gray and black circles indicate nodes with bootstrap values greater than 600 and 900 (out of 1000), respectively. Tree is based on alignment of the protease domains and was created with PhyML. (B) Structural modeling of GCNA, Wss1, and Spartan protease domains places GCNA in the minigluzincin family of proteases along with Wss1 and Spartan. Pairwise protein structure comparison using FATCAT (Ye and Godzik, 2004) detected strong structural similarity, as evidenced by small root-mean-square deviations (RMSDs), which are measures of the average distance (in Angstroms) between the carbon atoms of the superimposed proteins. All structures were found to be significantly similar.

https://doi.org/10.7554/eLife.19993.024

Cementing the relationship between these three proteins is a second common element shared by GCNA, Wss1, and Spartan – a large disordered domain. Like those in GCNA, the Wss1 and Spartan IDRs are present across a large swath of eukarya (Figure 9A). IDRs are well known to function in the flexible display of short linear motifs that are required for protein-protein interactions (van der Lee et al., 2014). These short motifs represent small islands of conservation within IDRs. Among these short motifs, SUMO interacting motifs (SIMs) constitute the third common element shared by GCNA, Wss1, and Spartan (Figure 9B, Figure 3—figure supplement 2).

Figure 9

Download asset Open asset

GCNA, Wss1, and Spartan proteins have SUMO interacting motifs within large disordered regions.

(A) Spartan and Wss1 have significant disordered regions outside of their protease domains. Residues above the dotted line are predicted to be disordered. (B) GCNA proteins contain conserved SUMO-interacting motifs (SIMs) that are homologous to those found in Spartan and Wss1 proteins. Residues are colored as follows: blue (acidic), green (hydroxyl, sulfhydryl, amine), red (small and hydrophobic), pink (basic).

https://doi.org/10.7554/eLife.19993.026

Discussion

In summary, we find that an antigen widely employed to identify germ cells in mice corresponds to a protein that has been expressed for nearly 2 billion years in cells carrying a heritable genome – whether single cells during meiosis, stem cells that give rise to germ cells, or germ cells themselves. We provide evidence that GCNA functions in reproduction in two organisms – C. elegans and mouse – whose last common ancestor lived about 600 million years ago (Cartwright and Collins, 2007; Parfrey et al., 2011; Peterson et al., 2008), and suggest that GCNA may have a function in cells carrying a heritable genome across eukarya.

The GCNA protein features, and in mice consists entirely of, an intrinsicially disordered region with an unusual and rapidly evolving amino acid sequence that has kept this protein hidden from researchers until now. Protein disorder is ubiquitously present in eukaryotic proteomes; greater than 30% of proteins have disordered segments of 30 or more consecutive residues (Dunker et al., 2000). IDRs, rather than having a single, well-defined fold, dynamically interconvert between an ensemble of conformations separated by low energy barriers (Chebaro et al., 2015). IDRs therefore challenge the expectations set forth in the structure-function paradigm, which posits that the function of a protein is dependent upon its three-dimensional structure (Wright and Dyson, 1999). IDRs have critical biological functions across a wide variety of cellular processes including transcription, regulatory, and signaling pathways, and it has been proposed that IDRs support the processes that underlie the development of multicellular organisms (Xie et al., 2007; Tantos et al., 2012; Dunker et al., 2015).

Male sterility caused by mutation of the entirely disordered mouse GCNA indicates that the function of GCNA proteins lies at least in part in the disordered region. While the molecular functions of GCNA IDRs remain to be elucidated, prior studies of other IDRs (van der Lee et al., 2014) and specifically the known roles of the IDRs in the homologous proteins Wss1 and Spartan inform our thinking about GCNA IDR functions. Homology with Wss1 and Spartan proteins suggests that GCNA IDRs function in molecular recognition mediated by SUMO interacting motifs, thereby recruiting or scaffolding larger protein complexes. GCNA IDRs may also contribute to nucleic acid binding, either alone or in combination with the zinc finger and HMG box.

GCNA may be among the first of many disordered protein families to be implicated in reproduction, as sex chromosomes, which have been shown to be enriched for genes encoding germ cell genes, are also enriched for disordered proteins (Mueller et al., 2013; Hegyi and Tompa, 2012). Spermatogenesis and meiosis are in fact among the biological processes most strongly correlated with predicted protein disorder (Dunker et al., 2015).

Conservation of structured domains in the GCNA family over a vast evolutionary timescale suggests that significant function likely also derives from these domains. We have identified both sequence and structural homology to Wss1 and Spartan, encompassing the protease domain, IDR, and SUMO interacting motifs. Wss1 and Spartan protease domains and SUMO interacting motifs are essential for response to DNA damage caused by ionizing radiation, UV light, and chemical crosslinkers (Centore et al., 2012; Stingele et al., 2014). GCNA shares these common elements. By extrapolation, we propose that the GCNA protein family identified here may also function in maintenance of genome integrity, and has become specialized for cells whose genomes will be passed to the next generation. As such, GCNA may help cells cope with the additional DNA damage burden that accompanies meiosis, a process that involves extensive, programmed DNA damage (Keeney, 2008).

While at first glance the phenotypes of mouse and C. elegans Gcna mutants seem disparate, they are consistent with defects in DNA damage repair in both organisms. The phenotype of Gcna mutant mice is similar to those of other genes involved in DNA damage repair, including Rad18 and Rad6, which function in the same DNA damage tolerance pathway as Spartan (Inagaki, 2011; Roest et al., 1996; Hedglin and Benkovic, 2015). Likewise, the HIM phenotype in Gcna-deficient C. elegans is consistent with a defect in genome maintenance (Billi et al., 2014); in C. elegans as well as mammals, efficient and proper repair of meiotic double strand breaks is crucial for accurate chromosome segregation (Baudat et al., 2013; Youds and Boulton, 2011).

GCNA joins a well studied cohort of genes, including Dazl/Boule, Vasa, Nanos, and Piwi, known for their function in metazoan germ cells. Many of these genes are proposed to be part of a conserved germline multipotency program (GMP) that functions in multipotent cells as well as in the germline, where they have a broad role in establishing and maintaining pluripotency (Ewen-Campen et al., 2010; Juliano et al., 2010). Unlike most of these genes (Eirín-López and Ausió, 2011; Kerner et al., 2011), GCNA arose prior to the divergence of the major eukaryotic lineages. GCNA is more ancient than all but Piwi and associated proteins that function in small-RNA-mediated genome defense (Kerner et al., 2011; Swarts et al., 2014), and predates the origin of a dedicated metazoan germline by a billion years. We believe that GCNA’s presence in the last common ancestor of extant eukaryotes indicates that GCNA’s role is more fundamental than even the ancestral pluripotency module and the GMP that arose from it. Like Piwi, GCNA proteins may perform a role that is critical for reproductive cells by helping to maintain the integrity and stability of heritable genomes across the generations.

Share this article

Cite this article

Identification of the antigen recognized by GCNA1 and TRA98 antibodies.

Figure 1—source data 1

GCNA proteins across eukarya are predicted to have large intrinsically disordered regions.

Figure 2—source data 1

Synteny mapping of Gcna orthologs across vertebrates.

GCNA proteins have origins in the earliest eukaryotes and are associated with reproduction.

Figure 4—source data 1

Figure 4—source data 2

GCNA RNA and protein are enriched in germ cells of a variety of model organisms.

C. elegans gcna-1 mutants present reproductive phenotypes.

Gcna is required for male fertility in mice.

GCNA, Wss1, and Spartan proteins comprise a family of proteases.

GCNA, Wss1, and Spartan proteins have SUMO interacting motifs within large disordered regions.

Author details

Michelle A Carmell

Contribution

For correspondence

Competing interests

Gregoriy A Dokshin

Contribution

Competing interests

Helen Skaletsky

Contribution

Competing interests

Yueh-Chiang Hu

Present address

Contribution

Competing interests

Josien C van Wolfswinkel

Present address

Contribution

Competing interests

Kyomi J Igarashi

Contribution

Competing interests

Daniel W Bellott

Contribution

Competing interests

Michael Nefedov

Present address

Contribution

Competing interests

Peter W Reddien

Contribution

Competing interests

George C Enders

Contribution

Competing interests

Vladimir N Uversky

Contribution

Competing interests

Craig C Mello

Contribution

Competing interests

David C Page

Contribution

For correspondence

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organisms