Evolution of Yin and Yang isoforms of a chromatin remodeling subunit precedes the creation of two genes

Abstract
Introduction
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Genes can encode multiple isoforms, broadening their functions and providing a molecular substrate to evolve phenotypic diversity. Evolution of isoform function is a potential route to adapt to new environments. Here we show that de novo, beneficial alleles in the nurf-1 gene became fixed in two laboratory lineages of C. elegans after isolation from the wild in 1951, before methods of cryopreservation were developed. nurf-1 encodes an ortholog of BPTF, a large (>300 kD) multidomain subunit of the NURF chromatin remodeling complex. Using CRISPR-Cas9 genome editing and transgenic rescue, we demonstrate that in C. elegans, nurf-1 has split into two, largely non-overlapping isoforms (NURF-1.D and NURF-1.B, which we call Yin and Yang, respectively) that share only two of 26 exons. Both isoforms are essential for normal gametogenesis but have opposite effects on male/female gamete differentiation. Reproduction in hermaphrodites, which involves production of both sperm and oocytes, requires a balance of these opposing Yin and Yang isoforms. Transgenic rescue and genetic position of the fixed mutations suggest that different isoforms are modified in each laboratory strain. In a related clade of Caenorhabditis nematodes, the shared exons have duplicated, resulting in the split of the Yin and Yang isoforms into separate genes, each containing approximately 200 amino acids of duplicated sequence that has undergone accelerated protein evolution following the duplication. Associated with this duplication event is the loss of two additional nurf-1 transcripts, including the long-form transcript and a newly identified, highly expressed transcript encoded by the duplicated exons. We propose these lost transcripts are non-functional side products necessary to transcribe the Yin and Yang transcripts in the same cells. Our work demonstrates how gene sharing, through the production of multiple isoforms, can precede the creation of new, independent genes.

https://doi.org/10.7554/eLife.48119.001

Introduction

There is general interest in understanding how animals adapt to new environments. What are the alleles that matter to positive selection and what sort of genes do they target? Since methods were developed to map and identify the genes harboring causative genetic variation, researchers have often isolated changes in the same gene in different populations or species (Wood et al., 2005; Martin and Orgogozo, 2013). Besides targeting specific genes, evolution can target classes of genes that share molecular features such as biochemical (e.g. chemoreceptor genes; Bachmanov and Beauchamp, 2007; Keller et al., 2007; Wisotsky et al., 2011; Lunde et al., 2012; McRae et al., 2012; McBride et al., 2014; Greene et al., 2016a; Greene et al., 2016b) or developmental function (e.g. master regulators of cell fate; Sucena et al., 2003; Colosimo et al., 2005; Chan et al., 2010; Yang et al., 2018). One molecular feature predicted to be important for evolution is the ability of genes to produce multiple protein isoforms. A single protein-coding gene can produce multiple isoforms using alternative transcription initiation and termination sites combined with alternative splicing between exons (Pan et al., 2008; Pal et al., 2011). Isoform-specific evolution is found throughout vertebrates, including recent evolution of transcript expression in primates (Barbosa-Morais et al., 2012; Merkin et al., 2012; Shabalina et al., 2014; Zhang et al., 2017). Whether the increase in transcriptomic diversity is important for adaptive evolution remains an important question, and only a few examples have shown how isoform evolution could be involved in phenotypic diversity (Mallarino et al., 2017).

The ability of a gene to produce multiple protein isoforms might also play a role in the genesis of new genes. Over long evolutionary timescales, gene duplication and diversification can create paralogous genes with different functions (Ohno, 1970; Innan and Kondrashov, 2010). One central mystery in this process is the order of these two events; do mutations that duplicate genes occur first or does functional diversification preclude the duplication event? One mechanism the latter route can happen through is by gene sharing, or the ability of a gene to create multiple protein products (or a single protein product) that have two or more distinct functions (Hughes, 1994). If each isoform acts in different tissues or plays distinct roles in biological processes, subsequent duplication mutations can result in the separation of these isoforms into two distinct genes.

As a model for understanding the genetic basis of adaptive evolution in an animal model, we use the small nematode Caenorhabditis elegans. Besides its genetic tractability, use of this organism allows the analysis of evolution at different timescales. For example, experimental evolution can be used to study evolutionary processes in controlled environments on the order of 10–1000 generations (Gray and Cutter, 2014; Teotónio et al., 2017; Penley et al., 2018; Chelo et al., 2019; Saxena et al., 2019; Wernick et al., 2019). For longer timescales, a growing number of isolated and sequenced Caenorhabditis species can be used to study genetic differences responsible for species-level differences (Ting et al., 2018; Yin et al., 2018; Bi et al., 2019; Stevens et al., 2019).

For understanding short-term adaptation, we study two laboratory strains of C. elegans, called N2 and LSJ2, which descended from a single hermaphrodite isolated in 1951 (Figure 1A). These two lineages split from genetically identical populations between 1957 and 1958 and evolved in two very different laboratory environments – N2 grew on agar plates seeded with E. coli bacteria and LSJ2 in liquid cultures containing liver and soy peptone extracts (McGrath et al., 2009; McGrath et al., 2011; Sterken et al., 2015). By the time permanent means of cryopreservation were developed, approximately 300–2000 generations had passed, and ~ 300 new mutations arose and fixed in one of the two lineages (McGrath et al., 2011). Despite their genetic similarity, substantial divergence has occurred between these strains in terms of phenotype and fitness, including a large number of developmental, behavioral, and reproductive traits. Use of these strains allow us to identify causal genetic variants responsible for phenotypic and fitness changes. To date, five de novo, causal genetic variants have been identified in either the N2 or LSJ2 lineage (de Bono and Bargmann, 1998; McGrath et al., 2009; Persson et al., 2009; McGrath et al., 2011; Duveau and Félix, 2012; Large et al., 2016; Large et al., 2017; Zhao et al., 2018).

Figure 1 with 2 supplements see all

Download asset Open asset

An N2-derived genetic variant in the intron of *nurf-1* increases fitness in laboratory conditions.

(A) History of two laboratory adapted *C. elegans* strains N2 and LSJ2, which descend from the same individual hermaphrodite isolated in 1951. The N2 and LSJ2 lineage split sometime around 1958. N2 grew on agar plates with *E. coli* OP50 as a food source for around 11 years until they were cryopreserved. LSJ2 animals were cultured in liquid axenic media containing sheep liver extract and soy extract peptone as a food source for about around 51 years until they were cryopreserved. 302 genetic variations were fixed between these two strains, including two that fall in the *nurf-1* gene – WBVar00601361 and WBVar00601565. (B) Genetic location of two *nurf-1* variations. WBVar00601361 (in red box) is an N2-derived intron single nucleotide substitution T/A (N2/ancestral) in the 2nd intron of *nurf-1*. WBVar00601565 is an LSJ2-derived 60 bp deletion in the 3’ end of *nurf-1* that removes the last 18 amino acids and part of the 3’-UTR. (C) Comparison of NURF-1 orthologs from *Drosophila* and humans showing position of protein domains and conserved regions as determined by Blastp and Clustal Omega. (D) Boxplot of pairwise evolutionary fitness differences between the indicated strains measured by directly competing the indicated strains against each other for five generations. PTM288 and PTM229 were created from the N2* and N2 strains, respectively, by a engineering DNA barcode in the *dpy-10* gene. PTM417 is the same genotype as ARL_{(del_LSJ2>N2)}, with the exception of a background mutation in the *spe-9* gene that occurred during the construction of the ARL_{(del_LSJ2>N2)} strain (for details see Methods). This mutation was crossed out of the PTM417 strain and used as a barcode for the digital PCR reaction. The genotype of each *nurf-1* allele (shown in B) is indicated by color. The NIL strain also contains LSJ2 alleles of additional linked mutations, which is indicated by the blue horizontal line. (E) Total brood size of the N2 and ARL_{(intron,LSJ2>N2)} strains. (F) Number of differentially expressed genes between synchronized N2 and ARL_{(intron,LSJ2>N2)} animals harvested 52 hr (L4 stage - when spermatogenesis is active) or 60 hr (young adults - when oogenesis is active) after hatching. For all figures, each dot represents an independent replicate, the box indicates the interquartile values of all data, and the line indicates the median of all data. Positive values indicate strain one is more fit than strain two. Negative values indicate strain two is more fit than strain one. For all figures, n.s. indicates p>0.05, one star indicates significant difference at p<0.05 level, two stars indicate significant difference at p<0.01 level, and three stars indicate significant difference at p<0.001 level.

https://doi.org/10.7554/eLife.48119.002

Figure 1—source data 1 Source data for Figure 1.: https://doi.org/10.7554/eLife.48119.006
Download elife-48119-fig1-data1-v2.xlsx

One of these mutations is an LSJ2-derived, 60 bp deletion at the 3’ end the nurf-1 gene that reduces growth rate, slows reproductive output, and prevents development into the dauer diapause state in response to ascaroside pheromones (Figure 1B) (Large et al., 2016). This genetic variant is beneficial in the LSJ2 liquid cultures in which it arose and fixed, but places animals at a disadvantage in the agar plate environments in which N2 evolved, an example of gene-environment interaction (Large et al., 2016). We proposed that nurf-1 is a regulator of life-history tradeoffs. Life history tradeoffs represent competing biological traits requiring large energetic investments, such as the tradeoff between energy required for reproduction versus the energy required for individual survival. The difference in fitness of this allele in the two laboratory environments is potentially determined by how the life-history tradeoffs map into reproductive success.

Studies of nurf-1 and its orthologs provide fundamental support for its role as a life history regulator. nurf-1 encodes an ortholog of mammalian BPTF, a subunit of the NURF chromatin remodeling complex (Barak et al., 2003) (Figure 1C). BPTF encodes a large protein containing a number of domains that facilitate recruitment of NURF to specific regions of the genome for chromatin remodeling (Alkhatib and Landry, 2011), including domains that interact with sequence-specific transcription factors and three PHDs and a bromodomain that facilitate interactions with modified nucleosomes (Li et al., 2006; Wysocka et al., 2006; Kwon et al., 2009; Ruthenburg et al., 2011). Through its DDT domain (Fyodorov and Kadonaga, 2002), BPTF cooperates with ISWI to slide nucleosomes along DNA, changing access of promoter regions to transcription factors that drive gene transcription. In mammals, BPTF regulates cellular differentiation and homeostasis of specific cell-types and tissues, including the distal visceral endoderm (Landry et al., 2008), ecoplacental cone (Goller et al., 2008), hematopoietic stem/progenitor cells (Xu et al., 2018), mammary stem cells (Frey et al., 2017), T-cells (Wu et al., 2016), and melanocytes (Koludrovic et al., 2015). In Drosophila, the ortholog to BPTF, NURF301, regulates the heat shock response, pupation, spermatogenesis, and innate immunity (Badenhorst et al., 2002; Badenhorst et al., 2005; Kwon et al., 2008; Kwon et al., 2009). Many of these traits can be viewed as life-history tradeoffs, e.g. large energetic investments in individual survival through the development of the immune system vs. energetic transfers to offspring in the placenta or mammary glands. The evolution of BPTF/NURF-1 function might also be relevant in human disease. Genetic alterations in BPTF have been reported in tumors, including gene amplification and point mutations (Buganim et al., 2010; Balbás-Martínez et al., 2013). In addition, BPTF has been shown to be required for the transcriptional activity of c-MYC, a major human oncogene (Richart et al., 2016).

In this paper, we continue our studies of the evolution of the N2/LSJ2 laboratory strains. We demonstrate that an independent, beneficial mutation in the nurf-1 gene was fixed in the N2 lineage, suggesting that nurf-1 is a preferred genetic target for laboratory adaptation. To understand why nurf-1 might be targeted, we explored the in vivo role in C. elegans development by taking advantage of CRISPR-Cas9 to test causal relationships that inform laboratory evolution and fitness effects. Our work suggests that the large, full-length isoform of nurf-1, primarily studied in mammals, is dispensable for development. Instead, two, largely non-overlapping isoforms are both essential for reproduction, having opposing effects on cellular differentiation of gametes into sperm or oocytes. Our results suggest that the ability of nurf-1 to regulate life history tradeoffs is the result of exquisite regulation of NURF function through the balance of two competing isoforms, reminiscent of the principle of Yin and Yang. Finally, we demonstrate that these two isoforms have split into separate genes in a clade of related nematodes, potentially resolving transcriptional and functional conflict between the Yin and Yang isoforms transcription and function. Our work demonstrates how evolution of isoforms can precede the origin of a new gene, supporting a role for gene sharing in the origin of functionally novel proteins.

An N2-derived variant in the second intron of nurf-1 increases fitness and brood size in laboratory conditions

We previously mapped differences in a number of traits (including reproductive rate, fecundity, toxin and anthelmintic sensitivity, and laboratory fitness) between N2 and LSJ2 to a QTL centered over nurf-1, which contains a derived mutation in both the N2 and LSJ2 lineages (Figure 1A and B) (Large et al., 2016; Large et al., 2017; Zhao et al., 2019). The LSJ2 allele of nurf-1 contains a 60 bp deletion in the 3’ end of the coding region of the gene, overlapping the stop codon and probably resulting in the translation of parts of the 3’ UTR. The N2 allele of nurf-1 contains an SNV that converts an A to a T in a homopolymer run of Ts in the 2^nd intron (Figure 1B). Using CRISPR-Cas9-based genome editing, we previously demonstrated that the LSJ2-derived deletion accounted for a large portion of the trait variance in reproductive rate explained by the QTL. However, it did not explain the entire effect of this locus (Large et al., 2016). We decided to test whether this additional genetic variant or variants affected fitness of the animals in laboratory conditions using a previously described pairwise competition assay (Zhao et al., 2018). To do so, we took advantage of three strains we had previously created; CX12311 is a near isogenic line with ancestral (non-N2) alleles of npr-1 and glb-5 crossed into an otherwise N2 genetic background, which we have used to eliminate the fitness and phenotypic effect of derived (N2) alleles of npr-1 and glb-5 (Zhao et al., 2018) (Figure 1D - referred to as N2*), NIL_{(nurf-1,LSJ2>N2*)} is a near isogenic line containing LSJ2 alleles of both nurf-1 mutations backcrossed into an N2* background (Large et al., 2016), and ARL_(del,_LSJ2>N2*) is an allelic replacement line containing the LSJ2-derived 60 bp deletion edited into the N2* strain using CRISPR-Cas9. Phenotypic differences between the NIL_{(nurf-1,LSJ2>N2*)} and ARL_(del,_LSJ2>N2*) strains are caused by the N2-derived intron SNV in nurf-1, or one of the additional seven linked LSJ2-N2 genetic variants near nurf-1.

We measured the relative fitness of the N2*, NIL_{(nurf-1,LSJ2>N2*)}, and ARL_(del,_LSJ2>N2*) strains against PTM288, a version of N2* that also contains a silent mutation in the dpy-10 gene (Figure 1D). The dpy-10 silent mutation provides a common genetic variant that can be used to quantify the relative proportion of each strain on a plate using digital droplet PCR. Both the NIL_{(nurf-1,LSJ2>N2*)} and ARL_(del,_LSJ2>N2*) strains showed dramatically reduced fitness comparing to PTM288, consistent with our previous report showing that the 60 bp deletion is deleterious on agar plates (Large et al., 2016). However, the NIL_{(nurf-1,LSJ2>N2*)} was quantitatively and significantly less fit than the ARL_(del,_LSJ2>N2*) strain, suggesting additional genetic variant(s) in the NIL_{(nurf-1,LSJ2>N2*)} strain further reduced its fitness. To confirm this result, we also directly competed the NIL_{(nurf-1,LSJ2>N2*)} and ARL_(del,_LSJ2>N2*) strains against each other, using a nearly neutral background mutation in spe-9(kah132) to distinguish the two strains (Figure 1D).

To determine if the N2-derived intron SNV in nurf-1 (Figure 1B) was responsible for the fitness gains (as opposed to one of the seven linked LSJ2/N2 variants), we used CRISPR-Cas9 to directly edit the LSJ2 allele of the intron SNV into the standard N2 strain to create a strain we will refer to as ARL_(intron,_LSJ2>N2). We measured the relative fitness of the ARL_(intron,_LSJ2>N2) and N2 strains against PTM229 (a strain which again contains a dpy-10 silent mutation). The ARL_(intron,_LSJ2>N2*) strain was significantly less fit than the N2 strain at a level similar to the difference between the NIL_{(nurf-1,LSJ2>N2*)} and ARL_(del,_LSJ2>N2) strains (Figure 1D). These results indicate that beneficial alleles of nurf-1 arose in both laboratory lineages - the 60 bp deletion makes LSJ2 animals more fit in liquid, axenic media (Large et al., 2016), and the intron SNV makes N2 animals more fit on agar plates seeded with bacteria.

In C. elegans, brood-size of hermaphrodites is negatively correlated to the timing of initial egg-laying. It has been suggested that this life-history tradeoff has been optimized in N2 animals (Hodgkin and Barnes, 1991; Cutter, 2004). After sexual maturation, gonads in the hermaphroditic sex initially undergo spermatogenesis before transitioning to oogenesis; a concomitant lengthening of spermatogenesis time increases the total brood size of hermaphrodites but also delays when reproduction can start. When we compared the total fecundity produced by the N2 and ARL_{(intron,LSJ2>N2)} strains, we found a significant difference, with the ARL_{(intron,LSJ2>N2)} strain producing ~ 30 fewer offspring than N2 (Figure 1E). This could indicate that spermatogenesis occurs for a longer period of time in N2 animals (to produce more sperm), or, alternatively, indicate that sperm are less likely to fertilize an egg in the ARL_{(intron,LSJ2>N2)} strain. The reproductive rate of the N2 and ARL_{(intron,LSJ2>N2)} strains was largely unchanged throughout their reproductive lifespan (Figure 1—figure supplement 1).

RNA-seq analysis identified transcriptional differences caused by the intron SNV during spermatogenesis, supporting our hypothesis that sperm development is affected by this SNV. We collected RNA from synchronized N2 and ARL_{(intron,LSJ2>N2)} hermaphrodites at two timepoints, 52 and 60 hr after hatching, which occur during spermatogenesis (52 hr) or oogenesis (60 hr). Interestingly, a large number of genes are differentially expressed between the two strains but only during the 52 hr timepoint (3384 genes vs. 25 genes) (Figure 1F, Figure 1—figure supplement 2A, and Supplementary file 1). Inspection of these differentially-expressed genes in a single-cell RNAseq dataset (Cao et al., 2017) demonstrated that although a portion of these 3384 genes are expressed in the germline, these genes are also expressed in additional tissues (Figure 1—figure supplement 2B). Gene ontology analysis suggests that cuticle development and innate immune responses are regulated by nurf-1 (Supplementary file 2) consistent with the role of its orthologs in regulating immunity and melanocyte proliferation in Drosophila and humans (Kwon et al., 2008; Landry et al., 2011; Koludrovic et al., 2015; Wu et al., 2016). The restriction of most of these transcriptional changes to a specific timepoint (i.e. 52 hr) could reflect a specific-role for nurf-1 in regulating genes undergoing short bursts of transcriptional upregulation during this developmental timepoint (e.g. molting), a specific-role for nurf-1 in regulating cell number or activity of specific cell-types that are transiently present during this timepoint (e.g. spermatocytes), or some combination of both. These results suggest that the intron SNV regulates a number of developmental processes including spermatogenesis, molting, and innate immunity.

nurf-1 produces multiple transcripts encoding multiple protein isoforms

Our results suggest that selection acted repeatedly on C. elegans nurf-1 during laboratory growth. The molecular nature of NURF-1, an essential subunit of the NURF chromatin remodeling complex, is surprising for a hotspot gene. In general, chromatin remodelers are thought of as ubiquitously expressed regulators with little variation in different cell types, akin to general function RNA polymerase proteins or ribosomes. Why would genetic perturbation of nurf-1 lead to increased fitness? One potential clue is the complexity of the nurf-1 locus. Previous cDNA analysis of nurf-1 identified four unique transcripts encoding four unique isoforms (Andersen et al., 2006), two of which have been shown to affect different phenotypes (summarized in Table 1).

Table 1

Summary of major nurf-1 transcripts identified in C. elegans.

https://doi.org/10.7554/eLife.48119.007

Name	Evidence		Size		Conserved^c	Predicted biological role in C. elegans^d	Other names
Name	Transcript^a	Protein^b	aa	kD	Conserved^c	Predicted biological role in C. elegans^d	Other names
nurf-1.a	N	-	2197	252	M,D	None	Full-length
nurf-1.b	C,N,I	W	1621	186	D	Reproduction, vulval development	N-terminal or NURF-1.A
nurf-1.d	C,N,I	W	816	92	-	Size, dauer, reproduction, axon guidance	C-terminal or NURF-1.C
nurf-1.f	C,N,I	W	581	58	-	None	NURF-1.E
nurf-1.q	N,I	-	243	36	-	None	-

^a C indicates full-length cDNA have been isolated for this transcript, N indicates evidence from direct sequencing of RNA or cDNA using Oxford Nanopore reads support this transcript, and I indicates evidence from Illumina short read RNA-seq supports this transcript

^b W indicates evidence for the protein isoform was obtained using western blot
^c M or D indicates an analogous isoform is described in mammals (mice or humans) or Drosophila, respectively

^dPredictions from Andersen et al. (2006), Large et al. (2016), or Mariani et al. (2016)

To identify other transcripts produced by nurf-1 and quantify the relative proportions of each that are produced, we analyzed previously published Illumina short-read (Brunquell et al., 2016) (isolated from synchronized L2 larval animals) and Oxford Nanopore long-read RNA sequencing reads (Roach et al., 2019) (isolated from mixed populations) (Figure 2—figure supplements 1–2). Our results support many of the conclusions of Andersen et al. (2006) but contain a few surprises. We identified five major transcripts (Figure 2A) - three previously isolated (nurf-1.b, nurf-1.d, and nurf-1.f) but also two newly identified (nurf-1.a and nurf-1.q) (mapping of transcript names used in Andersen et al. are listed in Table 1). nurf-1.a encodes a full-length 2197 amino acid isoform analogous to the primary isoform of BPTF in humans and NURF301 in Drosophila (Figure 1C). Despite the expectation that C. elegans would produce a similar protein, the Oxford Nanopore long-read data are the only evidence supporting its existence. The nurf-1.q transcript is predicted to produce a 243 amino acid unstructured protein. With the exception of the full-length nurf-1.a transcript, the overlap of these transcripts is quite minimal, resulting in predicted isoforms with unique protein domains and functions (Figure 2B).

Figure 2 with 4 supplements see all

Download asset Open asset

*nurf-1* encodes five major transcripts.

(A) Genomic position of the five *nurf-1* transcripts supported by Illumina short read and Oxford Nanopore long reads. Each blue box is an exon. Exon number is indicated on the figure. Dark blue exons (10, 16, and 21) are alternatively spliced, resulting in a 6–9 bp difference in length (see Figure 2—figure supplement 1 for details). Genomic location of the HA and FLAG epitope tag insertion site are shown in black along with their associated allele names. (B) The predicted protein isoforms produced by each of the five major transcripts and along with the domains each isoform contains. Immunoblots only supported translation of the B, D, and F isoforms (see panel D for details). For reference, the spliced *nurf-1.a* transcript is also shown. (C) Relative expression levels of each transcript, determined by number of Oxford Nanopore reads from a mixed population (top panel) or analysis of Illumina short reads from L2 staged animals using kallisto (bottom reads). tpm = transcripts per million. (D) Western blots of N2 and PTM420 strains. PTM420 contains the HA and FLAG epitope tags shown in panel A. Anti-HA antibody detected a band matching the expected size of the NURF-1.B isoform (arrow). Anti-FLAG antibody detected bands matching the expected size of the NURF-1.D and NURF-1.F isoforms (arrows).

https://doi.org/10.7554/eLife.48119.008

We quantified the relative expression of these five transcripts by either counting the number of Nanopore reads that matched the transcript or by using kallisto (Bray et al., 2016) to predict transcript abundance using Illumina short-read sequencing data (Figure 2C). These predictions qualitatively agreed in transcript ranking of expression strength (although quantitative variation in predictions were observed, reflective of the different technologies or developmental stages of the animals). Surprisingly, the newly described nurf-1.q transcript was the most highly expressed followed by the nurf-1.b transcript, and the nurf-1.a, nurf-1.d and nurf-1.f were expressed at similar lower levels.

Although each of the five major transcripts are transcribed, this result does not necessarily mean they are translated into stable protein products. To facilitate analysis of NURF-1 proteins, we used CRISPR-Cas9 to fuse two distinct epitope tags (HA and 3xFLAG tag) to the endogenous nurf-1 locus, just prior to the stop codons in the 16^th and 28^th exon, respectively (Figure 2A). Immunoblot analysis supported the expression of the B, D, and F isoforms, but not the A or Q isoforms (Figure 2D). Although larger proteins, such as the A isoform, can be difficult to transfer during immunoblots, the lack of a band matching the small Q isoform suggests the highly expressed nurf-1.q transcript is not translated into protein or the protein is rapidly degraded.

Based upon these results, we speculated that the intronic SNV, which we have shown regulates total fecundity and fitness in laboratory conditions (Figure 1), could specifically alter the expression level of the nurf-1.b transcript. However, analysis of all five nurf-1 transcript levels, using the previously described RNA-seq data on the N2 and ARL_{(intron,LSJ2>N2)} strains, did not reveal any significant expression differences (Figure 2—figure supplement 3). Potentially the effect of this SNV has cell-type specific effects in spermatocytes, however, our data, using RNA collected from the whole animal, does not allow us to test this hypothesis.

We also investigated whether similar isoforms could be expressed in human cells, using western blots on a small panel of human cancer cell lines. Interestingly, besides a band matching the expected size of the canonical full-length isoform, a number of additional bands were observed between 150–250 kD (Figure 2—figure supplement 4A). Using mass-spectrometry, we confirmed the presence of multiple BPTF peptides in the bands detected by western blotting (Figure 2—figure supplement 4), consistent with one or more of these bands representing novel BPTF isoforms. Potentially, these isoforms could play a role in cancer metastasis, although we provide no such evidence here. Despite the presence of these additional bands, the full-length version of BPTF is the most highly expressed isoform (Figure 2—figure supplement 4D), consistent with its importance in mammalian species.

The B and D isoforms are both essential for reproduction and the F isoform modifies the heat shock response

Genetic analysis of nurf-1 primarily relied on two deletion alleles, n4293 and n4295 (Figure 3A) (Andersen et al., 2006). The n4293 allele deletes the first exon and predicted transcriptional start site of the nurf-1.a and nurf-1.b transcripts. The n4295 allele deletes three exons of the nurf-1.a, nurf-1.d, and nurf.1.f transcripts that encode a C-terminal PHD domain (Figure 3—figure supplement 1) necessary for human BPTF function. Comparison of the phenotypes of the n4293 and n4295 homozygotes leads to the model that the B isoform is essential for reproduction and the A, D, and/or F isoforms have subtle effects on growth rate and reproductive rate (Table 1).

Figure 3 with 3 supplements see all

Download asset Open asset

An additional isoform besides NURF-1.

B is necessary for reproduction in *C. elegans*. (A) Genomic positions of *nurf-1* classical deletion alleles and nine engineered stop codons created using CRISPR/Cas9 based gene editing. Each allele is color-coded by the reproductive ability of homozygous strains. Green is statistically indistinguishable from wild-type, yellow indicates slightly reduced brood size and change in reproductive rate, red indicates substantially reduced brood size in the first generation and eventual sterility after multiple generations of homozygosity, and dark red indicates sterility in the first generation of homozygosity. (B) Fecundity of indicated strains (shown in x-axis of panel) (C) Normalized body area of the indicated strains. Normalized body area was calculated by thresholding video recordings of each strain to segment individual animals and registered throughout each frame of the video. Each dot represents the average area of a single worm, normalized to the N2 data. For red or dark red strains (panel A), measurements were taken on animals homozygous for a single generation. (D) Predicted amino acid change of engineered stop codons and classical alleles on the NURF-1 isoforms. The *kah11* mutation only affects the F isoform.

https://doi.org/10.7554/eLife.48119.015

Figure 3—source data 1 Source data for Figure 3.: https://doi.org/10.7554/eLife.48119.019
Download elife-48119-fig3-data1-v2.xlsx

To further delineate the biological role of each isoform, we used CRISPR-Cas9 to engineer nine stop codons in eight exons of the nurf-1 gene: the first, second (two positions), 7^th, 15^th, 18^th, 19^th, 23^rd, or 26^th exons (Figure 3A). The predicted effects of these stop codons on each major isoform are shown in Figure 2—figure supplement 4 and Table 2. Homozygous animals for each mutation were assayed for total brood size and growth rate. Analysis of the phenotypes of these mutants indicated that our working model was incorrect. Instead, we propose that both the B and D isoforms are essential for reproduction.

Table 2

Predicted effect of stop codon mutations on NURF-1 isoforms.

https://doi.org/10.7554/eLife.48119.020

Isoform	kah90	kah91	kah106	kah142	kah93	kah96	kah99	kah11	kah68	Length
NURF-1.A	5R	107P	147E	646A	1548G	1632T	1685Q, 1689P, 1693N		2056T	2197
NURF-1.B	5R	107P	147E	646A	1548G	-	-		-	1621
NURF-1.D	-	-	-		170G	254T	307Q, 311P, 315N		675T	816
NURF-1.F	-	-	-		-	-	-	3L	440T	581
NURF-1.Q	-	-	-		170G	-	-		-	243

As expected, engineering stop codons in the first, second, and 7^th exons greatly reduced fecundity, resulting in either sterility, or a mortal germline phenotype, initially reducing total brood size of animals, before eventually causing complete sterility after around three-to-five generations of homozygosity (Figure 3B and C). Although the qualitative phenotypes of these four alleles agreed, we observed interesting quantitative differences between them. The second stop codon in the second exon (kah106) and the stop codon in the 7^th exon (kah142) reduced growth and fecundity more than the first exon stop codon (kah90) or the first stop codon in second exon (kah91) (Figure 3B and C). We suspect this result indicates the presence of an internal ribosome entry site in the middle of the second exon at the 122^nd Methionine, causing the expression of two isoforms from a single transcript. The reduced severity of the first two stop codon alleles can be explained by their inability to affect the protein sequence of the second isoform. An alternative possibility is a difference in frequency of translational read-through of each stop codon, which are interpreted as sense codons at a low frequency (Jungreis et al., 2011).

Unexpectedly, engineering stop codons in the 18^th and 19^th exons also caused a mortal germline phenotype (kah96 and kah99) (Figure 3B). This result was surprising, because the n4295 allele, predicted to be a loss-of-function allele for the D and F isoforms due to the loss of the PHD and bromodomains, does not have a mortal germline phenotype. We excluded a number of potential explanations for this discrepancy. A suppressor for the n4295 allele could have fixed during the construction of this strain. However, the kah68 allele, which contains a stop codon within the n4295 deleted region, phenocopies the n4295 allele and not the kah96 and kah99 animals (Figure 3B and C, and Figure 3—figure supplement 2)). Another possibility is that the D isoform suppresses the F isoform; loss of both isoforms (in the n4295 background) is tolerated, but loss of just the D isoform (in the kah96 or kah99 backgrounds) allows the F isoform to prevent reproduction. However, we could exclude this possibility as the double mutant containing both the n4295 allele and the 18^th exon stop allele phenocopied the kah96 single mutant (Figure 3—figure supplement 3). Additionally, specific loss of the F isoform by the 23^rd exon stop allele (kah11) did not affect the phenotype of animals (Figure 3B and C). Our data suggest that, unlike human BPTF, the ability of NURF-1 to bind modified histones is not required for its function. We further confirmed this hypothesis by editing conserved residues in these the PHD and bromodomains necessary for recognition of the H3K4me3 and H4K16ac marks (Figure 3—figure supplement 1).

The most parsimonious explanation of our data is that either the A or D isoform is essential for reproduction in C. elegans. Compound heterozygote tests allowed us to distinguish between these possibilities, indicating that the D isoform is required for reproduction and wild-type growth rate, and the A isoform is dispensable for reproduction and development (Figure 4). We first verified that the kah93, kah96, and kah106 alleles were recessive by measuring the fecundity of heterozygous animals (Figure 4B). Next, we examined the fecundity of kah106/kah96 compound heterozygotes, which are predicted to lack only the A isoform, due to the production a single unaffected copy of the B isoform from the kah96 haplotype and a single unaffected copy of the D isoform from the kah106 haplotype. If the A isoform was essential for reproduction, we would expect these compound heterozygotes to be sterile or have severe defects in fecundity. However, these animals were indistinguishable from wild-type, suggesting that the full-length A isoform is not essential (Figure 4B). The kah106/kah93 compound heterozygotes showed similar results. These animals are predicted to encode one unaffected copy of the D isoform, one truncated copy of the B isoform, and zero unaffected copies of the A isoform. These animals were mostly wild-type, with a small reduction in total fecundity (Figure 4B). We interpret this to mean that the A isoform is not essential and the truncation of the B isoform slightly perturbs its function, causing a slight reduction in fecundity. Finally, we analyzed kah93/kah96 compound heterozygotes. These animals are predicted to encode zero wild-type copies of the D isoform, one wild-type copy of the B isoform, and zero wild-type copies of the A isoform. These animals were essentially sterile. Taken together, we conclude that the B and the D isoform are both essential for reproduction.

Figure 4 with 2 supplements see all

Download asset Open asset

Genetic analysis suggests the NURF-1.

B and NURF-1.D isoforms are essential for reproduction in *C. elegans*. (A) Genomic positions of stop codon or classical deletion mutations used for compound heterozygote or transgenic rescue analysis of B and C. *kah3* is a CRISPR/Cas9 genomic edit of the LSJ2-derived 60 bp deletion. (B) Fecundity of homozygote (red), heterozygote (green), and compound heterozygote mutants (yellow) as indicated in the x-axis. The table below the x-axis is the predicted effect of each mutant strain on the indicated *nurf-1* isoforms. The number in the table indicates the number of functional copies. The star indicates the milder predicted effect of *kah93* on NURF-1.B, as it only truncates 73 of 1621 amino acids. The y-axis shows the fecundity for each strain. (C) Fecundity of indicated strains with and without the presence of an integrated *nurf-1.d* transgene. The genetic background is also indicated. N2* contains ancestral introgressions of the *npr-1* and *glb-5* genes.

https://doi.org/10.7554/eLife.48119.021

Figure 4—source data 1 Source data for Figure 4.: https://doi.org/10.7554/eLife.48119.024
Download elife-48119-fig4-data1-v2.xlsx

To confirm that the D isoform is essential, we also created a transgenic strain containing an integrated construct driving a nurf-1.d cDNA from its endogenous promoter. This transgene could fully rescue the fecundity phenotype of the kah96 allele and partially rescue the fecundity phenotype of the kah93 allele (Figure 4C). This transgene could also rescue the reproductive timing and fecundity changes of the n4295 allele and the LSJ2-derived 60 bp deletion (kah3) (Figure 4C and Figure 4—figure supplement 1). As expected, this transgene could not rescue the kah106 allele, which creates a stop codon in the B isoform. These data further support a requirement of both the B and D isoforms for reproduction.

Although the F isoform does not seem to have an effect in normally developing animals, it is involved in the heat shock response. Multiple reports have demonstrated that nurf-1 is upregulated in response to heat shock (Brunquell et al., 2016; Li et al., 2016). By analyzing RNA-seq reads from these two papers, we found that the nurf-1.f transcript was specifically upregulated in both datasets, with increased coverage of the 23^rd exon as well as the 24^th through 28^th exons (Figure 4—figure supplement 2A and B). We confirmed that the increased transcription of the nurf-1.f transcript also increased NURF-1.F protein abundance (Figure 4—figure supplement 2C and D). Transcriptional analysis of strains lacking the F isoform indicated that the initial transcriptional response to heat shock was largely the same, but the long-term transcriptional response of a subset of genes was affected (Figure 4—figure supplement 2E–G). We conclude that the F isoform is specifically up-regulated by heat shock and plays a modulatory role in determining the long-term transcriptional response to heat shock.

The B and D isoforms have opposite effects on cell fate during gametogenesis

Although the B and D isoforms are both required for reproduction, the molecular mechanism that these isoforms operate through could be different. One possibility is that the long-form of NURF-1 has split into two subunits - both isoforms participate as part of the NURF complex, cooperating together to regulate reproduction. However, the D isoform might instead modify NURF activity by competing for binding with transcription factors or regions of the genome to which NURF is recruited. A third possibility is that the D isoform acts through a NURF-independent pathway.

To gain insights into the molecular nature of the D isoform, we decided to determine precisely how the B and D isoforms regulate reproduction, using three nurf-1 stop alleles (Figure 5A). For hermaphrodites to produce a fertilized egg, the gonads must produce both male and female gametes at different developmental times (Figure 5B). Initially, gametogenesis produces sperm, creating approximately 300 sperm at which point a permanent sperm-to-oocyte switch occurs. From this time, gametogenesis produces oocytes until the animal dies or the gonad ceases to function (Hubbard and Greenstein, 2005). A number of defects could cause sterility – inability to form gametes, inability to create sperm, inability to create oocytes, or defects in the sperm and/or oocyte function. We used DAPI staining to characterize the production of sperm and oocytes in three nurf-1 mutants (Figure 5C and D). We first tested kah106 mutants, which lack the B isoform (Figure 5A), for the ability to produce sperm. Compared with N2 animals, which create ~ 300 sperm per animal, the number of sperm produced by kah106 animals was greatly reduced, resulting in the production of only approximately 60 sperm (Figure 5D). These animals produced a normal number of oocytes, indicating that spermatogenesis seemed to be affected specifically (Figure 5E). We interpret these data as evidence that hermaphrodites that lack the NURF-1.B isoform spend less time in spermatogenesis before transitioning to oogenesis. We next tested kah96 mutants which lack the D isoform. These animals produced approximately 500 sperm and almost no oocytes (Figure 5C-E). We interpret these data as evidence that hermaphrodites that lack the D isoform are unable to transition from spermatogenesis to oogenesis. Finally, we performed similar experiments on kah93 mutants, which lack the D isoform and have a truncated B isoform. These animals showed an intermediate phenotype, with normal number of sperm but reduced number of oocytes (Figure 5D and E). The reduced activity of the B isoform due to its truncation potentially allows other factors to transition the animals to oogenesis, resulting in the milder defects found in the kah93 animals (Figure 3B).

Figure 5 with 1 supplement see all

Download asset Open asset

NURF-1.B and NURF-1.D have opposite effects on the sperm-to-oocyte switch in hermaphrodites.

(A) Genomic position of the previously-described stop codon mutants used in B and C. (B) Summary of gametogenesis of *C. elegans*. Animals undergo spermatogenesis during the late L4 and then transition to oogenesis stage during maturation to adulthood. The number of sperm produced during spermatogenesis can be determined by counting sperm in the spermatheca when oogenesis has begun. (C) Representative fluorescence images of one spermatheca for DAPI stained young adult animals. Each tiny dot represents the condensed chromosomes of a single sperm. (D) Sperm number of indicated strains. L4 animals were synchronized and allowed to develop for an additional 12 hr. DAPI staining was used to identify and count the number of sperm in each animal. Each dot represents a single animal. (E) Oocyte number of indicated strains. L4 animals were synchronized and allowed to develop for an additional 12 hr. DAPI staining was used to identify and count the number of oocytes in each animal.

https://doi.org/10.7554/eLife.48119.025

Figure 5—source data 1 Source data for Figure 5.: https://doi.org/10.7554/eLife.48119.027
Download elife-48119-fig5-data1-v2.xlsx

Although animals that lack either the B or D isoform are unable to reproduce, the cause of sterility is different at the cellular level. To further study the molecular effects of perturbing nurf-1 function, we transcriptionally profiled adult N2*, NIL_{(nurf-1,LSJ2>N2*)}, ARL_{(del, LSJ2, N2*)}, and LSJ2 animals, which contain various combinations of the N2 and LSJ2-derived nurf-1 mutations (Supplementary file 1). A multi-dimensional scaling plot indicated that the N2* and ARL_del replicates formed two unique clusters, and the LSJ2 and NIL_nurf-1 replicates largely overlapped in a third cluster (Figure 5—figure supplement 1A). The genetic variation surrounding the nurf-1 locus is responsible for the majority of transcriptional differences between adult LSJ2 and N2* animals, suggesting most of the fixed variants do not have a dramatic effect on transcription on N2-like growth conditions. Although the LSJ2-derived 60 bp deletion regulates transcription, additional genetic variation in the NIL_nurf-1 strain, presumably from the N2-derived intron variant, also regulates transcription in adult animals.

To study the effects of the 60 bp deletion and intron SNV on transcription, we focused on two comparisons: 1) the N2* vs ARL_{(del, LSJ2>N2*)}, which will identify transcriptional changes caused by the 60 bp deletion and 2) the NIL_{(nurf-1, LSJ2>N2*)} vs ARL_{(del, LSJ2>N2*)}, which will identify transcriptional changes caused by the intron SNV (as well as linked mutations in the NIL other than the 60 bp deletion). We expect that the latter comparison will mostly report the changes of the intron SNV, as it accounts for most of the fitness differences between the two strains. We observed a positive correlation between these two comparisons (Figure 5—figure supplement 1B). The most parsimonious explanation for this observation is that both the N2 and LSJ2-derived alleles in nurf-1 regulate the activity of a common molecular target, which is likely to be the NURF complex.

A duplication in a sister clade of Caenorhabditis species creates two separate nurf-1 genes

Previous work in C. briggsae characterized the role of nurf-1 in reproduction, including the isolation of nurf-1 cDNAs in this species (Chen et al., 2014). Interestingly, although transcripts matching the nurf-1.b and nurf-1.d were isolated from this species, they no longer shared any exons with each other, suggesting that they were expressed from two separate genes (Figure 6A). Further, spliced leader sequences to the 5’ end of both transcripts matched sl1 sequence, suggesting that these two genes were not expressed as a single operon (Blumenthal, 2012). We compared the gene products using BLAST and found that the shared exons in C. elegans had duplicated in C. briggsae, with one set of each retained in each of the new genes (Figure 6A). Short-read transcriptomics data for this species matched the cDNA analysis; we found evidence for transcripts matching nurf-1.b, nurf-1.d, and nurf-1.f (Figure 6—figure supplements 1–3). Unlike C. elegans, C. briggsae seemed to have lost both the nurf-1.a and nurf-1.q transcripts.

Figure 6 with 6 supplements see all

Download asset Open asset

A duplication of the shared exons of the *nurf-1.b* and *nurf-1.d* transcripts resulted in the split of *nurf-1* it into two separate genes in a subclade of *Caenorhabditis* species.

(A) Comparison of two species with different versions of *nurf-1*. In *C. elegans*, *nurf-1.b* and *nurf-1.d* overlaps in the 14th and 15th exon (shown in orange). In *C.briggsae*, a duplication of the orange exons resulted in separation of *nurf-1.b* and *nurf-1.d* into separate genes. *C. briggsae* has also lost expression of the *nurf-1.a* and *nurf-1.q* transcripts. (B) Distribution of the two versions of *nurf-1* (shown in panel A) in 32 Caenorhabditis species. Red indicates the *C. elegans* version, blue indicates the *C. briggsae* version, and black indicates a *nurf-1* version that could not be determined. The species phylogeny suggests that a duplication event occurred in the common ancestor of the *brenneri/tribulationis* clade. (C) The most well supported topology of the duplicated region is consistent with a single duplication event. Orange indicates protein sequence from the duplicated region in the *nurf-1–1* gene, and turquoise indicates protein sequence from the duplicated region in the *nurf-1–2* gene. Non-colored branches indicate unduplicated *nurf-1* sequence. The rate of amino acid substitution in the *nurf-1–1* duplicated region has also increased, as seen in the branch lengths. Scale is in substitutions per site.

https://doi.org/10.7554/eLife.48119.028

Figure 6—source data 1 Source data for Figure 6.: https://doi.org/10.7554/eLife.48119.035
Download elife-48119-fig6-data1-v2.xlsx

Analysis of the nurf-1 gene structure within the context of the Caenorhabditis phylogeny suggested that the exon duplication and separation of nurf-1 into separate genes occurred at the base of a clade containing 11 described species, including C. brenneri and C. tribulationis (Figure 6B). We determined the nurf-1 gene structure in 22 of the 32 Caenorhabditis species with published genomes and transcriptomes (Kiontke et al., 2011; Stevens et al., 2019) (Figure 6—figure supplements 1–3). Like C. briggsae, the species in the brenneri/tribulationis clade express a transcript matching nurf-1.b from a single gene (which we call nurf-1–1). These species also express two transcripts matching nurf-1.d and nurf-1.f from a second gene, called nurf-1–2. Analysis of the spliced leader sequence of the 5’ end of the nurf-1.d transcript only identified sl1 sequence, consistent with separation of these genes into distinct transcriptional units. None of these species appears to express nurf-1.a or nurf-1.q transcripts (Figure 6—figure supplements 1–3). RNA-seq data for species outside of this clade (Figure 6—figure supplements 1–3) matched the transcription pattern of C. elegans, suggesting that these species express five major transcripts from a single nurf-1 gene: nurf-1.a, nurf-1.b, nurf-1.d, nurf-1.f, and nurf-1.q. These data suggest that the C. elegans transcript structure is ancestral.

Phylogenetic analysis of the duplicated ~ 200 amino acid sequence was used to evaluate different hypotheses surrounding the timing and number of duplication events. The analysis supported the model that the split of nurf-1 into two distinct genes happened once within the common ancestor of the brenneri/tribulationis clade (Figure 6C – additional possible trees shown in Figure 6—figure supplement 4). The topology recovered for the region of nurf-1 outside the duplication is congruent with the species tree (Figure 6—figure supplement 5). Interestingly, the rate of amino acid substitution in the duplicated region was accelerated in nurf-1–1 relative to nurf-1–2 (p<0.001; Welch's t-test) suggesting that this region experienced positive selection and/or relaxed selection after this duplication event occurred. Comparison of the synonymous vs. non-synonymous substitution rate in three closely-related species pairs was also consistent with an increase in the rate of protein evolution in the duplicated region following the separation of nurf-1 into independent genes (Table 3).

Table 3

dN/dS ratio in three Caenorhabditis species pairs.

https://doi.org/10.7554/eLife.48119.036

		nurf-1–1 or nurf-1.b			nurf-1–2 or nurf-1.d
Sp. pair	Duplication^a	Dup. Reg.^b	Other^c	Ratio^d	Dup. Reg.^b	Other^c	Ratio^d
C. afra/ C. sulstoni	N	0.136^e	0.121	1.1	0.116^e	0.072	1.6
C. nigoni/ C. briggsae	Y	0.249	0.085	2.9	0.111	0.019	5.8
C. remanei/ C. latens	Y	0.295	0.121	2.4	0.177	0.048	3.7

^a Duplication indicates whether the species pair contain the duplicated exons that create two nurf-1 genes

^b Dup. Reg. indicates dN/dS was calculated using the region of the alignment that contains the duplication
^c Other indicates dN/dS was calculated using the region of the alignment that does not contain the duplication

^d Ratio was calculated by dividing the dN/dS value of the Dup. Reg. by the Other
^e The dN/dS values for the nurf-1.b and nurf-1.d in the duplicated region were different due to the b transcript encoding two additional amino acids in the 14^th exon (before the M initiation codon in the d isoform) and the amino acids encoded by the 16^th alternatively spliced exon.

Discussion

In this paper, we make use of CRISPR/Cas9-enabled gene editing to characterize the nurf-1 gene in C. elegans and then study the sequence and expression of nurf-1 orthologs in other Caenorhabditis species. The combination of genetics and evolutionary analysis allowed us to make a number of surprising observations. First, we show that an SNV in the 2^nd intron of nurf-1 that fixed in the N2 laboratory strain increases the fitness and fecundity of the N2 strain. Second, we show that the full-length isoform of nurf-1 has split into two essential, mostly non-overlapping isoforms with opposite effects on cell fate in differentiating gametes. Finally, we show that the B and D isoforms have split into separate genes in a subset of Caenorhabditis species. These data show that nurf-1 can be genetically perturbed to increase fitness of animals in new environments and has experienced long-term evolutionary changes that have split its function and regulation into two isoforms/genes (Figure 7A and B).

Figure 7

Download asset Open asset

Proposed antagonistic (Yin-Yang) working model of two *nurf-1* isoforms in *C. elegans*.

(A) Descriptive phylogeny with proposed major transitions in *nurf-1* isoform evolution. Each dot indicates the timepoint of a major *nurf-1* isoform evolution event. (B) Proposed molecular mechanism for NURF-1 isoforms. The NURF-1.B isoform interacts with ISWI through its DDT domain to form a NURF complex capable of remodeling chromatin at specific regions of the chromosome. NURF is recruited to these regions through interactions with specific transcription factors using protein domains encoded by the overlapping exons. This remodeling is necessary for transcriptional responses for spermatogenesis. Due to some unknown signal, after spermatogenesis has resulted in the production of ~ 300 sperm, the NURF-1 D isoform outcompetes the NURF complex away from its target loci, causing the loss of transcription of key spermatogenesis genes, resulting in gametogenesis transitioning from spermatogenesis to oogenesis. The binding affinity of PHD domains and bromodomainto histone strengthens this repression, but they are not completely necessary for the ability of the D isoform to outcompete the B isoform.

https://doi.org/10.7554/eLife.48119.037

Evolution of NURF-1/BPTF across phyla

In humans and Mus musculus, an abundance of evidence confirms that the long-form isoform of BPTF, which is orthologous to nurf-1, is the primary isoform in the NURF chromatin remodeling complex (Alkhatib and Landry, 2011). While a subset of BPTF exons are alternatively spliced, these events will not lead to the large changes in size we observe in the nurf-1 gene. One exception is the FAC1 isoform, which encompasses 801 N-terminal amino acids of BPTF (Bowser et al., 1995). While FAC1 is found in amyloid Alzheimer’s patients and enriched in the nervous system (Bowser et al., 1995; Landry et al., 2008), a biological role for this isoform has not been described. FAC1 is smaller and lacks conserved protein sequence found in the B isoform of nurf-1, suggesting an independent evolutionary origins and function.

In Drosophila, an intermediate state between humans and nematodes is found. Two major isoforms of NURF301 (the ortholog to nurf-1) have been described: a full-length form of NURF301 analogous to the full-length mammalian BPTF and an N-terminal form of NURF301 analogous to the NURF-1.B isoform of C. elegans. Both isoforms form NURF complexes and regulate gene expression (Kwon et al., 2009). Genetic analysis suggests that full-length NURF301 is required for gametogenesis in both sexes while the N-terminal isoform is required for regulation of pupation and innate immunity.

Nematodes have retained the N-terminal isoform but seem to have lost use of the full-length isoform for most biological traits (Andersen et al., 2006). Instead, they express two C-terminal isoforms (D and F) that appear to be a recent evolutionary innovation, likely occurring before the origin of the Caenorhabditis lineage. We show that the D isoform (or the Yin isoform) is essential in C. elegans, and seems to act in opposition to the B isoform (or the Yang isoform) to regulate the sexual fate of differentiating gametes. The requirement of two antagonistic isoforms (the B and D) for reproduction is reminiscent of the principle of Yin and Yang. Genetic pathways often include both positive and negative regulators of transcription and ultimate phenotype, however, rarely are both the factors encoded by the same genetic locus. While there is growing appreciation of isoform-specific regulation of many genes, nurf-1 appears to be unusually complex in this regard (although not unprecedented – see Müller and Basler, 2000; Berry et al., 2001; Wang et al., 2009).

We propose a molecular mechanism to explain the actions of the B and D isoforms to regulate transcription (Figure 7B). These two isoforms share 207 amino acids of protein sequence, which falls in a region that is thought to facilitate physical interactions with transcription factors (Alkhatib and Landry, 2011). NURF-1.B participates as part of the NURF complex, which is recruited to certain promoters by binding to transcription factors. At these loci, NURF promotes or represses expression of target genes by remodeling the chromatin surrounding promoters and gene bodies. For unknown molecular reasons, NURF-1.D preferentially binds to these transcription factors, displacing the NURF complex from these genomic regions, causing a change of chromatin state and gene expression.

Microevolution of NURF-1/BPTF

We showed that independent, beneficial alleles in nurf-1 were fixed in two laboratory strains of C. elegans that each experienced an extreme shift in environment from their natural habitats. The N2-derived SNV results in the change of a run of homopolymers in the 2^nd intron of the nurf-1.b transcript. Such a change could act as an enhancer for the nurf-1.d promoter, but the nature of the genetic change and position is more consistent with a role in regulating the nurf-1.b transcript. Analysis of RNA sequencing data did not identify any obvious changes in levels of any of the nurf-1 transcripts and it is unclear by what molecular mechanism it regulates nurf-1 activity. Potentially, it could increase pausing of the RNA polymerase at the homopolymer run or could regulate RNA splicing by changing the secondary structure of the RNA molecule. In general, such a mutation would not be predicted by most bioinformatic approaches to have a phenotypic effect. Only the low genetic diversity between the LSJ2 and N2 strains allowed us to focus on this variant, and eventually demonstrate this particular variant is causal.

The probability of two beneficial mutations happening in both lineages by random chance is quite small. Less than 300 genes (out of ~ 20,000 total) harbor derived mutations in either the N2 or LSJ2 lineage (McGrath et al., 2011). Only a handful of these fixed mutations are expected to be beneficial; our recent QTL mapping of fitness differences on agar plates only identified the nurf-1 locus (Zhao et al., 2019) and the small effective population sizes (~4–100) are expected to lead to the fixation of a number of nearly-neutral mutations through genetic drift and draft. Our work suggests nurf-1 is a genetic target for adaptation to the extreme changes in environments associated with laboratory growth.

Targeting of nurf-1 is consistent with its role as a regulator of life history tradeoffs. Many traits influence individual and offspring survival; however, the mapping of these traits onto fitness is thought to be dependent on the environmental niche an organism occupies. The LSJ2-derived deletion in nurf-1 modified life history tradeoffs to prioritize individual survival over reproduction; by shunting energy away from reproduction and growth, they increased their chances of surviving on poor, unnatural food. N2 animals grew on agar plates seeded with E. coli bacteria, which they can readily consume and metabolize into a useful energy source. In these conditions, survival is not the primary concern; each animal has three days to eat as much food as possible and produce as many progenies as possible to maximize the probability one of their offspring is transferred to the new food source. It is reasonable to think that the N2 and LSJ2 laboratory conditions represent opposite extremes along a life history axis encompassing individual survival and reproduction. The N2 mutation favors reproduction while the LSJ2 mutation favors survival.

In humans, genetic alterations in BPTF have been reported in several types of cancer and a role of BPTF in transcriptional regulation by c-MYC has been demonstrated, in agreement with its chromatin-binding function (Richart et al., 2016). Using well-characterized and validated antibodies against BPTF, we found several molecular species with unexpected electrophoretic mobilities in human cancer cells (Figure 2—figure supplement 4). Using mass-spectrometry, we confirmed the presence of multiple BPTF peptides in the bands detected by western blotting (Figure 2—figure supplement 4). These findings raise the possibility that these protein sequence variants have non-canonical functions. Given that stress adaptation is a hallmark of cancer - allowing tumor cells to survive and evolve following Darwinian selection processes - and the role of nurf-1 in C. elegans demonstrated here, it is tempting to speculate a role for such diversity of isoforms in the life histories of cancer cells. However, our work simply shows that additional forms of BPTF exist. Whether they have a biological role still needs to be determined.

Split of nurf-1 into separate genes potentially resolves conflict between the Yin and Yang isoforms caused by shared exons

In a clade of Caenorhabditis nematodes, the nurf-1 gene has split into two separate genes, an example of gene birth resulting in the duplication of a portion of the nurf-1 gene. Multigene families are common in most species and protein domains are often shuffled between genes. While the importance of gene duplication is not controversial, the exact mutational events and evolutionary forces responsible for the fixation of independent genes with different functions is less understood. Here we seem to have uncovered an example of how gene sharing, specifically through the creation of unique isoforms, can contribute to this process. In the lineage that led to the C. elegans species, nurf-1 first evolved changes in isoform use, resulting in the creation and essential action of the nurf-1.d transcript, and the loss of essentiality of the long nurf-1.a transcript. In this case, partitioning of the biological function and protein domains in each nurf-1 isoform created diversification of protein products.

What are the evolutionary forces responsible for the split of nurf-1 into two genes? One possibility is developmental system drift. Under this scenario, the separation of the two isoforms into two distinct genes does not signify any important evolutionary change in the function of the two genes. Neutral processes are responsible for the initial fixation of the duplication and the change does not provide any future evolutionary benefit.

However, there are a few additional possible ways adaptive evolution could play a role. First, correlated with the separation of the Yin and Yang transcripts into two genes is the loss of the full-length nurf-1.a and nurf-1.q transcripts. Loss of these transcripts could have provided a fitness benefit for animals. Consider, in order to produce both the nurf-1.b and nurf-1.d transcripts (i.e. the Yin and Yang transcripts) in the same cell, there must be a mechanism to distinguish between transcripts containing the 1^st to 15^th exons (the nurf-1.b transcripts) and transcripts initiating from the 14^th exon (the nurf-1.d transcripts). In the former case, the 15^th exon is spliced to the 16^th exon to terminate the transcript. In the latter case, the 15^th exon is spliced to the 17^th exon, along with the remaining 3' exons. Alternatively, the cell might not distinguish between transcripts, but uses each alternative splice site at a constant ratio (i.e. 80% of the time, the 15^th exon is spliced to the 16^th exon and 20% of the time, the 15^th exon is spliced to the 17^th exon). In the latter scenario, two additional transcripts must be produced. Intriguingly, these two transcripts match nurf-1.a and nurf-1.q, suggesting these transcripts are non-functional biproducts of molecular conflict between nurf-1.b and nurf-1.d. Potentially, production of the nurf-1.a and nurf-1.q transcripts could come at an energetic cost.

Multiple lines of evidence are consistent with the nurf-1.a and nurf-1.q transcripts playing non-biological roles. First, while the nurf-1.q transcript is produced at high levels, we were unable to observe its product in our immunoblots, suggesting that it is either not translated or the protein product is rapidly degraded. Second, our genetic tests were unable to identify a biological role for nurf-1.a. Third, we observe a loss of both the nurf-1.a and nurf-1.q transcripts in the species that have split nurf-1 into two genes. It would have been quite easy for these species to retain expression of nurf-1.q in their current configuration, either through a promoter in front of the 14^th exon in the nurf-1–1 gene, or an alternative stop exon after the 2^nd exon of the nurf-1–2 gene, since both of these elements existed in the ancestral state.

Second, duplication of the shared exons could facilitate future evolutionary change. Escape from adaptive conflict is a mechanism by which gene duplication can resolve the situation where a single gene is selected to perform multiple roles (Des Marais and Rausher, 2008). After duplication, each copy is free to improve its function independently. As organisms evolve, recruitment of NURF to specific loci could be accomplished by changing its binding with specific transcription factors through amino acid changes in NURF-1. The most rapidly evolving portion of the protein is within the 14^th and 15^th exons, suggesting positive selection acts on this region of the protein, potentially changing the transcription factors NURF-1 binds to. One issue that arises in species containing a single nurf-1 gene is the pleiotropy of genetic changes in the shared region; changing the amino acid sequence of the B isoform also changes the D isoform. Are there situations where modifying one isoform but not the other is preferred? In the clade of nematodes that have duplicated nurf-1, each gene is free to evolve independently. We present evidence that in these species, the duplicated region is free to evolve more rapidly. It should be interesting to characterize the exact function of this duplicated region and determine if these changes in protein sequence facilitate changes in transcription factor binding in an adaptive manner.

Conclusion

A fundamental problem in evolutionary biology is understanding the genetic mechanisms responsible for phenotypic diversity in extant species. Here, we present one route to address this problem. Experimental evolution and genetic analysis can be used to identify evolutionary relevant genes and understand their function. This knowledge can be leveraged to understand patterns of evolution of these genes in other species. We believe that merging genetics, genomics, and molecular evolution is a powerful approach to understand the evolutionary mechanisms responsible for long-term adaptation and species level differences.

Materials and methods

Key resources table

Reagent type or resource	Designation	Source of reference	Identifiers	Additional information
Gene (C. elegans)	nurf-1	WormBase	Wormbase ID: WBGene00009180	Sequence: F26H11.2
Gene (human)	BPTF	National Center for Biotechnology Information	Gene ID: 2186
Strain, strain background (E. coli)	OP50	Caenorhabditis Genetics Center (CGC)	RRID: WB-STRAIN:OP50
Strain (C. elegans)	CX12311	PMID: 21849976	RRID: WB-STRAIN:CX12311	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM66	PMID: 21849976		Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM88	PMID: 21849976		Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM288	PMID: 30328811	RRID: WB-STRAIN:PTM288	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM229	PMID: 30328811	RRID:WB-STRAIN:PTM229	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM98	This paper	RRID:WB-STRAIN:PTM98	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM113	This paper	RRID:WB-STRAIN:PTM113	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM116	This paper	RRID:WB-STRAIN:PTM116	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM117	This paper	RRID:WB-STRAIN:PTM117	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM118	This paper	RRID:WB-STRAIN:PTM118	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM167	This paper	RRID:WB-STRAIN:PTM167	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM170	This paper	RRID:WB-STRAIN:PTM170	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM189	This paper	RRID:WB-STRAIN:PTM189	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM203	This paper	RRID:WB-STRAIN:PTM203	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM211	This paper	RRID:WB-STRAIN:PTM211	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM316	This paper	RRID:WB-STRAIN:PTM316	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM317	This paper	RRID:WB-STRAIN:PTM317	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM319	This paper	RRID:WB-STRAIN:PTM319	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM322	This paper	RRID:WB-STRAIN:PTM322	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM325	This paper	RRID:WB-STRAIN:PTM325	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM332	This paper	RRID:WB-STRAIN:PTM332	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM354	This paper	RRID: WB-STRAIN:PTM354	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM371	This paper	RRID:WB-STRAIN:PTM371	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM372	This paper	RRID:WB-STRAIN:PTM372	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM373	This paper	RRID:WB-STRAIN:PTM373	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM376	This paper	RRID: WB-STRAIN:PTM376	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM416	This paper	RRID:WB-STRAIN:PTM416	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM417	This paper	RRID:WB-STRAIN:PTM417	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM420	This paper	RRID:WB-STRAIN:PTM420	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM487	This paper	RRID:WB-STRAIN:PTM487	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM489	This paper	RRID:WB-STRAIN:PTM489	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM512	This paper	RRID:WB-STRAIN:PTM512	Strain Background: N2, Request a strain: please email the corresponding author
Strain (C. elegans)	PTM517	This paper	RRID:WB-STRAIN:PTM517	Strain Background: N2, Request a strain: please email the corresponding author
Cell line (Human)	Colo-205	American Type Culture Collection (Rockville, MD)
Cell line (Human)	MCF-7	American Type Culture Collection (Rockville, MD)
Cell line (Human)	MDA-MB-231	American Type Culture Collection (Rockville, MD)
Cell line (Human)	Hela	American Type Culture Collection (Rockville, MD)
Cell line (Human)	A549	G. Roncador, CNIO
Sequence-based reagents (Plasmid)	Plasmid: pSM	Cori Bargmann Lab (Rockefeller University)
Sequence-based reagents (Plasmid)	Plasmid: pDD162 PrU6::dpy-10_sgRNA	PMID: 27467070		CRISPR/Cas9 gene editing
Sequence-based reagents (Plasmid)	Plasmid: pDD162 Preft3::Cas9	PMID: 27467070		CRISPR/Cas9 gene editing
Sequence-based reagents (Plasmid)	Plasmid: pCFJ90	Addgene		http://www.wormbuilder.org/test-page/about-mossci/
Sequence-based reagents (Plasmid)	Plasmid: pCFJ104	Addgene		http://www.wormbuilder.org/test-page/about-mossci/
Sequence-based reagents (Plasmid)	Plasmid: pCFJ151	Addgene		http://www.wormbuilder.org/test-page/about-mossci/
Sequence-based reagents (Plasmid)	Plasmid: pCFJ601	Addgene		http://www.wormbuilder.org/test-page/about-mossci/
Antibody	(mouse monoclonal) anti-HA	Life Technologies	Cat. No.: 326700	(1:500)
Antibody	(mouse monoclonal) anti-DYKDDDDK	Life Technologies	Cat. No.: MA191878	(1:1000)
Antibody	(mouse monoclonal) anti-FLAG	Millipore Sigma	Cat. No.: F3165	(1:1000)
Antibody	Horseradish peroxidase-conjugated secondary antibodies	Dako Glostrup		(1:10000)
Peptide, recombinant protein	BPTF	Novus Biologicals	Cat. No.: NB100-41418
Peptide, recombinant protein	Vinculin	Sigma	Cat. No.: V9131
Sequence-based reagents (Oligonucleotide)	dpy-10 (cn64)	PMID: 25161212		CRISPR/Cas9 gene editing
Commercial assay, kit	Taqman probe:dpy-10 (kah82/kah83)	Thermal: Custom TaqMan SNP Genotyping Assays	PTM09
Commercial assay, kit	NEB Q5 site directed mutagenesis kit	NEB	Cat. No.: E0554
Commercial assay, kit	Next Ultra II Directional RNA Library Prep Kit	NEB	Cat. No.: E7760S
Commercial assay, kit	Zymo DNA isolation kit	Zymo	Cat. No.: D4071
Commercial assay, kit	Zymo DNA cleanup kit	Zymo	Cat. No.: D4064
Commercial assay, kit	ddPCR Supermix for Probes	BIORAD	Cat. No.: 1863010
Commercial assay, kit	Droplet Generation Oils	BIORAD	Cat. No.: 1863005
Commercial assay, kit	ddPCR Droplet Reader Oil	BIORAD	Cat. No.: 1863004
Commercial assay, kit	VECTASHIELD antifade Mounting Medium with DAPI	VECTOR	Cat. No.: H-1200
Software, Algorithm	edgeR	PMID: 19910308	RRID:SCR_012802	Opensource: https://bioconductor.org/packages/release/bioc/html/edgeR.html
Software, Algorithm	SARtools	PMID: 27280887	RRID:SCR_016533	Opensource: https://github.com/PF2-pasteur-fr/SARTools
Software, Algorithm	IGV	PMID: 21221095	RRID:SCR_011793	https://software.broadinstitute.org/software/igv/
Software, Algorithm	Kallisto	PMID: 27043002	RRID:SCR_016582	https://pachterlab.github.io/kallisto/
Software, Algorithm	HISAT2	PMID: 25751142	RRID:SCR_015530	https://ccb.jhu.edu/software/hisat2/index.shtml
Software, Algorithm	Samtools	PMID: 19505943	RRID:SCR_002105	http://samtools.sourceforge.net/
Software, Algorithm	Jalview	PMID: 19151095	RRID:SCR_006459	http://www.jalview.org/
Software, Algorithm	MAFFT	PMID: 23329690	RRID:SCR_011811	https://mafft.cbrc.jp/alignment/software/
Software, Algorithm	IQ-Tree	PMID: 25371430	RRID:SCR_017254	http://www.iqtree.org
Software, Algorithm	ITOL	PMID: 27095192		https://itol.embl.de/
Software, Algorithm	PAL2NAL	PMID: 16845082

Strains

The following strains were used in this study:

Share this article

Cite this article

An N2-derived genetic variant in the intron of nurf-1 increases fitness in laboratory conditions.

Figure 1—source data 1

Summary of major nurf-1 transcripts identified in C. elegans.

nurf-1 encodes five major transcripts.

An additional isoform besides NURF-1.

Figure 3—source data 1

Predicted effect of stop codon mutations on NURF-1 isoforms.

Genetic analysis suggests the NURF-1.

Figure 4—source data 1

NURF-1.B and NURF-1.D have opposite effects on the sperm-to-oocyte switch in hermaphrodites.

Figure 5—source data 1

A duplication of the shared exons of the nurf-1.b and nurf-1.d transcripts resulted in the split of nurf-1 it into two separate genes in a subclade of Caenorhabditis species.

Figure 6—source data 1

dN/dS ratio in three Caenorhabditis species pairs.

Proposed antagonistic (Yin-Yang) working model of two nurf-1 isoforms in C. elegans.

Author details

Wen Xu

Contribution

Competing interests

Lijiang Long

Contribution

Competing interests

Yuehui Zhao

Contribution

Competing interests

Lewis Stevens

Contribution

Competing interests

Irene Felipe

Contribution

Competing interests

Javier Munoz

Contribution

Competing interests

Ronald E Ellis

Contribution

Competing interests

Patrick T McGrath

Contribution

For correspondence

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism