Noncanonical usage of stop codons in ciliates expands proteins with structurally flexible Q-rich motifs

Chi-Ning Chuang; Hou-Cheng Liu; Tai-Ting Woo; Ju-Lan Chao; Chiung-Ya Chen; Hisao-Tang Hu; Yi-Ping Hsueh; Ting-Fang Wang

doi:10.7554/eLife.91405.1

eLife assessment

This study presents potentially valuable results on glutamine-rich motifs in relation to protein expression and alternative genetic codes. The authors' interpretation of the results is so far only supported by incomplete evidence, due to a lack of acknowledgment of alternative explanations, missing controls and statistical analysis, and writing that is unclear to non-experts in the field. These shortcomings could be at least partially overcome by additional experiments, thorough rewriting, or both.

https://doi.org/10.7554/eLife.91405.1.sa2

Significance of findings

valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence

incomplete: Main claims are only partially supported

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Serine(S)/threonine(T)-glutamine(Q) cluster domains (SCDs), polyglutamine (polyQ) tracts and polyglutamine/asparagine (polyQ/N) tracts are Q-rich motifs found in many proteins. SCDs are often intrinsically disordered regions that mediate protein phosphorylation and protein-protein interactions. PolyQ and polyQ/N tracts are structurally flexible sequences that can trigger protein aggregation. We show that four SCDs and three prion-causing Q/N-rich motifs of yeast proteins possess autonomous protein expression-enhancing activities. Comparative Gene Ontology (GO) analyses of the near-complete proteomes of 27 representative model eukaryotes reveal that Q-rich motifs prevail in proteins involved in specialized biological processes, including Saccharomyces cerevisiae RNA-mediated transposition, Candida albicans filamentous growth, ciliate peptidyl-glutamic acid modification, Tetrahymena thermophila xylan catabolism and meiosis, Dictyostelium discoideum development and sexual cycles, Plasmodium falciparum infection, and the Drosophila melanogaster nervous system. We also show that Q-rich motifs are expanded massively in ten ciliates with reassigned TAA^Q and TAG^Q codons. Our results provide new insights into why many ciliates reassign their nuclear stop codons to glutamine (Q). The consequence of this preponderance of Q is a massive expansion of proteins harboring three structurally flexible or even intrinsically disordered Q-rich motifs. Since these Q-rich motifs can endow proteins with structural and functional plasticity, we suggest that they represent useful toolkits for evolutionary novelty.

Introduction

Ciliated protists (ciliates) are a large and diverse group of microbial eukaryotes that are important to a wide range of biological studies and applications. Relative to most other eukaryotes, the nuclear genomes of many ciliates possess three interesting features. First, ciliates often have two types of nuclei. Their diploid micronucleus (MIC) carries the cell germline, the genetic material of which is inherited via sexual reproduction and meiosis. The polyploid macronucleus (MAC), or vegetative nucleus, provides the nuclear RNA needed for vegetative growth. The MAC is generated from the MIC by massive amplification, editing and rearrangement of the genome (see reviews in (1, 2)). Second, ciliate meiosis is remarkable relative to that of other studied sexual eukaryotes. A hallmark of meiosis is programmed initiation of DNA double-strand breaks (DSBs) by Spo11, a meiosis-specific endonuclease related to the A subunit of archaeal topoisomerase VI (3, 4). Meiotic DSBs are then repaired in a homology-dependent manner via searching, pairing (or synapsis) and exchange of homologous parental chromosomes, leading to the formation of interhomolog crossovers (COs). Ultimately, the resulting recombinant chromosomes segregate equally into haploid gametes. To facilitate pairing and recombination between homologous chromosomes, telomeres attach to the nuclear envelope (NE) at the beginning of meiosis and then cluster on one side of the NE to form a transient structure called the “bouquet”. In many sexual eukaryotes, homologous pairing usually culminates in the synaptonemal complex (SC), a zipper-like proteinaceous structure that appears to regulate CO formation. The SC behaves like a liquid crystal, tightly connecting paired homologous chromosomes from telomere to telomere and also incorporating lateral component loops in interstitial chromosome segments (5, 6). Most studied ciliates lack an SC or have only a degenerate SC (7).
In Tetrahymena thermophila, the most intensively studied ciliate, meiotic MICs undergo extreme elongation (∼50-fold) and form proteinaceous condensates called “crescents”. Within these elongated crescents, the telomeres and centromeres of all meiotic chromosomes are arranged at opposite ends in a stretched, bouquet-like manner. Meiotic pairing and recombination take place within the crescents (see review in (8)). It has been reported that ATR1 (ATM and Rad3-related 1), an evolutionarily conserved DNA damage sensor protein kinase, senses Spo11-induced DSBs and triggers MIC elongation (9). The meiosis-specific cyclins CYC2 and CYC17, as well as the cyclin-dependent kinase CDK3, are required to initiate meiosis and for crescent assembly (10-12). CYC2/CDK2 promotes bouquet formation in the MIC by controlling microtubule-directed elongation (12) and also controls the expression of genes encoding proteins involved in DSB formation (SPO11), DNA repair (COM1, EXO1, DMC1), and CO formation (HOP2, MND1, MSH4, MSH5, ZPH3, BIM1, and BIM2) (13). The DPL2/E2fl1 complex, a meiosis-specific transcription factor, promotes transcriptional induction of DNA repair proteins and chromosome-associated structural proteins, including MRE11, COM1, EXO1, RAD50, RAD51, SMC1, SMC2, SMC3, SMC4, REC8, ESP1, and CNA1, among others (13). Nevertheless, the molecular mechanisms underlying crescent assembly and disassembly remain poorly understood.
Third, many ciliates reassign their standard stop codons to amino acids (Table 1). For example, several ciliates use a noncanonical nuclear genetic code in which the UAA and UAG stop codons have been reassigned to glutamine (Q) (UAA^Q and UAG^Q), so that UGA is the sole functional stop codon. These ciliates include Tetrahymena thermophila, Paramecium tetraurelia, Paramecium bursaria, Oxytricha trifallax, Stylonychia lemnae, Pseudocohnilembus persalinus, Aristerostoma sp., Favella ehrenbergii, Pseudokeronopsis spp., Strombidium inclinatum, and Uronema spp. Both the UAA and UAG stop codons are reassigned to tyrosine (Y) in Favella ehrenbergii, whereas the UGA stop codon is translated to tryptophan (W) in Blepharisma japonicum and to cysteine (C) in Euplotes focardii. In contrast, Stentor coeruleus, Climacostomum virens, Litonotus pictus and Protocruzia adherens use the universal genetic code. Notably, Condylostoma magnum and Parduczia sp. have no dedicated stop codons: their UAA, UAG and UGA codons can either serve as stop codons or be translated to Q, C and W, respectively. In these cases, translation termination near the mRNA 3′ end occurs in a context-dependent manner that distinguishes stop codons from sense codons (14-26).
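The effect of the TAA^Q/TAG^Q reassignment on translation can be illustrated with a minimal Python sketch. It builds the standard genetic code and a ciliate nuclear variant in which TAA and TAG encode Q and TGA is the only stop codon (the variant NCBI lists as translation table 6). The function name and the example sequence are illustrative assumptions, and the sketch ignores the context-dependent termination used by ciliates without dedicated stop codons.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_CODE = dict(zip(CODONS, AMINO_ACIDS))

# Ciliate nuclear code (e.g. Tetrahymena): TAA and TAG are reassigned to Q,
# so TGA is the sole stop codon.
CILIATE_CODE = {**STANDARD_CODE, "TAA": "Q", "TAG": "Q"}


def translate(cds: str, code: dict[str, str]) -> str:
    """Translate an in-frame coding sequence up to the first stop codon ('*')."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = code[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


seq = "ATGCAGTAACAATAGTGA"
print(translate(seq, STANDARD_CODE))  # MQ
print(translate(seq, CILIATE_CODE))   # MQQQQ
```

Under the ciliate code, a reading frame that would end at TAA or TAG in most other eukaryotes continues and gains Q residues, which is one simple way proteome-wide Q usage could rise.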

Although it has been reported previously that Q is used more frequently in Tetrahymena thermophila and Paramecium tetraurelia than in other species (19, 20), many important questions regarding stop codon reassignment in ciliates remain unresolved. Most fundamentally, it is unclear whether Q, Y, W and C are also used more frequently in other ciliates in which stop codons are reassigned. Moreover, it is not clear whether proteins arising from stop codon reassignment share common or specific structural motifs. Furthermore, the structural and functional impacts of such genome-wide alterations remain unknown. Since high-quality transcriptomic and gene annotation data are now available for many ciliate and non-ciliate eukaryotes, and their genetic codes have also been well defined (Table 1), we reasoned that these questions could now be addressed.

In this report, we first present the unexpected finding that the Q-rich motifs of several Saccharomyces cerevisiae proteins possess autonomous protein expression-enhancing (PEE) activities (27). These include four serine(S)/threonine(T)-Q cluster domains (SCDs) and three prion-forming domains (PFDs) with polyglutamine (polyQ) or polyglutamine/asparagine (polyQ/N) tracts. Notably, all seven of these Q- or Q/N-rich motifs have high Q, S, T, and/or N contents. A common feature of intrinsically disordered regions (IDRs) is a high content of Q, N, S, T, proline (P), glycine (G) and charged amino acids. Rather than adopting stable secondary and/or tertiary structures, IDRs are structurally flexible polypeptides in their native states (28-30). Next, we performed Gene Ontology (GO) enrichment analyses on all proteins having Q-rich motifs (i.e., SCDs, polyQ and polyQ/N), as well as those with homorepeat (polyX) motifs of other amino acid residues, in 20 non-ciliate and 17 ciliate species. Our results indicate that Q-rich motifs are not only massively expanded in the 10 ciliate species we examined with reassigned TAA^Q and TAG^Q codons, but also prevail in proteins involved in several species-specific or even phylum-specific biological processes.

Results

SCDs provide versatile functionalities in proteins

We reported previously that the NH₂-terminal domain (NTD; residues 1–66) of budding yeast Saccharomyces cerevisiae Rad51 protein (Rad51-NTD) autonomously promotes high-level production of native Rad51 and its COOH-terminal fusion protein LacZ (β-galactosidase) in vivo (27). In brief, we expressed Rad51-LacZ-NVH fusion proteins using a CEN-ARS plasmid (low-copy number) under the control of the native RAD51 gene promoter (P_RAD51) (Table 1). The NVH tag contains an SV40 nuclear localization signal (NLS) peptide preceding a V5 epitope tag and a hexahistidine (His₆) affinity tag (27). We confirmed that N-terminal addition of Rad51-NTD to LacZ-NVH increased both steady-state levels of LacZ-NVH fusion proteins (Figure 1A) and β-galactosidase activities in vivo (Figure 1B).

The Q-rich domains of seven different yeast proteins possess autonomous protein expression-enhancing (PEE) activities.
(A-B) N-terminal fusion of Rad51-NTD/SCD, Rad53-SCD1, Hop1-SCD, Sml1-NTD, Sup35-PND, Ure2-UPD or New1-NPD each promotes high-level expression of LacZ-NVH. The NVH tag contains an SV40 nuclear localization signal (NLS) peptide preceding a V5 epitope tag and a hexahistidine (His₆) affinity tag (27). Western blots for visualization of LacZ-NVH fusion proteins (A) and quantitative β-galactosidase assays (B) were carried out as described previously (27). Error bars indicate standard deviation between experiments (n ≥ 3). Asterisks indicate significant differences relative to wild type (WT) in A or to LacZ-NVH without an N-terminal fusion in B, with P values calculated using a two-tailed t-test (***, P value <0.001; **, P value <0.01). (C-D) The PEE activities of S/T/Q/N-rich domains are independent of the quaternary structures of target proteins. (C) Rad53-SCD1 can be used as an N-terminal fusion tag to enhance production of four different target proteins: LacZ-NVH, GST-NVH, GSTnd-NVH and GFP-NVH. (D) Visualization of native Rad51 (NTD-Rad51-ΔN), Rad51-ΔN, and the Rad51-ΔN fusion proteins by immunoblotting. Hsp104 was used as a loading control. Sizes (kDa) of protein standards are indicated to the left of the blots. The black arrowhead indicates the Rad51-ΔN protein band. (E) MMS sensitivity. Spot assays of five-fold serial dilutions of the indicated strains grown on YPD plates with or without MMS at the indicated concentrations (w/v).

Rad51-NTD is an SCD because it has three SQ motifs (S²Q, S¹²Q and S³⁰Q) (27). The S/T-Q motifs, comprising S or T followed by Q, are the target sites of DNA damage sensor protein kinases, i.e., ATM, ATR (ATM and RAD3-related) and DNA-dependent protein kinase (DNA-PK) (31, 32). Mec1 (Mitotic Entry Checkpoint 1) and Tel1 (TELomere maintenance 1) are the budding yeast homologs of mammalian ATR and ATM, respectively. Budding yeast lacks a DNA-PK homolog (33-35). We showed that the three SQ motifs of Rad51-NTD are phosphorylated by Mec1 and Tel1 during vegetative growth and meiosis. Mec1/Tel1-dependent NTD phosphorylation antagonizes Rad51 degradation via the proteasomal pathway, increasing the half-life of Rad51 from ∼30 min to ≥180 min (27). This outcome is consistent with the notion that Mec1 has an essential function in regulating protein homeostasis (proteostasis) in S. cerevisiae (36, 37).

A unifying definition of an SCD is the presence of three or more S/T-Q sites within a stretch of ≤50 or ≤100 amino acids in S. cerevisiae and human, respectively (31, 32). One of the best-understood functions of SCD phosphorylation is the recruitment of binding partners that contain a forkhead-associated (FHA) domain. For example, Mec1 and Tel1 phosphorylate the SCD1 of Rad53 (residues 1-29) and the SCD of Hop1 (residues 258-324), which then specifically recruit and activate the downstream kinases Dun1 and Mek1 during vegetative growth and meiosis, respectively (38-40). Sml1, an inhibitor of ribonucleotide reductase, is regulated by the Mec1-Rad53 checkpoint pathway and controls the production of deoxynucleoside triphosphates (dNTPs) during DNA replication and the DNA damage response (DDR) (41). The Sml1 protein in the SK1 strain harbors three S/T-Q motifs (S⁴Q, S¹⁴Q and T⁴⁷Q), whereas that in the S288c strain has only one (S⁴Q), with C¹⁴Q and T⁴⁷M at the other two positions. Here, we further show that yeast Rad53-SCD1, Hop1-SCD, Sml1-NTD^1–50 (residues 1–50) and Sml1-NTD^1–27 (residues 1–27) also exhibit PEE activities (Figure 1A and 1B, Table 1).
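The SCD definition can be expressed as a sliding search over S/T-Q sites. The minimal Python sketch below assumes that the span is measured from the first S/T to the last Q of the motifs being counted; the function names and the toy sequence are hypothetical, and this is not the ASFinder-SCD implementation used in this study. The toy sequence places SQ motifs at the same positions as the three SQ motifs of Rad51-NTD (S²Q, S¹²Q, S³⁰Q) but is otherwise artificial.

```python
def st_q_sites(seq: str) -> list[int]:
    """1-based positions of S or T residues that are immediately followed by Q."""
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in "ST" and seq[i + 1] == "Q"]


def find_scds(seq: str, min_sites: int = 3, max_span: int = 50) -> list[tuple[int, int]]:
    """Return (start, end) 1-based spans in which `min_sites` consecutive S/T-Q
    motifs fit within `max_span` residues (first S/T through last Q)."""
    sites = st_q_sites(seq)
    spans = []
    for i in range(len(sites) - min_sites + 1):
        start, end = sites[i], sites[i + min_sites - 1] + 1  # end = Q of last motif
        if end - start + 1 <= max_span:
            spans.append((start, end))
    return spans


# Artificial sequence with SQ motifs at positions 2, 12 and 30.
toy = "M" + "SQ" + "A" * 8 + "SQ" + "A" * 16 + "SQ" + "A" * 10
print(st_q_sites(toy))              # [2, 12, 30]
print(find_scds(toy))               # [(2, 31)]
print(find_scds(toy, max_span=25))  # []
```

Per the definition above, `max_span=50` would be used for S. cerevisiae proteins and `max_span=100` for human proteins.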

The Q-rich motifs of three yeast prion-causing proteins also exhibit PEE activities

Since Sml1-NTD^1–27 in the SK1 strain harbors only two S/T-Q motifs (S⁴Q and S¹⁴Q), the number of S/T-Q motifs alone cannot account for PEE activity. Notably, Rad51-NTD, Rad53-SCD1, Hop1-SCD, Sml1-NTD^1–27 and Sml1-NTD^1–50 are all Q- or Q/N-rich motifs. Rad51-NTD contains 9 serines (S), 2 threonines (T), 9 glutamines (Q) and 4 asparagines (N). Rad53-SCD1 has 2S, 4T, 7Q and 1N. Hop1-SCD has 6S, 6T, 8Q and 9N. In SK1, Sml1-NTD^1–27 has 3S, 2T, 6Q and 2N, and Sml1-NTD^1–50 has 5S, 3T, 7Q and 3N.
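The STQ and STQN percentages used later to compare these motifs are simple composition ratios. The minimal Python sketch below (function name hypothetical) assumes that the percentage is taken over the full length of the tag, which may differ from how Figure 4A was computed; the last two lines apply the same arithmetic to the Rad51-NTD counts given above.

```python
def residue_percentages(seq: str, groups: dict[str, str]) -> dict[str, float]:
    """Percentage of residues in `seq` that belong to each named residue group."""
    seq = seq.upper()
    return {name: round(100.0 * sum(seq.count(aa) for aa in residues) / len(seq), 1)
            for name, residues in groups.items()}


GROUPS = {"STQ": "STQ", "STQN": "STQN"}
print(residue_percentages("SSTQQNAAAA", GROUPS))  # {'STQ': 50.0, 'STQN': 60.0}

# Same arithmetic from the Rad51-NTD counts in the text (9 S, 2 T, 9 Q, 4 N; residues 1-66).
print(round(100.0 * (9 + 2 + 9) / 66, 1))      # 30.3 (STQ)
print(round(100.0 * (9 + 2 + 9 + 4) / 66, 1))  # 36.4 (STQN)
```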

We next investigated whether other Q- or Q/N-rich motifs in yeast can also promote protein expression in vivo. PolyQ and polyQ/N tracts are the most common homorepeats acting as structurally flexible motifs for protein aggregation or protein–protein interactions in many eukaryotes (42, 43). PolyN is less structurally flexible than polyQ because of its stronger propensity to form β-turns (44). In so-called polyQ-associated diseases, long Q-, Q/N- or even N-rich motifs mediate excessive interactions, leading to dysfunctional or pathogenic protein aggregates (45). Many prion-causing proteins contain Q/N-rich prion-forming domains (PFDs). In S. cerevisiae, the best-characterized prion-causing proteins are Sup35 (translation termination factor eRF3), New1 ([NU+] prion formation protein 1), Ure2 (ureidosuccinate transport 2), Rnq1 (rich in N and Q 1) and Swi1 (switching deficient 1) (46, 47). We found that the Q/N-rich NTDs of Sup35, Ure2 and New1 also display PEE activities, i.e., the prion nucleation domain (PND; residues 1–39) of Sup35, the Ure2 prion domain (UPD; residues 1–91) and the New1 prion domain (NPD; residues 1–146) (Figure 1A and 1B, Table 1). Sup35-PND, which contains 3S, 12Q, 18N and an S¹⁷Q motif, plays a critical role in promoting [PSI+] prion nucleation (48). The UPD of the Ure2 nitrogen catabolite repression transcriptional regulator is the basis of the [URE3⁺] prion (49, 50). The UPD is also critical for Ure2 function in vivo, because its removal in Ure2-ΔUPD mutants reduces protein stability and steady-state protein levels (but not transcript levels) (51). Ure2-UPD contains 10S, 5T, 10Q and 33N and adopts a completely disordered structure (52). New1 is a non-essential ATP-binding cassette type F protein that fine-tunes the efficiency of translation termination or ribosome recycling (53). The NPD of New1 supports [NU⁺] and is susceptible to [PSI+] prion induction (54, 55). New1-NPD contains 19S, 8T, 14Q, 28N and an S¹⁴⁵Q motif. Using the LacZ-NVH fusion approach, we found that N-terminal fusion of Sup35-PND, Ure2-UPD or New1-NPD to LacZ-NVH each increased steady-state protein levels (Figure 1A) and β-galactosidase activities in vivo (Figure 1B).

The PEE function is not affected by the quaternary structures of target proteins

We found that N-terminal fusion of Rad53-SCD1 to four different NVH-tagged target proteins (Figure 1C) or to Rad51-ΔN (Figure 1D) resulted in higher protein production in vivo in every case. LacZ is a tetrameric protein, glutathione S-transferase (GST) is dimeric, and non-dimerizing GST (GSTnd) and GFP are monomeric. As reported recently (27), removal of the NTD from Rad51 reduced the levels of the corresponding Rad51-ΔN protein by ∼97% relative to wild type (WT) (Figure 1D), leading to lower resistance to the DNA-damaging agent methyl methanesulfonate (MMS) (Figure 1E). Interestingly, the loss of PEE activity in Rad51-ΔN could be fully rescued by N-terminal fusion of Rad53-SCD1, Rad53-SCD1-5STA (all five S/T-Q motifs changed to AQ) or Sup35-PND. Compared to wild-type yeast cells, the three corresponding yeast mutants (rad51-SCD1-rad51-ΔN, rad51-SCD1-5STA-rad51-ΔN and sup35-PND-rad51-ΔN) (Table 2) not only produced similar steady-state levels of Rad51-ΔN fusion proteins (Figure 1D), but also exhibited high MMS resistance (Figure 1E). During homology-directed repair of DNA double-strand breaks (DSBs), Rad51 polymerizes into helical filaments on DSB-associated single-stranded DNA (ssDNA) and then promotes homology search and strand exchange between the ssDNA-protein filament and a second, double-stranded DNA (dsDNA). Because restoring Rad51-ΔN protein levels also restored MMS resistance, we inferred that the weak MMS resistance of rad51-ΔN is mainly due to very low steady-state levels of Rad51-ΔN (Figure 1D), and that the catalytic activity of Rad51-ΔN in homology-directed repair is likely similar to that of wild-type Rad51. Accordingly, we concluded that the quaternary structures of the target proteins (i.e., GFP, GSTnd, GST, LacZ and Rad51-ΔN) do not affect the autonomous PEE activity.

*S. cerevisiae* strains used in this study

The autonomous PEE function is unlikely to be controlled by plasmid copy number or transcription

We also found that the PEE function is unlikely to operate at the transcriptional level, as revealed by genomic and reverse-transcription quantitative polymerase chain reactions (g-qPCR and RT-qPCR, respectively) (Figure 2 and Table 3). Addition of WT or mutant Rad51-NTD to LacZ-NVH did not affect the average copy number of the corresponding CEN-ARS plasmids in exponentially growing S. cerevisiae cells (Figure 2B), and it even reduced the steady-state transcript levels of the corresponding LacZ-NVH fusion genes (Figure 2C). Therefore, addition of Rad51-NTD to LacZ-NVH did not significantly increase transcription.
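The relative quantification values in the Figure 2 legend follow the comparative Ct (2^−ΔΔCT) method, with LacZ as the target gene and ACT1 as the reference gene. The minimal Python sketch below shows the arithmetic; the calibrator sample (for example, cells expressing LacZ-NVH alone) and all Ct values are assumptions for illustration, and the method itself assumes near-equal amplification efficiencies for target and reference amplicons. The final ratio mirrors the mRNA-to-plasmid comparison of Figure 2D.

```python
def relative_quantity(ct_target: float, ct_reference: float,
                      ct_target_calibrator: float, ct_reference_calibrator: float) -> float:
    """Comparative Ct method: RQ = 2^(-ddCt).

    dCt  = Ct(target) - Ct(reference)   (e.g. LacZ vs ACT1)
    ddCt = dCt(sample) - dCt(calibrator)
    """
    d_ct_sample = ct_target - ct_reference
    d_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2 ** -(d_ct_sample - d_ct_calibrator)


# Hypothetical Ct values: after ACT1 normalization, the sample's LacZ mRNA signal
# appears 2 cycles later than the calibrator's (about 4-fold less template),
# while the plasmid DNA signal is unchanged.
rq_mrna = relative_quantity(22.0, 18.0, 20.0, 18.0)
rq_dna = relative_quantity(21.0, 18.0, 21.0, 18.0)
print(rq_mrna, rq_dna, rq_mrna / rq_dna)  # 0.25 1.0 0.25
```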

The autonomous protein-expression-enhancing function of Rad51-NTD is unlikely to be controlled during transcription or simply arise from plasmid copy number differences.
The effects of WT and mutant Rad51-NTD on β-galactosidase activities (A), plasmid DNA copy numbers (B), relative steady-state levels of LacZ-NVH mRNA normalized to *ACT1* (actin) mRNA (C), and relative ratios of LacZ-NVH mRNA *versus* plasmid DNA copy number (D). Wild-type yeast cells were transformed with the indicated CEN-ARS plasmids to express WT and mutant Rad51-NTD-LacZ-NVH fusion proteins or LacZ-NVH alone under the control of the native RAD51 gene promoter (P_RAD51). Relative quantification (RQ = 2^(−ΔΔCT)) values were determined to measure plasmid DNA copy number and steady-state levels of LacZ-NVH mRNA by g-qPCR and RT-qPCR, respectively. LacZ and *ACT1* were used as the target and reference genes, respectively, in both g-qPCR and RT-qPCR. The data shown represent mean ± SD from three independent biological replicates.

The oligonucleotide primers used for g-qPCR and RT-qPCR

The protein quality control system moderately regulates autonomous PEE activities

The protein quality control system is a mechanism by which cells monitor proteins to ensure that they are appropriately folded (56). In the current study, we compared the steady-state protein levels (Figure 3A) and β-galactosidase activities (Figure 3B-3D) of Rad51-NTD-LacZ-NVH and LacZ-NVH in WT, hsp104Δ, new1Δ, doa1Δ, doa4Δ, san1Δ and oaz1Δ yeast strains. The six genes deleted in these mutants all encode proteins that are functionally relevant to protein homeostasis or prion propagation. Hsp104 is a heat-shock protein with disaggregase activity that disrupts protein aggregates (57, 58). New1 is a translation factor that fine-tunes ribosome recycling and the efficiency of translation termination (53). Doa1 (also called Ufd3) is a ubiquitin- and Cdc48-binding protein with a role in ubiquitin homeostasis and/or protein degradation (59, 60). The doa1Δ mutant exhibits diminished formation of the [PSI+] prion (61). Doa4 is a deubiquitinating enzyme required for recycling ubiquitin from proteasome-bound ubiquitinated intermediates (62). The doa4Δ mutant exhibits increased sensitivity to the protein synthesis inhibitor cycloheximide (63). San1 is a ubiquitin-protein ligase that targets highly aggregation-prone proteins (64, 65). Oaz1 (ornithine decarboxylase antizyme) stimulates ubiquitin-independent degradation of the ornithine decarboxylase Spe1 by the proteasome (66). We found that the β-galactosidase activities of Rad51-NTD-LacZ-NVH in WT and in all six knockout strains were 10-29-fold higher than those of LacZ-NVH (Figure 3B). Intriguingly, the β-galactosidase activities of LacZ-NVH in the six knockout mutants are all lower (30-70%) than those in WT (Figure 3C). In contrast, the β-galactosidase activities of Rad51-NTD-LacZ-NVH in WT are either slightly higher or slightly lower than those in the six null mutants (Figure 3D).
These results suggest that addition of Rad51-NTD to LacZ-NVH can compensate for the protein homeostasis defects caused by loss of each of these six genes. For example, Rad51-NTD might compensate for the ribosome assembly and translation defects in new1Δ (53), as well as for the cycloheximide-hypersensitive phenotype of doa4Δ (63). Accordingly, the β-galactosidase activities of Rad51-NTD-LacZ-NVH in the new1Δ and doa4Δ lines are higher than those in WT, whereas those of LacZ-NVH in the same two lines are lower than those in WT. Finally, although the doa1Δ mutant is defective in [PSI+] prion formation (61), the steady-state levels of Rad51-NTD-LacZ-NVH in the doa1Δ line are also slightly higher than those in WT.

The expression-promoting function of Rad51-NTD is controlled during protein translation and does not affect ubiquitin-mediated protein degradation.
(A) Steady-state protein levels of Rad51-NTD-LacZ-NVH and LacZ-NVH in WT and six protein homeostasis gene knockout mutants. (B-D) Impact of the six protein homeostasis genes on β-galactosidase activities. (B) Ratios of the β-galactosidase activity of Rad51-NTD-LacZ-NVH to that of LacZ-NVH in WT and the six knockout mutants. (C-D) β-galactosidase activities of LacZ-NVH (C) and Rad51-NTD-LacZ-NVH (D) in WT and the six knockout mutants. Asterisks indicate significant differences, with P values calculated using a two-tailed t-test (***, P value <0.001; **, P value <0.01; *, P value <0.05).

The relationship between PEE function, amino acid contents and structural flexibility

Using three Q-rich motifs (i.e., Rad51-NTD, Rad53-SCD1 and Sup35-PND) as examples, we observed a strong positive correlation between STQ or STQN amino acid percentages and β-galactosidase activities (Figure 4 and Figure 5). These results are consistent with the notion that, owing to their high STQ or STQN content, SCDs and other Q-rich motifs are intrinsically disordered regions (IDRs) in their native states rather than adopting stable secondary and/or tertiary structures (31). A common feature of IDRs is their high content of S, T, Q, N, P, G and charged amino acids (28-30).

Relative β-galactosidase (LacZ) activities are correlated with the percentage STQ or STQN amino acid content of three Q-rich motifs. (A) List of N-terminal tags with their respective lengths, numbers of S/T/Q/N amino acids, overall STQ or STQN percentages, and relative β-galactosidase activities. (B-D) Linear regressions between relative β-galactosidase activities and overall STQ or STQN percentages for Rad51-NTD (B), Rad53-SCD1 (C) and Sup35-PND (D). The coefficient of determination (R²) is indicated for each simple linear regression. (E) The amino acid sequences of wild-type and mutant Rad51-NTD, Rad53-SCD1 and Sup35-PND.

Alanine scanning mutagenesis of IDRs. The amino acid sequences of WT and mutant IDRs are listed in Table S1. Total protein lysates prepared from yeast cells expressing Rad51-NTD-LacZ-NVH (A), Sup35-PND-LacZ-NVH (B) or Rad53-SCD1-LacZ-NVH (C) were visualized by immunoblotting with anti-V5 antisera. Hsp104 was used as a loading control. Quantitative yeast β-galactosidase (LacZ) assays were carried out as described in Figure 1. Error bars indicate standard deviation between experiments (n = 3). Asterisks indicate significant differences compared to LacZ-NVH, with P values calculated using a two-tailed t-test (**, P value <0.01; ***, P value <0.001).

It is important to note that the STQ or STQN content threshold differs among the three cases presented here (Figure 4B). Thus, the percentage of STQ or STQN residues is not likely the only factor contributing to protein expression levels. Since G, P and glutamate (E) are also enriched (>10%) in Rad51-NTD, Rad53-SCD1 and Sup35-PND, these three amino acids may also contribute to the PEE activities and structural flexibility of these Q-rich motifs.

Functionally, IDRs are key components of subcellular machineries and signaling pathways because they have the potential to associate with many partners due to their ability to adopt multiple metastable conformations. Many IDRs are regulated by alternative splicing and post-translational modifications, and some are involved in the formation of various membraneless organelles or proteinaceous condensates via intracellular liquid-liquid phase separation (67, 68). Since IDRs can endow proteins with structural and functional plasticity, we hypothesized that Q-rich motifs (e.g., SCD, polyQ and polyQ/N) represent useful toolkits for creating new diversity during protein evolution.

Comparative GO enrichment analyses of SCDs and polyX motifs in different eukaryotes

To test our hypothesis, we developed two JavaScript programs (ASFinder-SCD and ASFinder-polyX) for proteome-wide searches of amino acid sequences that contain ≥3 S/T-Q motifs within a stretch of ≤100 residues (32) and of the polyX motifs of 20 different amino acids, respectively. Because diverse thresholds have been used in different studies and databases to define and detect polyX motifs, we applied the lowest threshold (≥4/7), i.e., a minimum of four identical X residues within a local stretch of seven residues (69-72) (Figures 6-9 and Dataset DS1-DS29). We searched and compared the near-complete proteomes of 27 different eukaryotes (Table 4), including the budding yeast S. cerevisiae, three pathogenic species of Candida, three filamentous ascomycete fungi (Neurospora crassa, Magnaporthe oryzae and Trichoderma reesei), three basidiomycete fungi (Cryptococcus neoformans, Ustilago maydis and Taiwanofungus camphoratus), the slime mold Dictyostelium discoideum, the malaria-causing unicellular protozoan parasite Plasmodium falciparum, six unicellular ciliates (Tetrahymena thermophila, Paramecium tetraurelia, Oxytricha trifallax, Stylonychia lemnae, Pseudocohnilembus persalinus and Stentor coeruleus), the fly Drosophila melanogaster, the mosquito Aedes aegypti, the nematode Caenorhabditis elegans, the zebrafish Danio rerio, the mouse Mus musculus, Homo sapiens, the higher plant Arabidopsis thaliana, and the single-celled green alga Chlamydomonas reinhardtii. The Benchmarking Universal Single-Copy Ortholog (BUSCO) scores for near-universal single-copy orthologs range from 92.4% to 100% across all 27 proteomes (Table 4). For model organisms, genomes or proteomes with BUSCO scores >95% are generally deemed complete references (73).
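The ≥4/7 polyX criterion amounts to a sliding window over each protein. The minimal Python sketch below is not the authors' JavaScript ASFinder-polyX; the function names are hypothetical, and two details are assumptions: proteins shorter than seven residues are tested as a single window, and passing a residue set such as "QN" counts Q and N together, which is only one possible reading of a mixed polyQ/N rule.

```python
def polyx_windows(seq: str, residues: str, min_count: int = 4, window: int = 7) -> list[int]:
    """1-based start positions of windows containing >= min_count residues from `residues`.

    residues="Q" applies the >=4/7 polyQ rule; residues="QN" counts Q and N
    together (an assumed reading of a mixed polyQ/N rule).
    """
    seq = seq.upper()
    hits = []
    for start in range(max(1, len(seq) - window + 1)):
        win = seq[start:start + window]
        if sum(aa in residues for aa in win) >= min_count:
            hits.append(start + 1)
    return hits


def has_polyx(seq: str, residues: str, min_count: int = 4, window: int = 7) -> bool:
    """True if at least one window satisfies the polyX threshold."""
    return bool(polyx_windows(seq, residues, min_count, window))


print(polyx_windows("AQAQAQAQA", "Q"))                         # [2]
print(has_polyx("QNQNAAA", "Q"), has_polyx("QNQNAAA", "QN"))  # False True
```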

Percentages of proteins with different numbers of SCDs, and polyQ, polyQ/N or polyN tracts in 37 different eukaryotes.

Usage frequency (%) of standard codons [stop codon (*), Q, C, Y and W] and reassigned stop codons (→ Q, → C or → W) in 37 different eukaryotes

It was reported previously that SCDs are overrepresented in the yeast and human proteomes (32, 74), and that polyX prevalence differs among species (43, 75-77). Our results reveal that the percentages of SCD proteins in the near-complete proteomes of 21 non-ciliate species and 6 ciliates range from 8.0% in P. falciparum to a maximum of 58.0% in O. trifallax, with intermediate values such as 13.9% in H. sapiens, 16.8% in S. cerevisiae and 24.2% in U. maydis (Figure 6 and Dataset DS2). Among the 6050 proteins in the most recently updated S. cerevisiae reference proteome (https://www.uniprot.org/proteomes/UP000002311), we identified 1016 SCD-hosting proteins (Dataset DS4), including all 436 SCD-harboring proteins previously identified using ScanProsite (32). ScanProsite is a publicly available tool for scanning protein sequences against PROSITE, a database of protein families, domains and motifs (78).
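Each percentage above is the fraction of a proteome's entries that carry at least one motif. In the minimal Python sketch below, the toy proteome and the crude predicate (any S/T-Q motif present) are hypothetical stand-ins for a real proteome and an SCD or polyX detector; the last line checks the S. cerevisiae value against the counts given in the text.

```python
from typing import Callable


def percent_with_motif(proteome: dict[str, str], has_motif: Callable[[str], bool]) -> float:
    """Percentage of proteins in `proteome` (id -> sequence) for which has_motif is True."""
    hits = sum(1 for seq in proteome.values() if has_motif(seq))
    return 100.0 * hits / len(proteome)


# Toy proteome with a deliberately crude predicate (any S/T-Q motif present).
toy = {"p1": "MSQKK", "p2": "MAAAA", "p3": "MTQLL", "p4": "MKKKK"}
print(percent_with_motif(toy, lambda s: "SQ" in s or "TQ" in s))  # 50.0

# S. cerevisiae value reported above: 1016 SCD proteins among 6050 proteins.
print(round(100.0 * 1016 / 6050, 1))  # 16.8
```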

The percentages of polyQ proteins in the near-complete proteomes of the 21 non-ciliate species and 6 ciliates range from 0.6% in S. coeruleus to a maximum of 47.2% in Pseudocohnilembus persalinus, with intermediate values such as 2.1% in P. falciparum, 4.1% in S. cerevisiae, 6.5% in H. sapiens, 21.8% in D. melanogaster and 24.5% in D. discoideum. Those of polyQ/N proteins range from 9.5% in T. camphoratus to a maximum of 79.4% in P. persalinus (e.g., 11.9% in H. sapiens, 28.2% in S. cerevisiae, 62.5% in D. discoideum and 63.7% in P. falciparum). Percentages of polyN proteins range from 0.9% in T. camphoratus to a maximum of 54.0% in P. falciparum (e.g., 1.0% in H. sapiens, 1.1% in M. musculus, 3.7% in S. coeruleus and 49.7% in D. discoideum) (Dataset DS2). Compared to other species, D. melanogaster and A. aegypti possess higher percentages of polyH proteins; P. falciparum has higher percentages of poly-phenylalanine (polyF) and polyY proteins; and C. reinhardtii has higher percentages of polyA, polyP, polyG and polyH proteins.

Notably, the average usage of a given amino acid (X) is not always positively correlated with the percentage of polyX proteins in different species. For instance, proteome-wide Q usage is 2.8% in P. falciparum, 3.7% in S. coeruleus, 5.4% in D. melanogaster and 10.9% in P. persalinus; thus, S. coeruleus uses Q more often than P. falciparum yet has fewer polyQ proteins (0.6% versus 2.1%). Similarly, P. falciparum has a high percentage of polyF proteins, but its proteome-wide F usage (4.4%) is not markedly different from that of other species, which ranges from 2.0% in C. reinhardtii to 5.2% in T. thermophila. Average usages of P (7.7%), A (17.1%) and G (11.5%) in C. reinhardtii, as well as of N (14.4%) and Y (5.7%) in P. falciparum, are higher than those in the 6 ciliates and the 20 other non-ciliate species (Figure 6 and Dataset DS1). These results suggest that different species have evolved distinct strategies that allow specific polyX motifs to accumulate.
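Proteome-wide usage can be computed by pooling residues across all proteins, as in the minimal Python sketch below. Pooling (rather than averaging per-protein frequencies) is an assumption here, and the two-protein input is a toy example. Because usage is a global average, it does not indicate whether a residue is clustered into homorepeats, which is why it need not track polyX prevalence.

```python
from collections import Counter


def residue_usage(proteins: list[str]) -> dict[str, float]:
    """Pooled usage (%) of each amino acid across all residues of a proteome."""
    counts = Counter()
    for seq in proteins:
        counts.update(seq.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


usage = residue_usage(["MQQ", "AQ"])
print(usage["Q"])  # 60.0
```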

Next, we performed comparative GO enrichment analyses using gene function information from the GO knowledgebase (http://geneontology.org). Our results provide two intriguing insights into evolutionary biology. First, SCDs (Table 5) and polyQ/N motifs (Table 6) exist in many proteins involved in various evolutionarily conserved processes (Datasets DS4-DS29). It was reported previously that many yeast and human SCD proteins are neither targets of ATR or ATM nor involved in the DDR or DNA replication, but instead function in processes such as telophase, cytokinesis, protein transport, endocytosis and cytoskeleton organization (32). Second, SCDs (Table 5) and polyQ/N motifs (Table 6) are overrepresented in proteins involved in the specialized biological processes of different species. Of particular note is an overrepresentation of conserved SCD proteins (74) and polyQ/N proteins (this study) in pathways related to vertebrate neural development and neurodegeneration. For instance, human CTTNBP2 (1663 amino acid residues) is a neuron-specific, F-actin-associated SCD protein that is involved in the formation and maintenance of dendritic spines and is associated with autism spectrum disorders (79, 80). Human CTTNBP2 possesses ten S/T-Q motifs (T⁴⁶⁶Q, T⁴⁹³Q, S¹⁵⁸⁰Q, S⁵⁵³Q, S⁶³⁴Q, S⁹⁹⁴Q, S¹³⁹²Q, S¹⁵⁸⁰Q, T¹⁶²¹Q and S¹⁶²⁴Q). Mouse CTTNBP2 has 630 amino acid residues and four S/T-Q motifs (S⁴¹⁹Q, T⁴⁶³Q, S⁵⁵⁰Q and S⁶²⁴Q), and it shares high amino acid identity with the N-terminal region (residues 1-640) of human CTTNBP2. Using IUPred2A (https://iupred2a.elte.hu/plot_new), a web server for identifying disordered protein regions (81), we found that residues 220-1633 of human CTTNBP2 and residues 220-630 of mouse CTTNBP2 are predicted to be Q/N-rich IDRs with high percentages of S, T, Q, N, G, P, R and K.
We reported recently that mouse CTTNBP2 forms self-assembled condensates through its C-terminal IDR and facilitates co-condensation of SHANK3, an abundant excitatory postsynaptic scaffold protein, at dendritic spines in a Zn²⁺-dependent manner (82). Our results also reveal that SCDs, as well as polyQ, polyQ/N, polyS and polyP tracts, are overrepresented in proteins involved in the D. melanogaster nervous system (Tables 5 and 6). The numbers (and percentages) of SCD-, polyQ-, polyQ/N-, polyS- and polyP-containing proteins, respectively, in different nerve-related biological processes are: regulation of axonogenesis [101 (87%), 93 (80%), 98 (84%), 103 (89%) and 100 (86%)], neuron development [97 (80%), 99 (82%), 102 (84%), 103 (85%) and 88 (73%)], axon guidance [199 (53%), 209 (56%), 278 (74%), 280 (75%) and 186 (50%)], mushroom body development [140 (66%), 142 (67%), 167 (79%), 161 (76%) and 109 (51%)], dendritic morphogenesis [96 (53%), 95 (52%), 132 (73%), 131 (72%) and 61 (34%)], and peripheral nervous system development [94 (73%), 100 (78%), 111 (86%), 106 (82%) and 90 (70%)] (Tables 5 and 6 and Dataset DS22).
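Motif lists such as those given above for human and mouse CTTNBP2 use 1-based residue numbering (e.g., T⁴⁶⁶Q denotes a T at residue 466 followed by Q). The minimal sketch below generates such labels and measures the fraction of S, T, Q, N, G, P, R and K within a residue range (such as 220-630); the helper names and the toy sequence are hypothetical, and IUPred2A disorder prediction itself is not reproduced.

```python
from typing import List


def stq_labels(seq: str) -> List[str]:
    """Labels like 'T466Q' for every S/T-Q motif, using 1-based numbering."""
    return [f"{seq[i]}{i + 1}Q" for i in range(len(seq) - 1)
            if seq[i] in "ST" and seq[i + 1] == "Q"]


def fraction_of(seq: str, start: int, end: int, residues: str = "STQNGPRK") -> float:
    """Fraction of `residues` within the 1-based, inclusive range start..end."""
    region = seq[start - 1:end]
    return sum(aa in residues for aa in region) / len(region)


if __name__ == "__main__":
    toy = "MASQKTQLLSQ"  # hypothetical sequence, not CTTNBP2
    print(stq_labels(toy))         # ['S3Q', 'T6Q', 'S10Q']
    print(fraction_of(toy, 1, 4))  # 0.5
```

Generating labels programmatically from the sequence also makes it easy to spot transcription errors, such as a duplicated entry, in hand-compiled motif lists.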

The numbers of SCD proteins involved in different biological processes of six representative eukaryotes

The numbers of polyQ/N proteins involved in different biological processes of six representative eukaryotes

We also found that SCDs, polyQ/N tracts and a few other structurally flexible polyX tracts are overrepresented in proteins involved in specialized biological processes of various species. First, in S. cerevisiae, 92 proteins are involved in RNA-mediated transposition (GO:0032197), of which 82 are SCD proteins and 88 are polyQ/N proteins (Tables 5 and 6 and Dataset DS4). The 82 SCD proteins (which are also polyQ/N proteins) are all structural components of the virus-like particles (VLPs) that form the shell encapsulating the dimeric RNA genomes of the Ty1 (Transposon yeast 1) and Ty2 retrotransposons. These capsid proteins also possess nucleocapsid-like chaperone activity, promoting annealing of the primer tRNA(i)-Met to the multipartite primer-binding site, dimerization of Ty RNA, and initiation of reverse transcription (83).

Another example is the developmental and sexual cycles of D. discoideum, which are induced by starvation and/or other stimuli. During the developmental cycle, these amoebae secrete small molecules (e.g., cyclic adenosine monophosphate, cAMP) that mediate intercellular communication and chemotaxis, ultimately resulting in the aggregation of hundreds to hundreds of thousands of amoebae into multicellular fruiting bodies (sorocarps). These sorocarps consist of ∼80% asexual spores and ∼20% dead stalk cells. During the sexual cycle, two amoebae of complementary mating types fuse to form a zygote. The zygote and the surrounding amoebae form a thick multilayered structure called the macrocyst, which is the site of meiosis and interhomolog DNA recombination and gives rise to hundreds of recombinant haploid amoebae (84, 85). We found that SCDs (Table 5) and polyQ/N tracts (Table 6) are overrepresented in proteins involved in diverse biological processes of the D. discoideum developmental and sexual cycles, including chemotaxis to cAMP, amoebal aggregation, sorocarp morphogenesis, sorocarp development and sexual reproduction (Dataset DS14).

A third example is the human malaria parasite P. falciparum. Human infection begins with the bite of an infected female Anopheles mosquito. After entering the bloodstream, sporozoites from the mosquito's salivary glands quickly invade hepatocytes and undergo asexual multiplication, after which tens of thousands of merozoites burst from the liver cells. These merozoites then invade red blood cells (erythrocytes) and undergo further rounds of multiplication. The maturing parasites change the surface properties of infected red blood cells, causing them to stick to blood vessel walls (a process called cytoadherence), which obstructs microcirculation and causes multiple organ dysfunction. The success of P. falciparum in establishing persistent infections is mainly attributable to immune evasion through antigenic variation (86). Our GO enrichment results reveal that SCDs, as well as polyQ/N, polyN, polyS, polyT, polyD, polyG and polyK tracts, are overrepresented in proteins involved in many of these specialized infection processes, including antigenic variation, modulation of host erythrocyte aggregation, cytoadherence to the microvasculature, and cell-cell adhesion (Tables 5 and 6 and Dataset DS15). For example, the numbers of proteins involved in antigenic variation are 38 (SCD), 82 (polyQ/N), 37 (polyN), 51 (polyS), 49 (polyT), 42 (polyP), 69 (polyG), 40 (polyD), 68 (polyE) and 144 (polyK). These examples are consistent with the notion that S, T, Q, N, P, G and charged amino acids occur more frequently in IDRs (28-30).
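Overrepresentation claims such as the Ty1/Ty2 example in this section are commonly evaluated with a hypergeometric (one-sided Fisher's exact) test. The sketch below applies that standard test to S. cerevisiae numbers stated in this section (1016 SCD proteins among 6050 proteins; 82 SCD proteins among the 92 proteins annotated to RNA-mediated transposition). Treating the whole reference proteome as the background is a simplifying assumption, and this is not necessarily the statistic computed by the GO Resource tools used in this study.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N proteins, K with the motif, n drawn).

    Uses exact integer arithmetic via math.comb; the final division returns a float.
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    N, K = 6050, 1016   # reference proteome size, SCD proteins
    n, k = 92, 82       # proteins in GO:0032197, of which SCD proteins
    expected = n * K / N
    print(f"expected ~ {expected:.1f} SCD proteins, observed {k}")
    print(f"one-sided P = {hypergeom_sf(k, N, K, n):.2e}")
```

Python integers are arbitrary precision, so the very small tail probability is computed without intermediate overflow.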

Massive expansion of Q-rich motifs in ciliates having two reassigned stop codons (UAA^Q and UAG^Q)

The most striking finding of our study is that, owing to their usage of the two noncanonical codons (UAA^Q and UAG^Q), Q (but not S, T or N) is used more frequently in five free-moving unicellular ciliates (T. thermophila, P. tetraurelia, O. trifallax, S. lemnae and P. persalinus) than in the sessile ciliate S. coeruleus or in any of the 21 non-ciliate species we examined herein (Figure 6 and Dataset DS1). We hereafter refer to these five free-moving ciliates as ‘group I’ ciliates. The proteome-wide Q and N contents, as well as the percentages of SCD, polyQ, polyQ/N and polyN proteins, respectively, in the different ciliates are: T. thermophila (9.6%, 9.2%, 50.8%, 33.0%, 72.6% and 25.2%), P. tetraurelia (9.1%, 6.9%, 41.1%, 27.1%, 62.9% and 8.7%), O. trifallax (9.0%, 7.9%, 58.0%, 36.7%, 72.3% and 22.0%), S. lemnae (8.4%, 7.5%, 49.2%, 28.1%, 65.3% and 17.4%), P. persalinus (10.9%, 9.8%, 47.7%, 47.2%, 79.5% and 36.3%) and S. coeruleus (3.7%, 6.3%, 13.9%, 0.6%, 18.7% and 3.7%) (Figures 6-9 and Datasets DS2 and DS3).
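The codon-level basis of this claim can be checked directly from coding sequences. The sketch below counts, in an in-frame CDS, how many Q residues are encoded by the canonical CAA/CAG codons versus the reassigned TAA/TAG codons (read as Q under the ciliate nuclear code, NCBI translation table 6). It assumes the CDS is already in frame; the function names and toy sequence are hypothetical.

```python
from collections import Counter
from typing import Dict

# Under NCBI translation table 6, TAA and TAG are read as Q; TGA remains a stop.
Q_CODONS_CILIATE = {"CAA", "CAG", "TAA", "TAG"}


def q_codon_usage(cds: str) -> Dict[str, int]:
    """Count each Q-encoding codon in an in-frame CDS (DNA or RNA letters)."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return dict(Counter(c for c in codons if c in Q_CODONS_CILIATE))


def reassigned_q_fraction(cds: str) -> float:
    """Fraction of Q residues that are encoded by the reassigned TAA/TAG codons."""
    usage = q_codon_usage(cds)
    total = sum(usage.values())
    if total == 0:
        return 0.0
    return (usage.get("TAA", 0) + usage.get("TAG", 0)) / total


if __name__ == "__main__":
    toy_cds = "ATGCAATAACAGTAGTGA"  # hypothetical: ATG CAA TAA CAG TAG TGA
    print(reassigned_q_fraction(toy_cds))  # 0.5
```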

Our GO enrichment results also reveal that the percentages of SCD, polyQ and polyQ/N (but not polyN) proteins in many biological processes of the five group I ciliates are considerably higher than those in S. coeruleus. For example, the numbers (and percentages) of SCD, polyQ, polyQ/N and polyN proteins involved in DNA repair (GO:0006281) are, respectively: T. thermophila [28 (61%), 18 (39%), 38 (83%) and 13 (28%)] (Dataset DS16); P. tetraurelia [27 (44%), 21 (34%), 49 (79%) and 3 (5%)] (Dataset DS17); O. trifallax [50 (61%), 35 (43%), 62 (76%) and 22 (27%)] (Dataset DS18); S. lemnae [32 (53%), 21 (35%), 44 (73%) and 19 (17%)] (Dataset DS19); P. persalinus [19 (53%), 19 (53%), 31 (86%) and 10 (28%)] (Dataset DS20); and S. coeruleus [7 (17%), 1 (2%), 10 (24%) and 2 (5%)] (Dataset DS21). The corresponding numbers (and percentages) for microtubule-based movement (GO:0007018) are: T. thermophila [93 (85%), 69 (63%), 107 (97%) and 63 (57%)] (Dataset DS16); P. tetraurelia [155 (69%), 109 (48%), 195 (86%) and 47 (21%)] (Dataset DS17); O. trifallax [99 (78%), 85 (66%), 121 (95%) and 47 (37%)] (Dataset DS18); S. lemnae [84 (80%), 66 (63%), 100 (95%) and 50 (48%)] (Dataset DS19); P. persalinus [65 (72%), 63 (70%), 85 (94%) and 41 (46%)] (Dataset DS20); and S. coeruleus [78 (34%), 5 (2%), 84 (37%) and 13 (6%)] (Dataset DS21).
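The comparisons above are reported as counts and percentages. As a hedged illustration of how the difference between one group I ciliate and S. coeruleus for a single GO term could be tested, the sketch below implements a one-sided Fisher's exact test with the standard library only. The 2×2 counts in the example are hypothetical placeholders, not values taken from Datasets DS16-DS21, and a multiple-testing correction would be needed if many GO terms were compared.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Rows: species 1, species 2. Columns: motif proteins, other proteins in the GO term.
    Tests whether the motif fraction is higher in species 1, i.e. P(X >= a).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, row1)


if __name__ == "__main__":
    # Hypothetical counts: 30 of 50 term proteins carry the motif in species 1,
    # versus 8 of 45 in species 2.
    print(f"P = {fisher_greater(30, 20, 8, 37):.2e}")
```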

Compared to the 20 non-ciliate species, all five group I ciliates and the sessile S. coeruleus possess proteins involved in peptidyl-glutamic acid modification (GO:0018200); the respective numbers (and percentages) of SCD, polyQ, polyQ/N and polyN proteins are: T. thermophila [38 (78%), 31 (63%), 47 (96%) and 23 (47%)] (Dataset DS16); P. tetraurelia [42 (60%), 23 (33%), 58 (83%) and 5 (7%)] (Dataset DS17); O. trifallax [81 (91%), 63 (71%), 84 (94%) and 50 (56%)] (Dataset DS18); S. lemnae [63 (86%), 54 (74%), 68 (93%) and 34 (47%)] (Dataset DS19); P. persalinus [38 (58%), 37 (57%), 55 (85%) and 28 (43%)] (Dataset DS20); and S. coeruleus [30 (23%), 0 (0%), 50 (38%) and 23 (17%)] (Dataset DS21).

T. thermophila has 58 proteins involved in the xylan catabolic process (GO:0045493) (Table 7), of which 56 (97%) are SCD proteins, 55 (95%) are polyQ proteins, 58 (100%) are polyQ/N proteins and 49 (84%) are polyN proteins (Dataset DS16). When we used NCBI BLASTP with an expect value (E-value) cutoff of ≤10⁻⁵ to search for homologs of these 58 proteins in the other 16 ciliates analyzed in this study, we found only 144 proteins, all with amino acid identity <60% and a raw alignment score <150 (Dataset DS30).
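The homology summary above can be derived from BLASTP tabular output. The sketch assumes a custom tabular format whose five columns are query ID, subject ID, percent identity, E-value and raw score (for example, BLAST+ `-outfmt "6 qseqid sseqid pident evalue score"`); the protein IDs in the sample are hypothetical, and the E-value cutoff of 10⁻⁵ is an assumption about the intended threshold.

```python
import csv
import io
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float  # percent identity
    evalue: float
    raw_score: int


def parse_hits(tsv: str) -> List[Hit]:
    """Parse 5-column tab-separated BLASTP output (assumed column order)."""
    hits = []
    for row in csv.reader(io.StringIO(tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        q, s, p, e, sc = row
        hits.append(Hit(q, s, float(p), float(e), int(float(sc))))
    return hits


def filter_hits(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff."""
    return [h for h in hits if h.evalue <= max_evalue]


def summarize(hits: List[Hit]) -> Tuple[int, bool]:
    """(distinct subject proteins, whether every hit has pident < 60 and raw score < 150)."""
    subjects = {h.subject for h in hits}
    all_weak = all(h.pident < 60 and h.raw_score < 150 for h in hits)
    return len(subjects), all_weak


SAMPLE = (  # hypothetical BLASTP rows
    "TT_0001\tCIL_A1\t35.2\t2e-08\t120\n"
    "TT_0001\tCIL_B7\t41.0\t3e-03\t95\n"
    "TT_0002\tCIL_C3\t52.5\t1e-20\t140\n"
)

if __name__ == "__main__":
    print(summarize(filter_hits(parse_hits(SAMPLE))))  # (2, True)
```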

The numbers of polyQ proteins involved in different biological processes in six ciliate species

In T. thermophila, there are 124 annotated proteins involved in various meiotic events, including meiosis initiation and regulation, homologous pairing, formation of DNA DSBs, formation of COs and chiasmata (a chiasma is a structure that forms between a pair of homologous chromosomes by CO recombination and physically links them during meiosis), and meiotic nuclear divisions (Dataset DS31) (reviewed in (8)). Among these 124 meiotic proteins, we identified 85 SCD proteins, 54 polyQ proteins, 106 polyQ/N proteins, 32 polyN proteins, 32 polyS proteins and 48 polyK proteins. Notably, 48 meiotic proteins contain ≥4 SCDs and 59 contain ≥4 polyQ/N tracts. For instance, DPL2, CYC2, CYC17 and ATR1 contain 15, 6, 8 and 5 SCDs, 2, 0, 1 and 2 polyQ tracts, and 11, 4, 4 and 5 polyQ/N tracts, respectively. Pars11, a chromatin-associated protein required for inducing and limiting Spo11-induced DSBs, has 14 SCDs, 2 polyQ tracts and 6 polyQ/N tracts; Spo11-induced DSBs promote ATR1-dependent Pars11 phosphorylation and its removal from chromatin (13). Many T. thermophila meiotic DSB repair proteins, including MSH4, MSH5, SGS1, FANCM, REC8 and ZPH3, also harbor several SCDs, polyQ tracts and/or polyQ/N tracts. Several T. thermophila proteins involved in editing and rearrangement of the MIC genome also harbor multiple Q-rich motifs. PDD1 (programmed DNA degradation 1), a conjugation-specific HP1-like protein, has 13 SCDs, 3 polyQ tracts and 6 polyQ/N tracts. Mutations in the chromodomain or the chromoshadow domain of PDD1 were previously found to cause PDD1 mislocalization, prevent dimethylation of histone H3 at K9, abolish removal of internal eliminated sequences (IESs) and/or result in inviable progeny (87). DCL1 (Dicer-like 1) has 5 SCDs, 1 polyQ tract and 8 polyQ/N tracts.
DCL1 is required for processing MIC transcripts into siRNA-like scan RNAs (scnRNAs) and for methylation of histone H3 at Lys 9, a modification that occurs specifically on the sequences to be eliminated (IESs) (88). GIW1 (gentlemen-in-waiting 1) physically directs a mature Argonaute-siRNA complex to the MIC nucleus, thus promoting programmed IES elimination (89). GIW1 has 9 SCDs and 2 polyQ/N tracts (Dataset DS31). Using IUPred2A (https://iupred2a.elte.hu/plot_new), we found that the Q-rich motifs in all these meiotic proteins are predicted to be intrinsically disordered. It is therefore tempting to speculate that, like the C-terminal IDR of mammalian CTTNBP2, the Q-rich motifs in T. thermophila meiotic proteins might form tunable proteinaceous condensates that regulate assembly and disassembly of the “crescents” in ciliate MICs.
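The threshold-based counts above (≥4 SCDs; ≥4 polyQ/N tracts) amount to simple filters over per-protein motif counts. The sketch below applies them to the eight proteins whose counts are stated in this section; the full 124-protein list is in Dataset DS31 and is not reproduced here, and the data structure and helper name are hypothetical.

```python
from typing import Dict, Set, Tuple

# (number of SCDs, number of polyQ/N tracts), as stated in the text above
COUNTS: Dict[str, Tuple[int, int]] = {
    "DPL2": (15, 11), "CYC2": (6, 4), "CYC17": (8, 4), "ATR1": (5, 5),
    "Pars11": (14, 6), "PDD1": (13, 6), "DCL1": (5, 8), "GIW1": (9, 2),
}


def at_least(counts: Dict[str, Tuple[int, int]], column: int, threshold: int = 4) -> Set[str]:
    """Proteins whose count in `column` (0 = SCDs, 1 = polyQ/N tracts) is >= threshold."""
    return {name for name, c in counts.items() if c[column] >= threshold}


many_scd = at_least(COUNTS, 0)
many_qn = at_least(COUNTS, 1)

if __name__ == "__main__":
    print(sorted(many_scd - many_qn))  # ['GIW1']
    print(len(many_scd & many_qn))     # 7
```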

Finally, we performed comparative GO analyses for the SCD and polyX proteins encoded by the transcriptomes of 11 different ciliate species. These transcriptomes were originally generated as part of the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) (90) and were subsequently reassembled and reannotated by Brown and colleagues (91). All transcripts are publicly available at Zenodo (https://zenodo.org/record/1212585#.Y79zoi2l3PA) (91). We applied TransDecoder (Find Coding Regions Within Transcripts; https://github.com/TransDecoder/TransDecoder/wiki) to identify candidate coding regions within the transcript sequences. Five of these eleven ciliates, namely Aristerostoma spp., Favella ehrenbergii, Pseudokeronopsis spp., Strombidium inclinatum and Uronema spp., have reassigned UAA^Q and UAG^Q codons (hereafter termed “group II ciliates”). Like the sessile S. coeruleus, the group III ciliates (Climacostomum virens, Litonotus pictus and Protocruzia adherens) use the standard genetic code. Group IV ciliates comprise Favella ehrenbergii, Blepharisma japonicum and Euplotes focardii, which use the reassigned codons UAA^Y and UAG^Y, UGA^W, and UGA^C, respectively (Table 1). The proteins encoded by these MMETSP transcripts are unlikely to represent the entire protein complement of the 11 ciliate species because 1) many MMETSP transcripts are not intact (i.e., they derive from broken mRNAs) and thus encode incomplete protein sequences, and 2) except for C. virens (94.7%), the BUSCO protein scores of these MMETSP transcriptomes range from only 52.6% to 89.9% (Table 1). Nevertheless, our results indicate that Q is used more frequently in group I and group II ciliates than in group III and group IV ciliates or in the 20 non-ciliate species (Figure 3 and Dataset DS1). Accordingly, the proportions of SCD, polyQ and polyQ/N proteins in all group I and group II ciliates are higher than those in the three group IV ciliates and in the group III ciliates other than L. pictus (Figures 4-6). Because N is used more frequently in L. pictus than in the group II ciliates and the other group III and group IV ciliates, L. pictus has higher percentages of polyN and polyQ/N proteins (Figures 6-9 and Dataset DS1).
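TransDecoder was used here to predict coding regions; the sketch below does not reproduce TransDecoder, but illustrates how the genetic code changes what a given ORF encodes. It builds codon tables from the NCBI amino-acid strings for translation table 1 (standard code) and table 6 (ciliate nuclear code, in which TAA and TAG encode Q). The function names and toy sequence are hypothetical.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"  # NCBI codon order: first, second and third positions each run T, C, A, G
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"   # table 1
CILIATE_Q = "FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # table 6


def codon_table(aas: str) -> Dict[str, str]:
    """Map each of the 64 codons to its amino acid ('*' = stop)."""
    return {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), aas)}


def translate(cds: str, table: Dict[str, str], to_stop: bool = True) -> str:
    """Translate an in-frame CDS, optionally stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = table[cds[i:i + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    orf = "ATGTAACAGTAGTGA"  # hypothetical: ATG TAA CAG TAG TGA
    print(translate(orf, codon_table(STANDARD)))   # M
    print(translate(orf, codon_table(CILIATE_Q)))  # MQQQ
```

Under the standard code the toy ORF terminates after the first codon, whereas under table 6 the same sequence encodes three Q residues before the TGA stop.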

Our data also indicate that Y, W and C are not used more frequently in the three group IV ciliates than in the other 14 ciliate or 20 non-ciliate species (Figure 7 and Dataset DS1). Likewise, reassignment of stop codons to Y, W or C does not result in higher percentages of polyY, polyW or polyC proteins in these three ciliates (Figure 9).

Proteome-wide contents of 20 different amino acids in 37 different eukaryotes.

Percentages of proteins with indicated polyQ and polyQ/N tracts in 37 different eukaryotes.

Percentages of proteins with indicated polyX motifs in 37 different eukaryotes.

Discussion

We present two unexpected results in this report. First, the Q-rich motifs of several yeast proteins (Rad51-NTD, Rad53-SCD1, Hop1-SCD, Sml1-NTD, Sup35-PND, Ure2-UPD and New1-NPD) all exhibit autonomous PEE activities. These structurally flexible Q-rich motifs may prove useful in both basic research (e.g., synthetic biology) and biotechnology. It will be of interest to investigate further how these Q-rich motifs exert their PEE function in yeast and whether Q-rich motifs in other eukaryotes possess similar PEE activities. Second, reassignment of stop codons to Q in the group I and group II ciliates increases proteome-wide Q usage, leading to massive expansion of structurally flexible or even intrinsically disordered Q-rich motifs. In contrast, reassignment of stop codons to Y, W or C results neither in higher usage of these three amino acids nor in higher percentages of W-, Y- or C-rich proteins in the three group IV ciliates. These results are consistent with the notion that, unlike Q, the residues Y, W and C are uncommon in IDRs (28-30).

Owing to their structural flexibility, Q-rich motifs can endow proteins with structural and functional plasticity. Based on our findings, we suggest that Q-rich motifs are useful toolkits for generating novel diversity during protein evolution, for example by enabling greater protein expression, protein-protein interactions, posttranslational modifications, increased solubility and tunable stability. This hypothesis may explain three intriguing phenomena. First, possibly owing to their higher Q usage, many proteins involved in evolutionarily conserved biological processes in group I and group II ciliates display more diverse amino acid sequences than the respective proteins in other ciliate or non-ciliate species. Accordingly, it is sometimes difficult to identify authentic protein homologs among different ciliates, particularly for group I and group II ciliates. A notable example is the set of 58 proteins involved in xylan catabolism in T. thermophila (Dataset DS16). Second, our GO enrichment results reveal that Q-rich motifs or a few other structurally flexible polyX tracts prevail in proteins involved in specialized biological processes, including S. cerevisiae RNA-mediated transposition, C. albicans filamentous growth, ciliate peptidyl-glutamic acid modification, T. thermophila meiosis and xylan catabolism, D. discoideum sorocarp (fruiting body) development and sexual reproduction, P. falciparum blood cell infection and antigenic variation, and development of the D. melanogaster nervous system. In theory, structurally flexible Q-rich motifs might form various membraneless organelles or proteinaceous condensates via intracellular liquid-liquid phase separation in a tunable manner, regulated, for example, by posttranslational modification, protein-protein interaction or protein-ligand binding (67, 68).
A typical example is the C-terminal IDR of mouse CTTNBP2, which facilitates co-condensation of CTTNBP2 with SHANK3 at the postsynaptic density in a Zn²⁺-dependent manner (82). Third, Borgs are huge, linear extrachromosomal elements associated with anaerobic methane-oxidizing archaea. A striking feature of Borgs is their pervasive tandem direct repeat (TR) regions. While we were preparing this manuscript, it was reported that TRs in ORFs are under very strong selective pressure, leading to perfect amino acid TRs (aaTRs) that are commonly IDRs. aaTRs particularly often contain disorder-promoting amino acids, including Q, N, P, T, E, K, V, D and S (92). Together, these observations suggest that different species have evolved distinct strategies to generate protein regions that are structurally flexible or even intrinsically disordered. Further investigations are needed to determine whether liquid-liquid phase separation prevails in ciliates with reassigned TAA^Q and TAG^Q codons and/or in the specialized biological processes of the various species we have described herein.

Conclusions

One of the most interesting questions in genome diversity is why many ciliates reassign their nuclear stop codons to amino acids, e.g., glutamine (Q), tyrosine (Y), tryptophan (W) or cysteine (C). However, the impacts of such genome-wide alterations are not well understood. We show that Q is used more frequently in all 10 ciliate species possessing reassigned TAA^Q and TAG^Q codons than in other ciliates and non-ciliate species. The consequence of this preponderance of Q is massive expansion of proteins harboring three structurally flexible or even intrinsically disordered Q-rich motifs. We also show that Q-rich motifs, which can endow proteins with structural and functional plasticity, prevail in proteins involved in several species-specific or even phylum-specific biological processes. Our results suggest that Q-rich motifs are useful toolkits for evolutionary novelty.

Methods

All plasmids, yeast strains and PCR primers used in this study are listed in Tables 1-3, respectively. Guinea pig antisera against Rad51, and rabbit antisera against phosphorylated Rad51-S¹²Q, phosphorylated Rad51-S³⁰Q and phosphorylated Hop1-T³¹⁸Q were described previously (27, 40). The mouse anti-V5 antibody was purchased from Bio-Rad (CA, USA). The rabbit anti-Hsp104 antiserum was kindly provided by Chung Wang (Academia Sinica, Taiwan). Rabbit antisera against phosphorylated Sup35-S¹⁷Q were raised using the synthetic phosphopeptide N¹²YQQYS^(P)QNGNQQQGNNR²⁸ as an antigen, where S^(P) is phosphorylated serine. Phosphopeptide synthesis and animal immunization were conducted by LTK BioLaboratories, Taiwan. Western blotting analyses were performed as described previously (27, 40). Quantitative β-galactosidase activity assays were carried out as described previously (27, 93). The ASFinder-SCD and ASFinder-polyX software are publicly available on GitHub (https://github.com/tfwangasimb/AS-SCD-finder). The sources of the proteome and transcriptome datasets are described in Table 4. The proteome-wide amino acid contents, the numbers of proteins with different types of Q-rich motifs and polyN tracts, and the SCD and polyX proteins in all 37 eukaryotes are listed in Datasets DS1-DS3, respectively. GO enrichment analyses were performed using publicly available data from the GO Resource (http://geneontology.org). The GO identifiers (IDs) of different biological processes and the names of all SCD and polyX proteins in the 24 near-complete eukaryotic proteomes are listed in Datasets DS4-DS29.

Supporting information

Datasets

Dataset DS1: The average usage of 20 amino acids in 17 ciliate and 20 non-ciliate species.

Dataset DS2: The numbers of proteins containing different types of polyQ, polyQ/N and polyN tracts in 17 ciliate and 20 non-ciliate species.

Dataset DS3: The numbers and percentages of SCD and polyX proteins in 17 ciliate and 20 non-ciliate species.

Dataset DS4-DS29: GO enrichment analyses revealing the SCD and polyX proteins involved in different biological processes in 6 ciliate and 20 non-ciliate species.

Dataset DS30: The results of BLASTP searches using the 58 Tetrahymena thermophila proteins involved in xylan catabolism.

Dataset DS31: The list of 124 Tetrahymena thermophila proteins involved in meiosis (kindly provided by Josef Loidl). The numbers of SCDs and polyX tracts in each protein are indicated.

Dataset DS32: The raw qPCR data of cDNA and gDNA in Figure 2.

Declarations

Ethics approval and consent to participate

All experiments were approved by the Ethics Committee of Academia Sinica, Taiwan.

Consent for publication

All authors have read and approved its submission to this journal.

Availability of data and materials

The data analyzed in this study are listed in Table 4. Other data generated in this study are available within the supporting information.

Competing interests

The authors declare no competing interests.

Funding

Institute of Molecular Biology, Academia Sinica, Taiwan, Republic of China, and National Science and Technology Council, Taiwan, Republic of China [NSTC 110-2811-B-001-542 to TFW].

Authors’ contributions

HCL, CNC, TTW, YPH, HTH, JLC and TFW performed the experiments and analyzed the data. TFW and YPH conceived and designed the experiments. TFW wrote the paper. All of the authors read and approved the manuscript.

Acknowledgements

We thank John O'Brien for English editing, G. Titus Brown (0000-0001-6001-2677) for his help in accessing the reassembled transcriptomic dataset in Zenodo, Yu-Tang Huang (IMB Computer Room) for maintaining the computer workstation, Josef Loidl (Max Perutz Labs, University of Vienna, Austria) for providing the list of 124 proteins involved in T. thermophila meiosis, and Meng-Chao Yao (IMB, Academia Sinica, Taiwan) for his suggestion to include the MMETSP transcripts of 11 different ciliates in this study.

Authors’ information

Hou-Cheng Liu: hc666@gate.sinica.edu.tw Chi-Ning Chuang: chining@gate.sinica.edu.tw Tai-Ting Woo: asanwoo@gmail.com

Hisao-Tang Hu: annebest999@gmail.com Chiung-Ya Chen: chiungya@gate.sinica.edu.tw Ju-Lan Chao: julan@gate.sinica.edu.tw

Yi-Ping Hsueh: yph@gate.sinica.edu.tw

Ting-Fang Wang: tfwang@gate.sinica.edu.tw

References

1.
1. Prescott DM
1994The DNA of ciliated protozoaMicrobiol Rev 58:233–67Google Scholar
2.
1. Chalker DL
2. Yao MC
2011DNA elimination in ciliates: transposon domestication and genome surveillanceAnnu Rev Genet 45:227–46Google Scholar
3.
1. Keeney S
2. Giroux CN
3. Kleckner N
1997Meiosis-specific DNA double-strand breaks are catalyzed by Spo11, a member of a widely conserved protein familyCell 88:375–84Google Scholar
4.
1. de Massy B
2013Initiation of meiotic recombination: how and where? Conservation and specificities among eukaryotesAnnu Rev Genet 47:563–99Google Scholar
5.
1. Zickler D
2006From early homologue recognition to synaptonemal complex formationChromosoma 115:158–74Google Scholar
6.
1. Rog O
2. Kohler S
3. Dernburg AF
2017The synaptonemal complex has liquid crystalline properties and spatially regulates meiotic recombination factorsElife 6Google Scholar
7.
1. Chi JY
2. Mahe F
3. Loidl J
4. Logsdon J
5. Dunthorn M
2014Meiosis gene inventory of four ciliates reveals the prevalence of a synaptonemal complex-independent crossover pathwayMolecular Biology and Evolution 31:660–72Google Scholar
8.
1. Loidl J
2021Tetrahymena meiosis: Simple yet ingeniousPLoS Genet 17:e1009627Google Scholar
9.
1. Loidl J
2. Mochizuki K
2009Tetrahymena meiotic nuclear reorganization is induced by a checkpoint kinase-dependent response to DNA damageMol Biol Cell 20:2428–37Google Scholar
10.
1. Yan GX
2. Dang H
3. Tian M
4. Zhang J
5. Shodhan A
6. Ning YZ
7. et al.
2016Cyc17, a meiosis-specific cyclin, is essential for anaphase initiation and chromosome segregation in Tetrahymena thermophilaCell Cycle 15:1855–64Google Scholar
11.
1. Yan GX
2. Zhang J
3. Shodhan A
4. Tian M
5. Miao W
2016Cdk3, a conjugation-specific cyclin-dependent kinase, is essential for the initiation of meiosis in Tetrahymena thermophilaCell Cycle 15:2506–14Google Scholar
12.
1. Xu J
2. Li X
3. Song W
4. Wang W
5. Gao S
2019Cyclin Cyc2p is required for micronuclear bouquet formation in Tetrahymena thermophilaSci China Life Sci 62:668–80Google Scholar
13.
1. Zhang J
2. Yan G
3. Tian M
4. Ma Y
5. Xiong J
6. Miao W
2018A DP-like transcription factor protein interacts with E2fl1 to regulate meiosis in Tetrahymena thermophilaCell Cycle 17:634–42Google Scholar
14.
1. Caron F
2. Meyer E
1985Does Paramecium primaurelia use a different genetic code in its macronucleus?Nature 314:185–8Google Scholar
15.
1. Helftenbein E
1985Nucleotide sequence of a macronuclear DNA molecule coding for alpha-tubulin from the ciliate Stylonychia lemnae. Special codon usage: TAA is not a translation termination codonNucleic Acids Res 13:415–33Google Scholar
16.
1. Horowitz S
2. Gorovsky MA
1985An unusual genetic code in nuclear genes of TetrahymenaProc Natl Acad Sci U S A 82:2452–5Google Scholar
17.
1. Preer JR Jr
2. Preer LB
3. Rudman BM
4. Barnett AJ
1985Deviation from the universal code shown by the gene for surface protein 51A in ParameciumNature 314:188–90Google Scholar
18.
1. Lozupone CA
2. Knight RD
3. Landweber LF
2001The molecular basis of nuclear genetic code change in ciliatesCurr Biol 11:65–74Google Scholar
19.
1. Ring KL
2. Cavalcanti AR
2008Consequences of stop codon reassignment on protein evolution in ciliates with alternative genetic codesMol Biol Evol 25:179–86Google Scholar
20.
1. Salim HM
2. Ring KL
3. Cavalcanti AR
2008Patterns of codon usage in two ciliates that reassign the genetic code: Tetrahymena thermophila and Paramecium tetraureliaProtist 159:283–98Google Scholar
21.
1. Dohra H
2. Fujishima M
3. Suzuki H
2015Analysis of amino acid and codon usage in Paramecium bursariaFEBS Lett 589:3113–8Google Scholar
22.
1. Xiong J
2. Wang G
3. Cheng J
4. Tian M
5. Pan X
6. Warren A
7. et al.
2015Genome of the facultative scuticociliatosis pathogen Pseudocohnilembus persalinus provides insight into its virulence through horizontal gene transferSci Rep 5Google Scholar
23.
1. Swart EC
2. Serra V
3. Petroni G
4. Nowacki M
2016Genetic codes with no dedicated stop codon: Context-dependent translation terminationCell 166:691–702Google Scholar
24.
1. Heaphy SM
2. Mariotti M
3. Gladyshev VN
4. Atkins JF
5. Baranov PV
2016Novel ciliate genetic code variants including the reassignment of all three stop codons to sense codons in Condylostoma magnumMol Biol Evol 33:2885–9Google Scholar
25.
1. Slabodnick MM
2. Ruby JG
3. Reiff SB
4. Swart EC
5. Gosai S
6. Prabakaran S
7. et al.
2017The macronuclear genome of Stentor coeruleus reveals tiny introns in a giant cellCurr Biol 27:569–75Google Scholar
26.
1. Kollmar M
2. Muhlhausen S
2017Nuclear codon reassignments in the genomics era and mechanisms behind their evolutionBioessays 39Google Scholar
27.
1. Woo TT
2. Chuang CN
3. Higashide M
4. Shinohara A
5. Wang TF
2020Dual roles of yeast Rad51 N-terminal domain in repairing DNA double-strand breaksNucleic Acids Res 48:8474–89Google Scholar
28.
1. Romero P
2. Obradovic Z
3. Li X
4. Garner EC
5. Brown CJ
6. Dunker AK
2001Sequence complexity of disordered proteinProteins 42:38–48Google Scholar
29.
1. Macossay-Castillo M
2. Marvelli G
3. Guharoy M
4. Jain A
5. Kihara D
6. Tompa P
7. et al.
2019The balancing act of intrinsically disordered proteins: enabling functional diversity while minimizing promiscuityJ Mol Biol 431:1650–70Google Scholar
30.
1. Uversky VN
2. Gillespie JR
3. Fink AL
2000Why are “natively unfolded” proteins unstructured under physiologic conditions?Proteins 41:415–27Google Scholar
31.
1. Traven A
2. Heierhorst J
2005SQ/TQ cluster domains: concentrated ATM/ATR kinase phosphorylation site regions in DNA-damage-response proteinsBioessays 27:397–407Google Scholar
32.
1. Cheung HC
2. San Lucas FA
3. Hicks S
4. Chang K
5. Bertuch AA
6. Ribes-Zamora A
2012An S/T-Q cluster domain census unveils new putative targets under Tel1/Mec1 controlBMC Genomics 13Google Scholar
33.
1. Kim ST
2. Lim DS
3. Canman CE
4. Kastan MB
1999Substrate specificities and identification of putative substrates of ATM kinase family membersJ Biol Chem 274:37538–43Google Scholar
34.
1. Craven RJ
2. Greenwell PW
3. Dominska M
4. Petes TD
2002Regulation of genome stability by TEL1 and MEC1, yeast homologs of the mammalian ATM and ATR genesGenetics 161:493–507Google Scholar
35.
1. Menolfi D
2. Zha S. ATM
2020ATR and DNA-PKcs kinases-the lessons from the mouse models: inhibition not equal deletionCell Biosci 10Google Scholar
36.
1. Corcoles-Saez I
2. Dong K
3. Johnson AL
4. Waskiewicz E
5. Costanzo M
6. Boone C
7. et al.
2018Essential function of Mec1, the budding yeast ATM/ATR checkpoint-response kinase, in protein homeostasisDev Cell 46:495–503Google Scholar
37.
1. Corcoles-Saez I
2. Dong K
3. Cha RS
2019Versatility of the Mec1(ATM/ATR) signaling network in mediating resistance to replication, genotoxic, and proteotoxic stressesCurr Genet 65:657–61Google Scholar
38. Lee H, Yuan C, Hammet A, Mahajan A, Chen ES, Wu MR, et al. 2008. Diphosphothreonine-specific interaction between an SQ/TQ cluster and an FHA domain in the Rad53-Dun1 kinase cascade. Mol Cell 30:767–78.
39. Carballo JA, Johnson AL, Sedgwick SG, Cha RS. 2008. Phosphorylation of the axial element protein Hop1 by Mec1/Tel1 ensures meiotic interhomolog recombination. Cell 132:758–70.
40. Chuang CN, Cheng YH, Wang TF. 2012. Mek1 stabilizes Hop1-Thr318 phosphorylation to promote interhomolog recombination and checkpoint responses during yeast meiosis. Nucleic Acids Res 40:11416–27.
41. Zhao X, Muller EG, Rothstein R. 1998. A suppressor of two essential checkpoint genes identifies a novel protein that negatively affects dNTP pools. Mol Cell 2:329–40.
42. Chavali S, Chavali PL, Chalancon G, de Groot NS, Gemayel R, Latysheva NS, et al. 2017. Constraints and consequences of the emergence of amino acid repeats in eukaryotic proteins. Nat Struct Mol Biol 24:765–77.
43. Mier P, Elena-Real C, Urbanek A, Bernado P, Andrade-Navarro MA. 2020. The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context. Comput Struct Biotechnol J 18:306–13.
44. Lu X, Murphy RM. 2014. Synthesis and disaggregation of asparagine repeat-containing peptides. J Pept Sci 20:860–7.
45. Zoghbi HY, Orr HT. 2000. Glutamine repeats and neurodegeneration. Annu Rev Neurosci 23:217–47.
46. Michelitsch MD, Weissman JS. 2000. A census of glutamine/asparagine-rich regions: implications for their conserved function and the prediction of novel prions. Proc Natl Acad Sci U S A 97:11910–5.
47. Uptain SM, Lindquist S. 2002. Prions as protein-based genetic elements. Annu Rev Microbiol 56:703–41.
48. Toombs JA, Liss NM, Cobble KR, Ben-Musa Z, Ross ED. 2011. [PSI+] maintenance is dependent on the composition, not primary sequence, of the oligopeptide repeat domain. PLoS One 6:e21953.
49. Wickner RB, Edskes HK, Roberts BT, Baxa U, Pierce MM, Ross ED, et al. 2004. Prions: proteins as genes and infectious entities. Genes Dev 18:470–85.
50. Wickner RB. 1994. [URE3] as an altered URE2 protein: evidence for a prion analog in Saccharomyces cerevisiae. Science 264:566–9.
51. Shewmaker F, Mull L, Nakayashiki T, Masison DC, Wickner RB. 2007. Ure2p function is enhanced by its prion domain in Saccharomyces cerevisiae. Genetics 176:1557–65.
52. Ngo S, Chiang V, Ho E, Le L, Guo Z. 2012. Prion domain of yeast Ure2 protein adopts a completely disordered structure: a solid-support EPR study. PLoS One 7:e47248.
53. Kasari V, Pochopien AA, Margus T, Murina V, Turnbull K, Zhou Y, et al. 2019. A role for the Saccharomyces cerevisiae ABCF protein New1 in translation termination/recycling. Nucleic Acids Res 47:8807–20.
54. Santoso A, Chien P, Osherovich LZ, Weissman JS. 2000. Molecular basis of a yeast prion species barrier. Cell 100:277–88.
55. Osherovich LZ, Weissman JS. 2001. Multiple Gln/Asn-rich prion domains confer susceptibility to induction of the yeast [PSI(+)] prion. Cell 106:183–94.
56. Chen B, Retzlaff M, Roos T, Frydman J. 2011. Cellular strategies of protein quality control. Cold Spring Harb Perspect Biol 3:a004374.
57. Shorter J, Southworth DR. 2019. Spiraling in Control: Structures and Mechanisms of the Hsp104 Disaggregase. Cold Spring Harb Perspect Biol 11.
58. Ye X, Lin J, Mayne L, Shorter J, Englander SW. 2020. Structural and kinetic basis for the regulation and potentiation of Hsp104 function. Proc Natl Acad Sci U S A 117:9384–92.
59. Mullally JE, Chernova T, Wilkinson KD. 2006. Doa1 is a Cdc48 adapter that possesses a novel ubiquitin binding domain. Mol Cell Biol 26:822–30.
60. Zhao G, Li G, Schindelin H, Lennarz WJ. 2009. An Armadillo motif in Ufd3 interacts with Cdc48 and is involved in ubiquitin homeostasis and protein degradation. Proc Natl Acad Sci U S A 106:16197–202.
61. Tyedmers J, Madariaga ML, Lindquist S. 2008. Prion switching in response to environmental stress. PLoS Biol 6:e294.
62. Swaminathan S, Amerik AY, Hochstrasser M. 1999. The Doa4 deubiquitinating enzyme is required for ubiquitin homeostasis in yeast. Mol Biol Cell 10:2583–94.
63. Dudley AM, Janse DM, Tanay A, Shamir R, Church GM. 2005. A global view of pleiotropy and phenotypically derived gene function in yeast. Mol Syst Biol 1:0001.
64. Dasgupta A, Ramsey KL, Smith JS, Auble DT. 2004. Sir Antagonist 1 (San1) is a ubiquitin ligase. J Biol Chem 279:26830–8.
65. Fredrickson EK, Gallagher PS, Clowes Candadai SV, Gardner RG. 2013. Substrate recognition in nuclear protein quality control degradation is governed by exposed hydrophobicity that correlates with aggregation and insolubility. J Biol Chem 288:6130–9.
66. Porat Z, Landau G, Bercovich Z, Krutauz D, Glickman M, Kahana C. 2008. Yeast antizyme mediates degradation of yeast ornithine decarboxylase by yeast but not by mammalian proteasome: new insights on yeast antizyme. J Biol Chem 283:4528–34.
67. Wright PE, Dyson HJ. 1999. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J Mol Biol 293:321–31.
68. Posey AE, Holehouse AS, Pappu RV. 2018. Phase separation of intrinsically disordered proteins. Methods Enzymol 611:1–30.
69. Ramazzotti M, Monsellier E, Kamoun C, Degl’Innocenti D, Melki R. 2012. Polyglutamine repeats are associated to specific sequence biases that are conserved among eukaryotes. PLoS One 7:e30824.
70. Li C, Nagel J, Androulakis S, Song J, Buckle AM. 2016. PolyQ 2.0: an improved version of PolyQ, a database of human polyglutamine proteins. Database (Oxford).
71. Totzeck F, Andrade-Navarro MA, Mier P. 2017. The protein structure context of polyQ regions. PLoS One 12:e0170801.
72. Mier P, Andrade-Navarro MA. 2021. Between interactions and aggregates: The polyQ balance. Genome Biol Evol 13.
73. Seppey M, Manni M, Zdobnov EM. 2019. BUSCO: Assessing genome assembly and annotation completeness. Methods Mol Biol 1962:227–45.
74. Cara L, Baitemirova M, Follis J, Larios-Sanz M, Ribes-Zamora A. 2016. The ATM- and ATR-related SCD domain is over-represented in proteins involved in nervous system development. Sci Rep 6.
75. Kuspa A, Loomis WF. 2006. The genome of Dictyostelium discoideum. Methods Mol Biol 346:15–30.
76. Davies HM, Nofal SD, McLaughlin EJ, Osborne AR. 2017. Repetitive sequences in malaria parasite proteins. FEMS Microbiol Rev 41:923–40.
77. Mier P, Alanis-Lobato G, Andrade-Navarro MA. 2017. Context characterization of amino acid homorepeats using evolution, position, and order. Proteins 85:709–19.
78. de Castro E, Sigrist CJ, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E, et al. 2006. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res 34.
79. Chen YK, Hsueh YP. 2012. Cortactin-binding protein 2 modulates the mobility of cortactin and regulates dendritic spine formation and maintenance. J Neurosci 32:1043–55.
80. Hsueh YP. 2012. Neuron-specific regulation on F-actin cytoskeletons: The role of CTTNBP2 in dendritic spinogenesis and maintenance. Commun Integr Biol 5:334–6.
81. Meszaros B, Erdos G, Dosztanyi Z. 2018. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res 46:W329–W37.
82. Shih PY, Fang YL, Shankar S, Lee SP, Hu HT, Chen H, et al. 2022. Phase separation and zinc-induced transition modulate synaptic distribution and association of autism-linked CTTNBP2 and SHANK3. Nat Commun 13:2664.
83. Curcio MJ, Lutz S, Lesage P. 2015. The Ty1 LTR-retrotransposon of budding yeast, Saccharomyces cerevisiae. Microbiol Spectr 3:1–35.
84. Eichinger L, Pachebat JA, Glockner G, Rajandream MA, Sucgang R, Berriman M, et al. 2005. The genome of the social amoeba Dictyostelium discoideum. Nature 435:43–57.
85. Martin-Gonzalez J, Montero-Bullon JF, Lacal J. 2021. Dictyostelium discoideum as a non-mammalian biomedical model. Microb Biotechnol 14:111–25.
86. Ralph SA, Scheidig-Benatar C, Scherf A. 2005. Antigenic variation in Plasmodium falciparum is associated with movement of var loci between subnuclear locations. Proc Natl Acad Sci U S A 102:5414–9.
87. Schwope RM, Chalker DL. 2014. Mutations in Pdd1 reveal distinct requirements for its chromodomain and chromoshadow domain in directing histone methylation and heterochromatin elimination. Eukaryot Cell 13:190–201.
88. Mochizuki K, Gorovsky MA. 2004. Conjugation-specific small RNAs in Tetrahymena have predicted properties of scan (scn) RNAs involved in genome rearrangement. Genes Dev 18:2068–73.
89. Noto T, Kurth HM, Kataoka K, Aronica L, DeSouza LV, Siu KWM, et al. 2010. The Tetrahymena Argonaute-binding protein Giw1p directs a mature Argonaute-siRNA complex to the nucleus. Cell 140:692–703.
90. Keeling PJ, Burki F, Wilcox HM, Allam B, Allen EE, Amaral-Zettler LA, et al. 2014. The marine microbial eukaryote transcriptome sequencing project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol 12:e1001889.
91. Johnson LK, Alexander H, Brown CT. 2019. Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes. Gigascience 8.
92. Schoelmerich MC, Sachdeva R, West-Roberts J, Waldburger L, Banfield JF. 2023. Tandem repeats in giant archaeal Borg elements undergo rapid evolution and create new intrinsically disordered regions in proteins. PLoS Biol 21:e3001980.
93. Lin FM, Lai YJ, Shen HJ, Cheng YH, Wang TF. 2010. Yeast axial-element protein, Red1, binds SUMO chains to promote meiotic interhomologue recombination and chromosome synapsis. EMBO J 29:586–96.

Article and author information

Author information

Chi-Ning Chuang
Institute of Molecular Biology, Academia Sinica, Taipei 115, Taiwan
ORCID iD: 0000-0002-8252-8807
- Current affiliation: Department of Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
Hou-Cheng Liu
Institute of Molecular Biology, Academia Sinica, Taipei 115, Taiwan
ORCID iD: 0000-0002-7991-327X
- Current affiliation: Department of Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
Tai-Ting Woo
Institute of Molecular Biology, Academia Sinica, Taipei 115, Taiwan
ORCID iD: 0000-0002-9717-1142
- These two authors contributed equally to this work.
Ju-Lan Chao
Institute of Molecular Biology, Academia Sinica, Taipei 115, Taiwan
Chiung-Ya Chen
Institute of Molecular Biology, Academia Sinica, Taipei 115, Taiwan
ORCID iD: 0000-0003-0706-0535
Hisao-Tang Hu
Institute of Molecular Biology, Academia Sinica, Taipei 115, Taiwan
Yi-Ping Hsueh
Institute of Molecular Biology, Academia Sinica, Taipei 115, Taiwan, Department of Biochemical Science and Technology, National Chiayi University, Chiayi, Taiwan
ORCID iD: 0000-0002-0866-6275
Ting-Fang Wang
Institute of Molecular Biology, Academia Sinica, Taipei 115, Taiwan, Department of Biochemical Science and Technology, National Chiayi University, Chiayi, Taiwan
ORCID iD: 0000-0001-6306-9505
- Corresponding author: Ting-Fang Wang; email: tfwang@gate.sinica.edu.tw

Version history

Preprint posted: July 26, 2023
Sent for peer review: August 4, 2023
Reviewed Preprint version 1: October 25, 2023
Reviewed Preprint version 2: January 26, 2024
Version of Record published: February 23, 2024

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.91405. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
