Noncanonical usage of stop codons in ciliates expands proteins with structurally flexible Q-rich motifs

  1. Chi-Ning Chuang
  2. Hou-Cheng Liu
  3. Tai-Ting Woo
  4. Ju-Lan Chao
  5. Chiung-Ya Chen
  6. Hisao-Tang Hu
  7. Yi-Ping Hsueh
  8. Ting-Fang Wang (corresponding author)
  1. Institute of Molecular Biology, Academia Sinica, Taiwan
  2. Department of Biochemical Science and Technology, National Chiayi University, Taiwan
12 figures, 4 tables and 3 additional files

Figures

The Q-rich domains of seven different yeast proteins possess autonomous protein-expression-enhancing (PEE) activities.

(A–B) N-terminal fusion of Rad51-NTD/SCD, Rad53-SCD1, Hop1-SCD, Sml1-NTD, Sup35-PND, Ure2-UPD or New1-NPD promotes high-level expression of LacZ-NVH. The NVH tag contains an SV40 nuclear localization signal (NLS) peptide preceding a V5 epitope tag and a hexahistidine (His6) affinity tag (Woo et al., 2020). Western blots for visualization of LacZ-NVH fusion proteins (A) and quantitative β-galactosidase assays (B) were carried out as described previously (Woo et al., 2020). Error bars indicate standard deviation between experiments (n≥3). Asterisks indicate significant differences relative to wild type (WT) in A or to the construct lacking an NTD in B, with p values calculated using a two-tailed t-test (***, p-value <0.001; **, p-value <0.01). (C–D) The PEE activities of S/T/Q/N-rich domains are independent of the quaternary structures of target proteins. (C) Rad53-SCD1 can be used as an N-terminal fusion tag to enhance production of four different target proteins: LacZ-NVH, GST-NVH, GSTnd-NVH, and GFP-NVH. (D) Visualization of native Rad51 (NTD-Rad51-ΔN), Rad51-ΔN, and the Rad51-ΔN fusion proteins by immunoblotting. Hsp104 was used as a loading control. Sizes of standard protein markers, in kilodaltons, are labeled to the left of the blots. The black arrowhead indicates the protein band of Rad51-ΔN. (E) MMS sensitivity. Spot assay showing fivefold serial dilutions of the indicated strains grown on YPD plates with or without MMS at the indicated concentrations (w/v).

The autonomous protein-expression-enhancing function of Rad51-NTD is unlikely to be controlled during transcription or simply arise from plasmid copy number differences.

The effects of WT and mutant Rad51-NTD on β-galactosidase activities (A), plasmid DNA copy numbers (B), relative steady-state levels of LacZ-NVH mRNA normalized to ACT1 (actin) mRNA (C), and relative ratios of LacZ-NVH mRNA versus plasmid DNA copy number (D). Wild-type yeast cells were transformed with the indicated CEN-ARS plasmids to express WT or mutant Rad51-NTD-LacZ-NVH fusion proteins, or LacZ-NVH alone, under the control of the native RAD51 gene promoter (PRAD51). Relative quantification (RQ = 2^(−ΔΔCT)) values were determined by g-qPCR and RT-qPCR to reveal the plasmid DNA copy number and the steady-state levels of LacZ-NVH mRNA, respectively. LacZ and ACT1 were selected as the target and reference protein-encoding genes, respectively, in both g-qPCR and RT-qPCR. The data shown represent mean ± SD from three independent biological data points.
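The relative quantification formula in this legend can be sketched as below. This is a minimal illustration of the standard 2^(−ΔΔCT) calculation, not the authors' analysis script; the function name, argument names, and CT values are all hypothetical.

```python
def relative_quantification(ct_target_sample, ct_ref_sample,
                            ct_target_control, ct_ref_control):
    """Return RQ = 2^(-ddCT) for a target gene (e.g. LacZ) against a
    reference gene (e.g. ACT1), relative to a control condition."""
    # dCT: target CT minus reference CT, within each condition
    dct_sample = ct_target_sample - ct_ref_sample
    dct_control = ct_target_control - ct_ref_control
    # ddCT: sample dCT minus control dCT
    ddct = dct_sample - dct_control
    return 2.0 ** (-ddct)

# Hypothetical CT values, invented for illustration only.
rq = relative_quantification(22.0, 18.0, 24.0, 18.0)
print(rq)  # ddCT = (22 - 18) - (24 - 18) = -2, so RQ = 2^2 = 4.0
```

An RQ of 1 means the target-to-reference ratio is the same in the sample and the control.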

The expression-promoting function of Rad51-NTD is controlled during protein translation and does not affect ubiquitin-mediated protein degradation.

(A) The steady-state protein levels of Rad51-NTD-LacZ-NVH and LacZ-NVH in WT and six protein homeostasis gene knockout mutants. (B–D) The impact of six protein homeostasis genes on the β-galactosidase activity ratios of Rad51-NTD-LacZ-NVH to LacZ-NVH in WT and the six gene knockout mutants (B). The β-galactosidase activities of LacZ-NVH (C) and Rad51-NTD-LacZ-NVH (D) in WT and the six gene knockout mutants are shown. Asterisks indicate significant differences, with p values calculated using a two-tailed t-test (***, p-value <0.001; **, p-value <0.01; *, p-value <0.05).
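The asterisk convention stated in this legend maps p values to labels as in the short sketch below. The function name is hypothetical, and the p value itself would come from a separate two-tailed t-test that is not shown here.

```python
def significance_stars(p_value):
    """Map a p value to the asterisk labels used in the legend:
    *** for p < 0.001, ** for p < 0.01, * for p < 0.05, else no label."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""

# Hypothetical p values, for illustration only.
for p in (0.0004, 0.003, 0.02, 0.2):
    print(p, repr(significance_stars(p)))
```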

Relative β-galactosidase (LacZ) activities are correlated with the percentage STQ or STQN amino acid content of three Q-rich motifs.

(A) List of N-terminal tags with their respective lengths, numbers of S/T/Q/N amino acids, overall STQ or STQN percentages, and relative β-galactosidase activities. (B–D) Linear regressions between relative β-galactosidase activities and overall STQ or STQN percentages for Rad51-NTD (B), Rad53-SCD1 (C) and Sup35-PND (D). The coefficients of determination (R²) are indicated for each simple linear regression. (E) The amino acid sequences of wild-type and mutant Rad51-NTD, Rad53-SCD1 and Sup35-PND. Error bars are too small to be included.
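The two quantities in this legend, the STQ (or STQN) percentage of a tag and the R² of a simple linear regression, can be computed as in the sketch below. It is a plain least-squares illustration under stated assumptions, not the authors' code: the function names and the example inputs are hypothetical, and no values from the figure are reproduced.

```python
def stq_percent(sequence, residues="STQ"):
    """Percentage of residues in `sequence` that belong to `residues`.
    Pass residues="STQN" for the STQN percentage."""
    sequence = sequence.upper()
    hits = sum(sequence.count(r) for r in residues)
    return 100.0 * hits / len(sequence)

def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.
    Returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Invented toy inputs, not data from the figure.
print(stq_percent("SQAT"))                 # 3 of 4 residues are S, T or Q
print(linear_fit([1, 2, 3], [2, 4, 6]))    # perfectly linear toy data
```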

Alanine scanning mutagenesis of intrinsically disordered regions (IDRs).

The amino acid sequences of WT and mutant IDRs are listed in Supplementary file 1e. Total protein lysates prepared from yeast cells expressing Rad51-NTD-LacZ-NVH (A), Sup35-PND-LacZ-NVH (B) or Rad53-SCD1-LacZ-NVH (C) were visualized by immunoblotting with anti-V5 antisera. Hsp104 was used as a loading control. Quantitative yeast β-galactosidase (LacZ) assays were carried out as described in Figure 1. Error bars indicate standard deviation between experiments (n=3). Asterisks indicate significant differences when compared to LacZ-NVH, with p values calculated using a two-tailed t-test (**, p-value <0.01 and ***, p-value <0.001).

Figure 6 with 3 supplements
Percentages of proteins with different numbers of SCDs, and polyQ, polyQ/N or polyN tracts in 37 different eukaryotes.
Figure 6—source data 1

The average usages of 20 different amino acids in 17 ciliate and 20 non-ciliate species.

https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data1-v1.xlsx
Figure 6—source data 2

The number of proteins containing different types of polyQ, polyQ/N and polyN tracts in 17 ciliate and 20 non-ciliate species.

https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data2-v1.xlsx
Figure 6—source data 3

The numbers and percentages of SCD and polyX proteins in 17 ciliate and 20 non-ciliate species.

https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data3-v1.xlsx
Figure 6—source data 4

The ratios of the overall number of X residues for each of the seven polyX motifs relative to those in the entire proteome of each species, respectively.

https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data4-v1.xlsx
Figure 6—source data 5

The codon usage frequency in 26 near-complete proteomes and 11 ciliate proteomes encoded by the transcripts generated as part of the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP).

https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data5-v1.xlsx
Figure 6—source data 6

GO enrichment analyses revealing the SCD and polyX proteins involved in different biological processes in 6 ciliate and 20 non-ciliate species.

The percentages and numbers of SCD and polyX proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower et al., 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the Family-wise error rate (FWER) are shown. The overrepresented GO groups (adjusted p-values ≤0.001) are highlighted in red font.

https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data6-v1.xlsx
Figure 6—source data 7 to 31

Each of these files carries the same title and legend as Figure 6—source data 6.

Figure 6—source data 7: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data7-v1.xlsx
Figure 6—source data 8: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data8-v1.xlsx
Figure 6—source data 9: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data9-v1.xlsx
Figure 6—source data 10: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data10-v1.xlsx
Figure 6—source data 11: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data11-v1.xlsx
Figure 6—source data 12: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data12-v1.xlsx
Figure 6—source data 13: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data13-v1.xlsx
Figure 6—source data 14: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data14-v1.xlsx
Figure 6—source data 15: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data15-v1.xlsx
Figure 6—source data 16: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data16-v1.xlsx
Figure 6—source data 17: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data17-v1.xlsx
Figure 6—source data 18: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data18-v1.xlsx
Figure 6—source data 19: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data19-v1.xlsx
Figure 6—source data 20: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data20-v1.xlsx
Figure 6—source data 21: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data21-v1.xlsx
Figure 6—source data 22: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data22-v1.xlsx
Figure 6—source data 23: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data23-v1.xlsx
Figure 6—source data 24: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data24-v1.xlsx
Figure 6—source data 25: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data25-v1.xlsx
Figure 6—source data 26: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data26-v1.xlsx
Figure 6—source data 27: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data27-v1.xlsx
Figure 6—source data 28: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data28-v1.xlsx
Figure 6—source data 29: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data29-v1.xlsx
Figure 6—source data 30: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data30-v1.xlsx
Figure 6—source data 31: https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data31-v1.xlsx
Figure 6—source data 32

The results of BLASTP searches using the 58 Tetrahymena thermophila proteins involved in xylan catabolism.

https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data32-v1.xlsx
Figure 6—source data 33

The list of 124 Tetrahymena thermophila proteins involved in meiosis (kindly provided by Josef Loidl).

The numbers of SCD and polyX tracts in each protein are indicated.

https://cdn.elifesciences.org/articles/91405/elife-91405-fig6-data33-v1.xlsx
Figure 6—figure supplement 1
Proteome-wide contents of 20 different amino acids in 37 different eukaryotes.
Figure 6—figure supplement 2
Percentages of proteins with indicated polyQ and polyQ/N tracts in 37 different eukaryotes.
Figure 6—figure supplement 3
Percentages of proteins with indicated polyX motifs in 37 different eukaryotes.
Q contents in 7 different types of polyQ motifs in 26 near-complete proteomes.

The five ciliates with reassigned stop codons (TAA^Q and TAG^Q) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

N contents in 7 different types of polyN motifs in 26 near-complete proteomes.

The five ciliates with reassigned stop codons (TAA^Q and TAG^Q) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

S contents in 7 different types of polyS motifs in 26 near-complete proteomes.

The five ciliates with reassigned stop codons (TAA^Q and TAG^Q) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

T contents in 7 different types of polyT motifs in 26 near-complete proteomes.

The five ciliates with reassigned stop codons (TAA^Q and TAG^Q) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

Selection of biological processes with overrepresented SCD-containing proteins in different eukaryotes.

The percentages and numbers of SCD-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower et al., 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the Family-wise error rate (FWER) are shown.

Selection of biological processes with overrepresented polyQ-containing proteins in different eukaryotes.

The percentages and numbers of polyQ-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower et al., 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the Family-wise error rate (FWER) are shown. The five ciliates with reassigned stop codons (TAA^Q and TAG^Q) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

Tables

Table 1
Usage frequency (%) of standard codons [stop codon (*), Q, C, Y and W] and reassigned stop codons (→ Q, → C or → W) in 37 different eukaryotes.
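| Species | Source | ID | BUSCO Protein (%) | Protein # | TAA | TAG | TGA | CAA | CAG | TGC | TGT | TAC | TAT | TGG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NCBI genetic code: 1 | Non-ciliate eukaryotes | | | | * | * | * | Q | Q | C | C | Y | Y | W |
| Saccharomyces cerevisiae | UniProt | UP000002311 | 99.6 | 6062 | 0.16 | 0.08 | 0.01 | 2.77 | 0.89 | 0.63 | 1.03 | 0.86 | 3.10 | 0.93 |
| Candida albicans | UniProt | UP000000559 | 98.8 | 6035 | 0.10 | 0.05 | 0.03 | 3.57 | 0.65 | 0.18 | 0.94 | 1.04 | 2.54 | 1.09 |
| Candida auris | UniProt | UP000230249 | 97.4 | 5409 | 0.08 | 0.06 | 0.06 | 1.81 | 2.12 | 0.55 | 0.59 | 2.09 | 1.16 | 1.07 |
| Candida tropicalis | UniProt | UP000002037 | 94.6 | 6226 | 0.10 | 0.07 | 0.03 | 3.61 | 0.66 | 0.14 | 0.96 | 0.95 | 2.62 | 0.98 |
| Neurospora crassa | UniProt | UP000001805 | 99.2 | 10257 | 0.06 | 0.05 | 0.08 | 1.70 | 2.60 | 0.77 | 0.34 | 1.75 | 0.85 | 1.31 |
| Magnaporthe oryzae | UniProt | UP000009058 | 98.6 | 12794 | 0.06 | 0.07 | 0.10 | 1.37 | 2.69 | 0.92 | 0.35 | 1.80 | 0.71 | 1.42 |
| Trichoderma reesei | PMID: 34908505 | PRJNA382020 | 99.2 | 13735 | 0.06 | 0.06 | 0.11 | 1.17 | 2.95 | 0.95 | 0.32 | 1.80 | 0.83 | 1.42 |
| Cryptococcus neoformans | UniProt | UP000002149 | 99.5 | 6743 | 0.07 | 0.06 | 0.05 | 2.06 | 1.79 | 0.48 | 0.55 | 1.39 | 1.14 | 1.37 |
| Ustilago maydis | UniProt | UP000000561 | 99.4 | 6806 | 0.04 | 0.05 | 0.07 | 1.82 | 2.61 | 0.72 | 0.35 | 1.59 | 0.65 | 1.18 |
| Taiwanofungus camphoratus | PMID: 35196809 | PRJNA615295 | 94.6 | 14019 | 0.05 | 0.06 | 0.11 | 1.57 | 2.19 | 0.70 | 0.57 | 1.38 | 1.22 | 1.36 |
| Dictyostelium discoideum | UniProt | UP000002195 | 93.7 | 12734 | 0.16 | 0.01 | 0.01 | 4.86 | 0.19 | 0.15 | 1.27 | 0.52 | 3.02 | 0.73 |
| Plasmodium falciparum | UniProt | UP000001450 | 99.1 | 5376 | 0.09 | 0.01 | 0.03 | 2.42 | 0.37 | 0.23 | 1.52 | 0.61 | 5.05 | 0.49 |
| Drosophila melanogaster | UniProt | UP000000803 | 100 | 22088 | 0.08 | 0.07 | 0.05 | 1.56 | 3.61 | 1.32 | 0.54 | 1.84 | 1.08 | 0.99 |
| Aedes aegypti | UniProt | UP000008820 | 99.4 | 18998 | 0.11 | 0.07 | 0.08 | 1.76 | 2.58 | 1.11 | 0.79 | 2.16 | 1.14 | 1.06 |
| Caenorhabditis elegans | UniProt | UP000001940 | 100 | 26548 | 0.16 | 0.06 | 0.14 | 2.74 | 1.44 | 0.91 | 1.12 | 1.37 | 1.75 | 1.11 |
| Danio rerio | UniProt | UP000000437 | 95.5 | 46844 | 0.11 | 0.06 | 0.14 | 1.18 | 3.35 | 1.12 | 1.13 | 1.70 | 1.26 | 1.16 |
| Mus musculus | UniProt | UP000000589 | 99.7 | 55341 | 0.10 | 0.08 | 0.16 | 1.20 | 3.41 | 1.23 | 1.14 | 1.61 | 1.22 | 1.25 |
| Homo sapiens | UniProt | UP000005640 | 99.5 | 79038 | 0.10 | 0.08 | 0.16 | 1.23 | 3.42 | 1.26 | 1.06 | 1.53 | 1.22 | 1.32 |
| Arabidopsis thaliana | UniProt | UP000006548 | 100 | 39334 | 0.09 | 0.05 | 0.12 | 1.94 | 1.52 | 0.72 | 1.05 | 1.37 | 1.46 | 1.25 |
| Chlamydomonas reinhardtii | UniProt | UP000006906 | 98.9 | 18829 | 0.03 | 0.04 | 0.06 | 0.59 | 4.05 | 1.1 | 0.22 | 1.45 | 0.24 | 1.16 |
| NCBI genetic code: 6 | Group I ciliates | | | | → Q | → Q | * | Q | Q | C | C | Y | Y | W |
| Tetrahymena thermophila | UniProt | UP000009168 | 98.9 | 26972 | 5.46 | 1.63 | 0.16 | 2.04 | 0.48 | 0.79 | 0.99 | 1.22 | 3.09 | 0.51 |
| Paramecium tetraurelia | UniProt | UP000000600 | 98.8 | 39461 | 4.53 | 1.48 | 0.22 | 2.54 | 0.57 | 0.61 | 1.21 | 1.12 | 3.14 | 0.76 |
| Oxytricha trifallax | UniProt | UP000006077 | 97.1 | 23559 | 3.63 | 1.57 | 0.15 | 2.68 | 1.07 | 0.59 | 0.56 | 1.44 | 2.27 | 0.58 |
| Stylonychia lemnae | UniProt | UP000039865 | 97.1 | 20720 | 3.22 | 1.81 | 0.17 | 2.26 | 1.05 | 0.62 | 0.55 | 1.31 | 2.49 | 0.62 |
| Pseudocohnilembus persalinus | UniProt | UP000054937 | 92.4 | 13175 | 7.36 | 1.39 | 0.18 | 1.76 | 0.37 | 0.32 | 1.00 | 1.00 | 3.26 | 0.61 |
| NCBI genetic code: 6 | Group II ciliates | | | | → Q | → Q | * | Q | Q | C | C | Y | Y | W |
| Aristerostoma | MMETSP | MMETSP0125 | 62.5 | 27868 | 0.96 | 1.04 | 0.15 | 2.65 | 0.97 | 0.71 | 0.68 | 1.35 | 2.49 | 0.8 |
| Favella ehrenbergii | MMETSP | MMETSP0123 | 85.4 | 26477 | 0.72 | 1.51 | 0.16 | 1.88 | 3.06 | 1.11 | 0.25 | 2.06 | 0.71 | 0.83 |
| Pseudokeronopsis | MMETSP | MMETSP0211, MMETSP1396 | 87.2 | 62574 | 1.04 | 1.37 | 0.16 | 2.05 | 2.58 | 0.94 | 0.44 | 2.18 | 1.40 | 0.78 |
| Strombidium inclinatum | MMETSP | MMETSP0208 | 83.6 | 32210 | 0.64 | 1.28 | 0.11 | 1.63 | 3.50 | 0.83 | 0.24 | 2.12 | 0.69 | 0.7 |
| Uronema spp. | MMETSP | MMETSP0018 | 52.6 | 13887 | 6.90 | 0.66 | 0.17 | 0.80 | 0.08 | 0.28 | 1.63 | 0.80 | 3.62 | 0.87 |
| NCBI genetic code: 1 | Group III ciliates | | | | * | * | * | Q | Q | C | C | Y | Y | W |
| Stentor coeruleus | UniProt | UP000187209 | 92.4 | 30969 | 0.16 | 0.08 | 0.01 | 2.77 | 0.89 | 0.63 | 1.03 | 0.86 | 3.1 | 0.93 |
| Climacostomum virens | MMETSP | MMETSP1397 | 94.7 | 33899 | 0.11 | 0.09 | 0.04 | 1.79 | 2.20 | 1.38 | 0.60 | 2.60 | 0.85 | 1.06 |
| Litonotus pictus | MMETSP | MMETSP0209 | 65.5 | 30222 | 0.08 | 0.03 | 0.01 | 2.12 | 1.52 | 0.63 | 0.77 | 1.83 | 2.25 | 0.54 |
| Protocruzia adherens | MMETSP | MMETSP0216 | 74.9 | 40577 | 0.07 | 0.04 | 0.04 | 2.91 | 1.24 | 0.69 | 0.94 | 1.30 | 1.83 | 1.00 |
| NCBI genetic code: 10 | Group IV ciliate | | | | * | * | → C | Q | Q | C | C | Y | Y | W |
| Euplotes focardii | MMETSP | MMETSP0205, MMETSP0206 | 60.8 | 36659 | 0.23 | 0.06 | 0.51 | 2.43 | 1.23 | 0.49 | 0.84 | 1.28 | 2.38 | 0.87 |
| NCBI genetic code: 4 | Group IV ciliate | | | | * | * | → W | Q | Q | C | C | Y | Y | W |
| Blepharisma japonicum | MMETSP | MMETSP1395 | 81.9 | 22714 | 0.13 | 0.03 | 0.30 | 2.85 | 1.24 | 0.94 | 0.80 | 0.94 | 2.72 | 0.84 |
| NCBI genetic code: 29 | Group IV ciliate | | | | → Y | → Y | * | Q | Q | C | C | Y | Y | W |
| Mesodinium pulex | MMETSP | MMETSP0467 | 88.9 | 61058 | 0.29 | 0.56 | 0.13 | 0.77 | 3.33 | 1.53 | 0.25 | 1.78 | 0.34 | 1.29 |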
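A codon usage frequency of the kind tabulated in Table 1 can be sketched as follows, assuming it means the percentage of all counted codons in a set of in-frame coding sequences (the legend does not spell out the denominator). The function name, the `Q_CODONS_CODE6` set, and the toy sequence are illustrative assumptions and do not come from the paper's pipeline.

```python
from collections import Counter

# Codons read as glutamine (Q) under NCBI genetic code 6, following the
# table's group rows: CAA and CAG are standard Q codons, while TAA and TAG
# are stop codons reassigned to Q. TGA remains the stop codon.
Q_CODONS_CODE6 = {"CAA", "CAG", "TAA", "TAG"}

def codon_usage_percent(cds_list):
    """Return {codon: percent of all counted codons} for in-frame CDS strings."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 100.0 * n / total for codon, n in counts.items()}

# Toy sequence invented for illustration: ATG TAA CAA TAG TGA.
toy_cds = ["ATGTAACAATAGTGA"]
usage = codon_usage_percent(toy_cds)
q_total = sum(pct for codon, pct in usage.items() if codon in Q_CODONS_CODE6)
print(usage)
print(round(q_total, 1))  # share of codons read as Q under code 6
```

Under the standard code (NCBI genetic code 1) the same counts apply, but TAA and TAG would be tallied as stop codons rather than as Q.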
Table 2
Usage frequencies of TAA*, TAG*, TAA^Q, TAG^Q, CAA^Q, and CAG^Q codons in the entire proteomes of 26 different organisms.

| Species | CAA | CAG | TAA | TAG |
|---|---|---|---|---|
| Saccharomyces cerevisiae S288c | 2.73 (62.6%Q) | 1.21 (37.4%Q) | 0.11 | 0.05 |
| Candida albicans | 3.57 (84.6%Q) | 0.65 (15.4%Q) | 0.1 | 0.05 |
| Candida auris | 1.81 (46.1%Q) | 2.12 (53.9%Q) | 0.08 | 0.06 |
| Candida tropicalis | 3.61 (84.5%Q) | 0.66 (15.5%Q) | 0.1 | 0.07 |
| Neurospora crassa | 1.70 (39.5%Q) | 2.60 (60.5%Q) | 0.06 | 0.05 |
| Magnaporthe oryzae | 1.37 (33.7%Q) | 2.69 (66.3%Q) | 0.06 | 0.07 |
| Trichoderma reesei | 1.17 (28.4%Q) | 2.95 (71.6%Q) | 0.06 | 0.06 |
| Cryptococcus neoformans | 2.06 (53.5%Q) | 1.79 (46.5%Q) | 0.07 | 0.06 |
| Ustilago maydis | 1.82 (41.3%Q) | 2.61 (58.7%Q) | 0.04 | 0.05 |
| Taiwanofungus camphoratus | 1.57 (41.8%Q) | 2.19 (58.2%Q) | 0.05 | 0.06 |
| Dictyostelium discoideum | 4.86 (96.2%Q) | 0.19 (3.8%Q) | 0.16 | 0.01 |
| Plasmodium falciparum | 2.42 (86.7%Q) | 0.37 (13.3%Q) | 0.09 | 0.01 |
| Drosophila melanogaster | 1.56 (13.4%Q) | 3.61 (86.6%Q) | 0.08 | 0.07 |
| Aedes aegypti | 1.76 (40.6%Q) | 2.58 (59.4%Q) | 0.11 | 0.07 |
| Caenorhabditis elegans | 2.74 (65.6%Q) | 1.44 (34.4%Q) | 0.16 | 0.06 |
| Danio rerio | 1.18 (26.0%Q) | 3.35 (74.0%Q) | 0.11 | 0.06 |
| Mus musculus | 1.20 (26.0%Q) | 3.41 (74.0%Q) | 0.1 | 0.08 |
| Homo sapiens | 1.23 (26.5%Q) | 3.42 (73.5%Q) | 0.1 | 0.08 |
| Arabidopsis thaliana | 1.94 (56.1%Q) | 1.52 (43.9%Q) | 0.09 | 0.05 |
| Chlamydomonas reinhardtii | 0.59 (12.7%Q) | 4.05 (87.3%Q) | 0.03 | 0.04 |
| Tetrahymena thermophila | 2.04 (21.2%Q) | 0.48 (5.0%Q) | 5.46 (56.8%Q) | 1.63 (17.0%Q) |
| Paramecium tetraurelia | 2.54 (27.9%Q) | 0.57 (6.3%Q) | 4.53 (46.7%Q) | 1.48 (16.2%Q) |
| Oxytricha trifallax | 2.68 (29.9%Q) | 1.07 (12.0%Q) | 3.63 (40.6%Q) | 1.57 (17.5%Q) |
| Stylonychia lemnae | 2.26 (21.1%Q) | 1.05 (12.6%Q) | 3.22 (38.6%Q) | 1.81 (21.7%Q) |
| Pseudocohnilembus persalinus | 1.76 (18.0%Q) | 0.37 (3.8%Q) | 7.36 (76.0%Q) | 1.39 (14.4%Q) |
| Stentor coeruleus | 2.77 (75.7%Q) | 0.89 (24.3%Q) | 0.16 | 0.08 |
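The parenthesized "%Q" values in Table 2 read as each codon's share of all Q-encoding codons in a proteome. The sketch below works through that arithmetic for the Tetrahymena thermophila row, where all four codons encode Q, and the result is consistent with the tabulated percentages. The table does not state the calculation, so this is an assumed reading, and the function name is hypothetical.

```python
def q_codon_shares(usage_by_codon):
    """Given usage frequencies (%) of the codons that encode Q in one
    proteome, return each codon's share of total Q codons, in percent,
    rounded to one decimal place."""
    total = sum(usage_by_codon.values())
    return {codon: round(100.0 * freq / total, 1)
            for codon, freq in usage_by_codon.items()}

# Usage frequencies copied from the Tetrahymena thermophila row of Table 2.
tetrahymena = {"CAA": 2.04, "CAG": 0.48, "TAA": 5.46, "TAG": 1.63}
print(q_codon_shares(tetrahymena))
```

For species using the standard genetic code, only CAA and CAG would be passed in, since TAA and TAG remain stop codons there.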
Author response table 1
Usage frequencies of TAA, TAG, TAA^Q, TAG^Q, CAA^Q and CAG^Q codons in the entire proteomes of 26 different organisms.

| Species name | CAA | CAG | TAA | TAG |
|---|---|---|---|---|
| Saccharomyces cerevisiae S288c | 2.73 (62.6%Q) | 1.21 (37.4%Q) | 0.11 | 0.05 |
| Candida albicans | 3.57 (84.6%Q) | 0.65 (15.4%Q) | 0.1 | 0.05 |
| Candida auris | 1.81 (46.1%Q) | 2.12 (53.9%Q) | 0.08 | 0.06 |
| Candida tropicalis | 3.61 (84.5%Q) | 0.66 (15.5%Q) | 0.1 | 0.07 |
| Neurospora crassa | 1.70 (39.5%Q) | 2.60 (60.5%Q) | 0.06 | 0.05 |
| Magnaporthe oryzae | 1.37 (33.7%Q) | 2.69 (66.3%Q) | 0.06 | 0.07 |
| Trichoderma reesei | 1.17 (28.4%Q) | 2.95 (71.6%Q) | 0.06 | 0.06 |
| Cryptococcus neoformans | 2.06 (53.5%Q) | 1.79 (46.5%Q) | 0.07 | 0.06 |
| Ustilago maydis | 1.82 (41.3%Q) | 2.61 (58.7%Q) | 0.04 | 0.05 |
| Taiwanofungus camphoratus | 1.57 (41.8%Q) | 2.19 (58.2%Q) | 0.05 | 0.06 |
| Dictyostelium discoideum | 4.86 (96.2%Q) | 0.19 (3.8%Q) | 0.16 | 0.01 |
| Plasmodium falciparum | 2.42 (86.7%Q) | 0.37 (13.3%Q) | 0.09 | 0.01 |
| Drosophila melanogaster | 1.56 (13.4%Q) | 3.61 (86.6%Q) | 0.08 | 0.07 |
| Aedes aegypti | 1.76 (40.6%Q) | 2.58 (59.4%Q) | 0.11 | 0.07 |
| Caenorhabditis elegans | 2.74 (65.6%Q) | 1.44 (34.4%Q) | 0.16 | 0.06 |
| Danio rerio | 1.18 (26.0%Q) | 3.35 (74.0%Q) | 0.11 | 0.06 |
| Mus musculus | 1.20 (26.0%Q) | 3.41 (74.0%Q) | 0.1 | 0.08 |
| Homo sapiens | 1.23 (26.5%Q) | 3.42 (73.5%Q) | 0.1 | 0.08 |
| Arabidopsis thaliana | 1.94 (56.1%Q) | 1.52 (43.9%Q) | 0.09 | 0.05 |
| Chlamydomonas reinhardtii | 0.59 (12.7%Q) | 4.05 (87.3%Q) | 0.03 | 0.04 |
| Tetrahymena thermophila | 2.04 (21.2%Q) | 0.48 (5.0%Q) | 5.46 (56.8%Q) | 1.63 (17.0%Q) |
| Paramecium tetraurelia | 2.54 (27.9%Q) | 0.57 (6.3%Q) | 4.53 (46.7%Q) | 1.48 (16.2%Q) |
| Oxytricha trifallax | 2.68 (29.9%Q) | 1.07 (12.0%Q) | 3.63 (40.6%Q) | 1.57 (17.5%Q) |
| Stylonychia lemnae | 2.26 (21.1%Q) | 1.05 (12.6%Q) | 3.22 (38.6%Q) | 1.81 (21.7%Q) |
| Pseudocohnilembus persalinus | 1.76 (18.0%Q) | 0.37 (3.8%Q) | 7.36 (76.0%Q) | 1.39 (14.4%Q) |
| Stentor coeruleus | 2.77 (75.7%Q) | 0.89 (24.3%Q) | 0.16 | 0.08 |
Author response table 2
S. cerevisiae strains used in this study.
| Name | Genotype |
|---|---|
| WHY13008 | MATa, ho, leu2, ura3, his4-X::LEU2-(NgoMIV;+ori)-URA3, ERGI(SpeI), RAD51::hphMX4 |
| WHY13283 | MATa, ho, leu2, ura3, his4-X::LEU2-(NgoMIV;+ori)-URA3, ERGI(SpeI), rad51Δ::hphMX4 |
| WHY13416 | MATa, ho, leu2, ura3, his4-X::LEU2-(NgoMIV;+ori)-URA3, ERGI(SpeI), rad51ΔN::hphMX4 |
| WHY13744 | |
| WHY13743 | |
| WHY13741 | MATa, ho::LYS2, leu2, ura3, lys2, HIS4::LEU2-(BamHI;+ori), ERGI(SalI), SUP35(PND)-rad51ΔN::hphMX4 |
| WHY10271 | MATa, ho::hisG, lys2, leu2::hisG, arg4-nsp, ura3 |
| WHY13970 | MATa his3Δ1, leu2Δ0, met15Δ0, ura3Δ0 |
| WHY13785 | MATa his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, hsp104::kanMX4 |
| WHY14126 | MATa his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, new1::kanMX4 |
| WHY14129 | MATa his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, doa4::kanMX4 |
| WHY14227 | |
| WHY13989 | MATa his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, san1::kanMX4 |
| WHY14132 | MATa his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, oaz1::kanMX4 |

Chi-Ning Chuang, Hou-Cheng Liu, Tai-Ting Woo, Ju-Lan Chao, Chiung-Ya Chen, Hisao-Tang Hu, Yi-Ping Hsueh, Ting-Fang Wang (2024)
Noncanonical usage of stop codons in ciliates expands proteins with structurally flexible Q-rich motifs
eLife 12:RP91405.
https://doi.org/10.7554/eLife.91405.3