
Codon usage biases co-evolve with transcription termination machinery to suppress premature cleavage and polyadenylation

  1. Zhipeng Zhou
  2. Yunkun Dang  Is a corresponding author
  3. Mian Zhou
  4. Haiyan Yuan
  5. Yi Liu  Is a corresponding author
  1. The University of Texas Southwestern Medical Center, United States
  2. Yunnan University, China
  3. East China University of Science and Technology, China
Research Article
Cite this article as: eLife 2018;7:e33569 doi: 10.7554/eLife.33569

Abstract

Codon usage biases are found in all genomes and influence protein expression levels. The codon usage effect on protein expression was thought to be mainly due to its impact on translation. Here, we show that transcription termination is an important driving force for codon usage bias in eukaryotes. Using Neurospora crassa as a model organism, we demonstrated that introduction of rare codons results in premature transcription termination (PTT) within open reading frames and abolishment of full-length mRNA. PTT is a wide-spread phenomenon in Neurospora, and there is a strong negative correlation between codon usage bias and PTT events. Rare codons lead to the formation of putative poly(A) signals and PTT. A similar role for codon usage bias was also observed in mouse cells. Together, these results suggest that codon usage biases co-evolve with the transcription termination machinery to suppress premature termination of transcription and thus allow for optimal gene expression.

https://doi.org/10.7554/eLife.33569.001

Introduction

Due to the redundancy of triplet genetic codons, most amino acids are encoded by two to six synonymous codons. Synonymous codons are not used with equal frequencies, a phenomenon called codon usage bias (Ikemura, 1985; Sharp et al., 1986; Comeron, 2004; Plotkin and Kudla, 2011; Hershberg and Petrov, 2008). Highly expressed proteins are mostly encoded by genes with preferred codons, and codon optimization has been routinely used to enhance heterologous protein expression. In addition, positive correlations between codon usage and protein expression levels are observed in different organisms (Hiraoka et al., 2009; Duret and Mouchiroud, 1999). These results suggest that codon usage plays an important role in regulating gene expression levels. Efficient and accurate translation was thought to be a major selection force for codon usage biases (Hiraoka et al., 2009; Duret and Mouchiroud, 1999; Akashi, 1994; Drummond and Wilke, 2008; Xu et al., 2013; Zhou et al., 2013a; Lampson et al., 2013; Pershing et al., 2015). Recent studies also demonstrated that codon usage affects co-translational protein folding by regulating translation elongation rate in both prokaryotes and eukaryotes (Zhou et al., 2013a; Spencer et al., 2012; Pechmann et al., 2014; Yu et al., 2015; Zhou et al., 2015; Fu et al., 2016; Zhao et al., 2017).

Although the correlation between codon usage and gene expression level can be explained by translation efficiency, recent studies suggest that overall translation efficiency is mainly determined by translation initiation, a process that is mostly determined by RNA structure but not codon usage near the translational start site (Kudla et al., 2009; Pop et al., 2014; Tuller et al., 2010). In addition, codon usage was found to be an important determinant of RNA levels in many organisms (Presnyak et al., 2015; Boël et al., 2016; Zhou et al., 2016; Kudla et al., 2006; Krinner et al., 2014). In some organisms, codon usage was shown to affect RNA stability (Presnyak et al., 2015; Boël et al., 2016; Zhou et al., 2016; Kudla et al., 2006; Krinner et al., 2014; Mishima and Tomari, 2016; Bazzini et al., 2016). In Neurospora and mammalian cells, codon usage has also been shown to be an important determinant of gene transcription levels (Zhou et al., 2016; Newman et al., 2016). Therefore, codon usage can regulate gene expression beyond the translation process.

Transcription termination is a critical process in regulating gene expression. In eukaryotes, the maturation of mRNA is a two-step process involving endonucleolytic cleavage of the nascent RNA followed by the synthesis of the polyadenosine (poly(A)) tail (Tian and Graber, 2012; Shi and Manley, 2015; Proudfoot, 2011; Proudfoot, 2016; Tian and Manley, 2017; Kuehner et al., 2011; Porrua and Libri, 2015). The polyadenylation sites, also known as poly(A) sites or pA sites, are defined by surrounding cis-elements recognized by multiple proteins. These cis-elements, called poly(A) signals, are generally AU-rich sequences and have conserved nucleotide composition in eukaryotes. In mammals, the hexamer AAUAAA (or a close variant), referred to as the polyadenylation signal (PAS), is one of the most prominent poly(A) signals. Other cis-elements, such as upstream U-rich elements, downstream U-rich elements, and downstream GU-rich elements, also play important roles in the transcription termination process. In yeast, poly(A) signals include an upstream efficiency element (EE), an upstream position element (PE), which is equivalent to the PAS in mammals, and two U-rich elements flanking the poly(A) sites (Moqtaderi et al., 2013; Mata, 2013; Ozsolak et al., 2010; Schlackow et al., 2013; Liu et al., 2017a). Mutation of these poly(A) signals impairs the efficiency of transcription termination and leads to defects in mRNA processing (Tian and Graber, 2012; Shi and Manley, 2015; Proudfoot, 2011; Proudfoot, 2016; Tian and Manley, 2017; Kuehner et al., 2011; Porrua and Libri, 2015).

Although most transcriptional events terminate in the 3’ untranslated region (3’ UTR) of protein-coding genes, premature transcription termination (PTT), also referred to as premature cleavage and polyadenylation (PCPA), can occur in 5’ UTRs, introns, and exons (Tian et al., 2007; Kaida et al., 2010; Berg et al., 2012; Jan et al., 2011; Liu et al., 2017b; Ulitsky et al., 2012; Yang et al., 2016). For example, premature transcription termination occurs in an intron of the Arabidopsis FCA gene and is involved in the control of flowering timing (Quesada et al., 2003; Macknight et al., 2002). Moreover, intronic PCPA is a conserved regulatory mechanism for the CstF-77 gene from fly to human (Mitchelson et al., 1993; Pan et al., 2006; Luo et al., 2013). Recent genome-wide studies have shown that PCPA within introns is a widespread phenomenon in eukaryotes (Tian et al., 2007; Liu et al., 2017b; van Hoof et al., 2002; Frischmeyer et al., 2002; Mayr and Bartel, 2009). In addition, PCPA also occurs within coding regions (Tian et al., 2007; Kaida et al., 2010; Jan et al., 2011; Dunlap and Loros, 2017). It has been shown that the expression of heterologous genes can be suppressed due to PCPA in coding regions (Diehn et al., 1998; Tokuoka et al., 2008). PCPA also occurs in coding regions of endogenous genes in both yeast and human (van Hoof et al., 2002; Frischmeyer et al., 2002; Georis et al., 2015). Importantly, many poly(A) sites have been mapped to coding regions using poly(A) sequencing methods (Jan et al., 2011; Liu et al., 2017b; Ulitsky et al., 2012; Yang et al., 2016). Codon optimization has been previously shown to increase heterologous gene expression in Aspergillus oryzae (Tokuoka et al., 2008). However, the effect of codon usage on premature transcription termination of endogenous genes is not clear.

The filamentous fungus Neurospora crassa exhibits a strong codon usage bias for C or G at wobble positions and has been an important model organism for studying the roles of codon usage biases (Zhou et al., 2013a; Yu et al., 2015; Zhou et al., 2015; Radford and Parish, 1997). In Neurospora, codon usage is a major determinant of gene expression levels and correlates strongly with protein and RNA levels (Zhou et al., 2016). We showed previously that codon usage can regulate mRNA levels at the level of transcription by influencing chromatin structure (Zhou et al., 2016). In this study, we showed that premature transcription termination within open reading frames is affected by codon usage bias. Moreover, a similar phenomenon is observed in mouse, another C/G-biased organism. Therefore, in addition to effects on translation, transcription termination serves as a conserved driving force in shaping codon usage biases in C/G-biased organisms.

Results

Codon deoptimization of the amino-terminal end of the frq open reading frame abolishes the production of full-length mRNA

We previously showed that codon optimization of circadian clock gene frequency (frq) leads to changes in FRQ expression level and protein structure (Zhou et al., 2013a; Zhou et al., 2015). To determine the impact of non-optimal codons on FRQ expression, we codon deoptimized the amino-terminal end of frq ORF (amino acids 12–163) by replacing the wild-type codons with non-optimal synonymous codons (Figure 1A). In the frq-deopt1 construct, 59 codons were replaced by non-optimal codons. In the frq-deopt2 construct, 98 codons were replaced by the least preferred codons (Figure 1—figure supplement 1). These two constructs and the wild-type frq (wt-frq) construct were transformed individually into an frq knock-out strain (frqKO) at the his-3 locus by homologous recombination (Aronson et al., 1994a). In the strains expressing the wild-type frq construct, the production of conidia (asexual spore) was rhythmic with a period of about 22 hr (Figure 1B). However, the conidiation rhythm of the strains expressing the two codon-deoptimized frq constructs was abolished, indicating that the deoptimized frq genes are not functional (Figure 1B). Surprisingly, no FRQ expression was detected in either of the deoptimized strains by western blot (Figure 1C). Northern blot and strand-specific RT-qPCR using a set of primers targeting the middle region of frq ORF showed that no full-length frq mRNA was produced in the deoptimized strains (Figure 1D and E). Together, these results indicate that the introduction of rare synonymous codons within this region of frq abolishes the expression of full-length frq mRNA.
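The deoptimization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `deoptimize` and the two-entry substitution table are hypothetical, only the GAC to GAU and UCC to AGU substitutions are taken from this article, and the toy ORF is not the frq sequence. The actual constructs are shown in Figure 1—figure supplement 1.

```python
# Hypothetical, deliberately tiny substitution table:
# wild-type codon -> less-preferred synonymous codon.
# Only these two substitutions are named in the article; a real design
# would need a full codon usage table for the organism.
DEOPT_MAP = {
    "GAC": "GAU",  # Asp
    "UCC": "AGU",  # Ser
}


def deoptimize(orf, start_codon, end_codon, table=DEOPT_MAP):
    """Replace codons in the 1-based, inclusive codon range
    [start_codon, end_codon] with synonymous codons from `table`.

    Returns the new RNA sequence and the number of codons changed.
    Codons without an entry in `table` are left untouched.
    """
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    changed = 0
    for idx in range(start_codon - 1, min(end_codon, len(codons))):
        replacement = table.get(codons[idx])
        if replacement is not None:
            codons[idx] = replacement
            changed += 1
    return "".join(codons), changed


# Toy ORF (not frq): AUG GAC UCC GAC UAA; deoptimize codons 2-3 only.
print(deoptimize("AUGGACUCCGACUAA", 2, 3))  # ('AUGGAUAGUGACUAA', 2)
```

Because every replacement is synonymous, the encoded protein is unchanged; only the nucleotide sequence, and therefore any sequence motif it happens to contain, differs.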

Figure 1 (with 1 supplement)
Codon deoptimization of the amino-terminal end of the frq ORF abolished the expression of full-length frq mRNA.

(A) A diagram showing the frq locus. (B) Race tube analysis showing the conidiation rhythm of the frqKO, wt-frq, frq-deopt1, and frq-deopt2 strains. The strains were first cultured in constant light (LL) for 1 day before being transferred to constant darkness (DD). Black lines mark the growth fronts every 24 hr. The distance between asexual spore bands was measured and then divided by the growth rate to calculate the period length of the conidiation rhythm. For the wt-frq strain, the period of the conidiation rhythm was 22.07 ± 0.04 hr. (C) Western blot showing FRQ protein levels in the frqKO, wt-frq, frq-deopt1, and frq-deopt2 strains. (D) Northern blot showing the expression of full-length frq mRNA in the indicated strains. An RNA probe specific for the 3’ end of frq was used in this experiment. (E) Strand-specific RT-qPCR results showing frq mRNA levels in the indicated strains. Primers used for the qPCR were targeted to the middle of the frq ORF.

https://doi.org/10.7554/eLife.33569.002

Codon deoptimization of frq results in premature cleavage and polyadenylation

We have previously shown that rare codons can result in gene silencing through histone H3 trimethylation at lysine 9 (H3K9me3), and the wild-type frq locus is marked by H3K9me3 (Zhou et al., 2016; Dang et al., 2013; Belden et al., 2011). Thus, we examined whether the loss of frq expression in the codon deoptimized strains was due to an increase of H3K9me3 at the frq locus. Chromatin immunoprecipitation (ChIP) assay using an H3K9me3 antibody, however, showed that the H3K9me3 levels at the frq locus were comparable in the wild-type frq and frq-deopt2 strains (Figure 2—figure supplement 1A and B), suggesting that the loss of full-length frq mRNA in the deoptimized frq is not due to H3K9me3-mediated transcriptional silencing. Transcription of frq is activated by the binding of the complex of White Collar-1 (WC-1) and White Collar-2 (WC-2) to the frq promoter, and the expression of FRQ inhibits WC binding (Heintzen and Liu, 2007; Dunlap, 2006). A ChIP assay showed that WC-2 binding at the frq promoter was elevated in the frq-deopt2 strain (Figure 2—figure supplement 1C), suggesting that the loss of full-length frq mRNA expression is not due to transcriptional gene silencing. Consistent with this result, strand-specific RT-qPCR using a set of primers targeted to an intronic region in the 5’ UTR of frq showed that the frq pre-mRNA was increased significantly in the frq-deopt2 strain (Figure 2—figure supplement 1D). These results indicate that even though full-length frq mRNA could not be detected in the codon deoptimized strains, the transcription of frq was actually significantly increased.

Since frq RNA can be detected by strand-specific RT-qPCR using primers targeted to the 5’ UTR but not to a region of the frq ORF that is downstream of the codon deoptimized region, we hypothesized that codon deoptimization resulted only in truncated frq mRNA. To test this hypothesis, we performed northern blot analysis using a probe targeted to the 5’ end of frq mRNA (Figure 2—figure supplement 1E). As expected, truncated frq mRNAs but not full-length frq mRNA were detected in both of the deoptimized frq strains (Figure 2A). To characterize the nature of these truncated frq mRNAs, we first examined whether these RNAs were polyadenylated by purifying poly(A) RNAs using oligo-dT beads. Like the full-length frq mRNA, the truncated frq mRNA species were also enriched after oligo-dT purification (Figure 2B). To further confirm these results, we performed poly(A) tail-based 3’ RACE (Scotto-Lavino et al., 2006) and mapped the 3’ ends of the truncated mRNA species in the frq-deopt1 and frq-deopt2 strains. We observed a cluster of 3’ ends within the deoptimized region (112–141 nt downstream of the start codon) of the frq genes (Figure 2C). Interestingly, PAS variants (AUAAAU in frq-deopt1 and AUAAAA in frq-deopt2), located 18 nt upstream of the mapped poly(A) sites, were created by codon deoptimization of frq, suggesting that these truncated mRNAs may be produced by a PAS-dependent pathway (Figure 2C).

Figure 2 (with 1 supplement)
Codon deoptimization of frq results in premature transcription termination.

(A) Northern blot showing the presence of truncated frq mRNA species in both deoptimized strains using an RNA probe targeted to the 5’ end of frq mRNA (indicated in Figure 2—figure supplement 1E). * indicates a non-specific band. (B) Northern blot showing that both full-length and truncated frq mRNA are enriched in poly(A)-containing RNAs. Poly(A) RNAs were purified from total RNAs by using oligo-dT beads. Equal amounts of total RNA or poly(A) RNA were loaded in each lane. A probe specific for the 5’ end of frq was used. (C) Poly(A) sites mapped by 3’ RACE. Arrows indicate the mapped poly(A) sites, the red arrows indicate the major poly(A) site that was found in both frq-deopt1 and frq-deopt2 strains, and the black line indicates the potential PAS motif (AUAAAU in frq-deopt1 and AAUAAA in frq-deopt2). Nucleotides that are mutated are shown in red. (D) ChIP assay showing RNA pol II levels at the frq transgene loci in the wt-frq-aq and frq-deopt2-aq strains. The ChIP results were normalized to input DNA and are represented as Input%. The promoter of qrf was replaced by a qa-2 promoter, and tissues were cultured in the absence of quinic acid to block qrf transcription. The triangle on the top indicates the location of the mapped poly(A) sites. The previously known heterochromatin region ψ63 in Neurospora was used as the negative control. Error bars shown are standard deviations (n = 3). *p<0.05. (E) Northern blot analysis showing premature transcription termination of qrf. f-frq is an frq codon-optimized strain (Zhou et al., 2013a).

https://doi.org/10.7554/eLife.33569.004

There are two possibilities for how these truncated polyadenylated frq mRNAs can be produced: PAS-dependent premature transcription termination or partial degradation of full-length frq mRNAs followed by polyadenylation (van Hoof et al., 2002; Frischmeyer et al., 2002; West et al., 2006; LaCava et al., 2005). In the case of premature transcription termination, RNA polymerase II (pol II) terminates after synthesis of the 5’ region of the pre-mRNA, which is then released from the chromatin (Proudfoot, 2016). It should be noted that frq locus is not only transcribed from sense direction to produce frq mRNA, it is also transcribed from antisense direction to generate the long non-coding RNA qrf (Kramer et al., 2003; Xue et al., 2014) (Figure 2—figure supplement 1E), which can complicate the interpretation of the ChIP results. To overcome this complication, we created two additional frq constructs, wt-frq-aq, and frq-deopt2-aq, in which the promoter of qrf was replaced by the quinic acid (QA) inducible qa-2 promoter. In frq null strains transformed with these constructs, expression of the full-length and truncated frq was not dependent on QA, but qrf was only expressed in the presence of QA (Figure 2—figure supplement 1F). Therefore, we cultured wt-frq-aq and frq-deopt2-aq strains in the absence of QA and performed a ChIP assay using an antibody against pol II phosphorylated at serine 2. The pol II levels at the frq promoter and 5’ UTR were comparable in the wt-frq-aq and frq-deopt2-aq strains, but pol II levels in the middle and 3’ end of frq ORF were decreased dramatically in the frq-deopt2-aq strain compared to the wt-frq-aq strain (Figure 2D). Together, these results demonstrate that codon deoptimization of frq abolished its expression due to premature transcription termination.

Codon deoptimization of frq also resulted in the premature transcription termination of qrf, as indicated by the loss of full-length qrf and the appearance of truncated qrf RNA in the frq-deopt1 and frq-deopt2 strains (Figure 2E and Figure 2—figure supplement 1F). 3’ RACE results showed that the 3’ ends of the truncated qrf RNAs in the frq-deopt1 strains were also located in the deoptimized region, with a potential PAS (AUAAAA) motif 21 nt upstream of the 3’ ends (Figure 2—figure supplement 1G). It should be noted that the wt-frq gene also has the same putative PAS motif, suggesting that the nucleotide sequence near the PAS motif is also required for transcription termination.

PAS motif and other cis-elements created by codon deoptimization are important for premature transcription termination of frq

The results above suggest that codon deoptimization of frq may create potential poly(A) signals that can result in premature transcription termination of frq. To identify the codon or codons that are critical for premature transcription termination, we created additional codon deoptimized frq genes (frq-deopt3, 4, and 5) by deoptimizing different regions of the frq ORF around the 3’ ends identified in the frq-deopt2 strains (Figure 3A). Neither full-length frq mRNA nor FRQ protein was detected in the frq-deopt3 strain (Figure 3B and C), suggesting that the deoptimized region in frq-deopt3 contains all elements sufficient to trigger transcription termination. The low level of the prematurely terminated products in the frq-deopt3 strain suggests that these products may be rapidly degraded by RNA quality control mechanisms (van Hoof et al., 2002; Frischmeyer et al., 2002; Doma and Parker, 2007; Vanacova and Stefl, 2007; Schmid and Jensen, 2010). In the frq-deopt4 strain, both full-length frq RNA and FRQ protein were detected, but their levels were significantly lower than those in the wt-frq strain (Figure 3B and C). ChIP results showed that pol II levels at the frq transcription start site were comparable in the wt-frq and frq-deopt4 strains (Figure 3—figure supplement 1B), suggesting that the decrease of full-length frq mRNA in the frq-deopt4 strain was not due to transcriptional silencing. Notably, the level of prematurely terminated frq RNAs in the frq-deopt4 strain was also lower than that in the frq-deopt2 strain, suggesting that transcription termination efficiency was decreased due to the lack of some elements. The levels of frq mRNA and FRQ protein in the frq-deopt5 strain were higher than those in the frq-deopt4 strain but were much lower than those in the wt-frq strain (Figure 3B and C). Premature termination products in the frq-deopt5 strain were further decreased compared to those in the frq-deopt4 strain. Although the frq-deopt4 and frq-deopt5 strains share the same PAS motif, the production of premature termination products in these strains was markedly reduced, suggesting that other cis-elements surrounding the PAS motif are also important for PCPA efficiency.

Figure 3 (with 1 supplement)
Rare codons promote while optimal codons suppress premature transcription termination of frq.

(A) A diagram showing the constructs created to map codons important for premature transcription termination. The triangle indicates the location of the mapped poly(A) sites. Black bars indicate the regions where wild-type codons are used, whereas white bars indicate regions that are codon de-optimized. (B) Left panel, northern blot analysis showing the expression of full-length and premature terminated frq mRNA species in the indicated strains. The asterisks indicate non-specific bands. A probe for frq 5’ end was used. Right panel, densitometric analyses of results from three independent experiments. Error bars shown are standard deviations (n = 3). ***p<0.001. (C) Left panel, western blot result showing FRQ protein levels in the indicated strains. The asterisk indicates a non-specific band. Right panel, densitometric analyses of results from three independent experiments. (D) Left top panel, western analyses showing FRQ protein levels in the wt-frq, frq-deopt6, and frq-deopt7 strains. Left bottom panel, northern blot showing full-length frq mRNA levels in the indicated strains. Middle panel, densitometric analysis of FRQ levels from three independent experiments. Right, densitometric analyses of full-length frq mRNA levels from three independent experiments. Error bars shown are standard deviations (n = 3). **p<0.01, ***p<0.001. (E) Left top panel, western analyses showing FRQ protein levels in the frq-deopt4 and frq-deopt4* strains. Left bottom panel, northern blot showing full-length frq mRNA levels in the indicated strains. An RNA probe specific for 5’ end of frq was used. Middle, densitometric analyses of FRQ levels from three independent experiments. Right, densitometric analyses of full-length frq mRNA levels from three independent experiments. Error bars shown are standard deviations (n = 3). ***p<0.001.

https://doi.org/10.7554/eLife.33569.006

Sequence analysis revealed that a GAC to GAU mutation created a potential PAS motif in both frq-deopt1 and frq-deopt2 constructs (Figure 2C). Another UCC to AGU mutation 4-nt upstream of this PAS motif was also present in both frq-deopt1 and frq-deopt2 constructs. Thus, we hypothesized that deoptimization of these codons triggers premature termination of frq transcription. To test this hypothesis, we made two more constructs in which only one codon in the wild-type frq gene was deoptimized: frq-deopt6, in which the GAC codon in the wt-frq was replaced by GAU and therefore created PAS motif AUAAAA, and frq-deopt7, in which the UCC codon was changed to AGU (Figure 3A). Even though the effect of these mutations on frq mRNA levels was not as dramatic as for the other constructs, these constructs with only a single synonymous codon mutation resulted in significant reduction of frq mRNA and FRQ protein levels and the loss of circadian conidiation rhythms (Figure 3D and Figure 3—figure supplement 1A and C). To further confirm the importance of PAS motif, we created the frq-deopt4* construct, in which the PAS motif in the frq-deopt4 was mutated by replacing these two non-preferred codons with the synonymous optimal codons. We found that both frq RNA and FRQ levels increased significantly in the frq-deopt4* strain compared to that in the frq-deopt4 strain, and the level of prematurely terminated frq mRNA was also reduced (Figure 3E and Figure 3—figure supplement 1D). These results indicate that the PAS motif created by codon deoptimization is important for PCPA in frq and additional motifs surrounding the PAS are also involved in promoting PCPA. Together these results suggest that codon usage, by affecting the formation of potential PAS and other cis-elements, plays an important role in suppressing premature transcription termination to allow production of full-length mRNAs in Neurospora.

Genome-wide identification of premature transcription termination events in the open reading frames of Neurospora genes

Our results suggest that the use of rare codons in Neurospora genes may play a role in promoting premature transcription termination. Thus, we asked whether PCPA also occurs in endogenous genes by performing genome-wide identification of poly(A) sites. Because premature transcription termination products are usually rapidly degraded in the cytosol (van Hoof et al., 2002; Frischmeyer et al., 2002), we extracted polyadenylated RNAs from nuclei and mapped the 3’ ends using a modified poly(A)-tail-primed sequencing method (2P-seq) (Spies et al., 2013). To limit false positive reads, we used a low concentration of reverse transcription (RT) primer during sequencing library preparation (Scotto-Lavino et al., 2006) and a stringent filtering procedure for data analysis (see Materials and methods). It should be noted that the poly(A) sites identified by this method may not be limited to those generated by the PAS-dependent pathway (Porrua and Libri, 2015; West et al., 2006; LaCava et al., 2005). Two biological replicates of 2P-seq results were highly consistent (Figure 4—figure supplement 1A), indicating the reliability of the method. Approximately 20 million reads were generated from two independent samples. The vast majority of these reads were mapped to 3’ UTRs of genes (i.e. annotated 3’ UTR plus 1000 nt downstream), and the rest were mapped to ORFs, introns, 5’ UTRs, and intergenic regions. We focused our analysis on the reads mapped to the 3’ UTRs and ORFs, with the deduced poly(A) sites referred to as 3’ UTR-pA and ORF-pA, respectively. The putative PAS regions (comprising the A-rich region at −30 to −10 nt and the U-rich region at −10 to +10 nt) are referred to as 3’ UTR-PAS and ORF-PAS, respectively. Of 9795 annotated genes, 7755 have 3’ UTR-PAS signals (RPM >1), which is comparable to our previous RNA-seq results (Zhou et al., 2016), indicating the sensitivity of our 2P-seq method. Among these genes, 4557 were identified to have at least one ORF-pA signal. The mapped 2P-seq reads for two Neurospora genes (NCU09435 and NCU00931), which had considerable reads in their ORFs, are shown in Figure 4A and Figure 4—figure supplement 1B. To confirm these results, northern blot analyses were performed. As expected, in addition to the full-length mRNAs, small amounts of prematurely terminated mRNA species with the sizes expected from the location of the 2P-seq reads were also detected (Figure 4B). These results suggest that premature transcription termination within ORFs is a common phenomenon in Neurospora.
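The gene-level bookkeeping behind these counts can be sketched as follows, assuming that reads have already been assigned to the 3’ UTR or ORF of each gene. The function names, gene labels, and read counts are hypothetical, and the stringent read filtering described in Materials and methods is not reproduced here.

```python
def reads_per_million(counts, total_reads):
    """Convert raw per-gene read counts to reads per million mapped reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {gene: n * 1_000_000 / total_reads for gene, n in counts.items()}


def classify_genes(utr_counts, orf_counts, total_reads, min_rpm=1.0):
    """Return (genes with a 3' UTR signal above min_rpm,
    the subset of those genes with at least one ORF read)."""
    utr_rpm = reads_per_million(utr_counts, total_reads)
    expressed = {gene for gene, rpm in utr_rpm.items() if rpm > min_rpm}
    with_orf_pa = {gene for gene in expressed if orf_counts.get(gene, 0) >= 1}
    return expressed, with_orf_pa


# Hypothetical counts for three genes in a library of 2 million reads.
utr = {"geneA": 10, "geneB": 1, "geneC": 40}
orf = {"geneB": 5, "geneC": 3}
expressed, with_orf_pa = classify_genes(utr, orf, total_reads=2_000_000)
print(sorted(expressed), sorted(with_orf_pa))  # ['geneA', 'geneC'] ['geneC']
```

In this toy example geneB is excluded despite having ORF reads because its 3’ UTR signal (0.5 RPM) falls below the threshold, mirroring the order of the two selection steps described above.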

Figure 4 (with 1 supplement)
Genome-wide identification of premature transcription termination events in ORF of endogenous Neurospora genes.

(A) 2P-seq results on NCU09435 (top) and NCU00931 (bottom) genes showing the transcription termination events in the 3’ UTR and ORF. (B) Northern blot analyses showing the presence of both full-length and prematurely terminated NCU09435 mRNA (left) and NCU00931 mRNA (right) in the wild-type strain, respectively. An RNA probe specific for 5’ end of NCU09435 or NCU00931 was used, respectively. * indicates a non-specific band. (C) Genome-wide nucleotide composition surrounding mRNA 3’ ends in the 3’UTRs. 0 indicates the position of the mapped 3’ end of mRNA. The triangle indicates the downstream U-rich element. (D) Top 15 most frequently used PAS motifs found in the A-rich element of 3’ UTR-PAS. (E) Genome-wide nucleotide sequence composition surrounding ORF-pA sites. (F) Top 15 most frequently used PAS motifs found in the A-rich element of ORF-PAS. (G) Box-plots of PAS scores for 3’UTR-PAS and ORF-PAS determined in Neurospora.

https://doi.org/10.7554/eLife.33569.008

To compare the transcription termination events in 3’ UTRs and ORFs, we analyzed the nucleotide composition around the identified poly(A) sites. Similar to previous results from analysis of yeast transcription termination regions (Tian and Graber, 2012), the sequence profile surrounding 3’ UTR-pA sites in Neurospora has an upstream A-rich region located at −30 to −10 nucleotides from the poly(A) site and two U-rich regions at −10 to 0 nucleotides and at 0 to +10 nucleotides (Figure 4C). However, the U-rich region immediately downstream of the poly(A) site is much less prominent than that in yeast (see below). The poly(A) site is usually located between C and A nucleotides. A search for the most enriched hexamers within the A-rich region was performed to identify putative PAS motifs (Ulitsky et al., 2012). In mammals, AAUAAA and AUUAAA are the two most frequently used PAS motifs and are found in ~80% of all 3’ UTR-PAS (Tian and Graber, 2012; Proudfoot, 2011; Manley, 2015). The PAS motifs in Neurospora are much more degenerate, with AAUGAA being the most abundant and AAUAAA the third most enriched motif (Figure 4D).
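A hexamer search of this kind can be sketched in Python as below. It only counts how many upstream windows contain each hexamer; a real enrichment analysis would also compare against a background expectation, which is not specified in this section. The function names and the toy sequence are assumptions made for illustration.

```python
from collections import Counter


def upstream_window(seq, site, start=-30, end=-10):
    """Return the slice covering positions site+start up to (not including)
    site+end, clipped at the start of the sequence."""
    lo = max(site + start, 0)
    hi = max(site + end, 0)
    return seq[lo:hi]


def count_hexamers(windows, k=6):
    """Count, for each k-mer, the number of windows that contain it.
    Each k-mer is counted once per window so that long homopolymer runs
    in a single window do not dominate the totals."""
    counts = Counter()
    for window in windows:
        kmers = {window[i:i + k] for i in range(len(window) - k + 1)}
        counts.update(kmers)
    return counts


# Toy 40-nt sequence with AAUGAA placed inside the -30 to -10 window
# of a hypothetical poly(A) site at position 40.
toy = "C" * 12 + "AAUGAA" + "C" * 22
window = upstream_window(toy, site=40)
counts = count_hexamers([window, window])
print(len(window), counts["AAUGAA"])  # 20 2
```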

Although the nucleotide profile surrounding ORF-pA sites was similar to that of 3’ UTR-pA sites, with A-rich and U-rich elements upstream of the C/A poly(A) site, there does not appear to be a U-rich region downstream (Figure 4E). In addition, the hexamer motifs in the A-rich region of ORF-PASs were quite degenerate (Figure 4F). Among the top 15 most enriched hexamer motifs, only five were shared between ORF-PAS and 3’ UTR-PAS regions (Figure 4D and F). To further compare 3’ UTR-PAS and ORF-PAS, we generated consensus PAS sequences to build position-specific scoring matrices (PSSMs) for PAS regions by using sequences from −30 to +10 nt, as previously described (Tian et al., 2007). The PSSMs were then used to score all 3’ UTR-PASs and ORF-PASs. A high PAS score indicates a high similarity to the consensus and, presumably, a stronger signal for transcription termination. As shown in Figure 4G, ORF-PASs generally show lower PAS scores than 3’ UTR-PASs. These results suggest that premature transcription termination within ORFs occurs through a mechanism similar to that in the 3’ UTR, with recognition of the poly(A) site mostly mediated by non-canonical poly(A) signals.
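The idea of scoring a candidate region against a consensus can be illustrated with a generic log-odds PSSM. This sketch assumes a uniform nucleotide background and a pseudocount of 1; the scoring scheme of Tian et al. (2007) used in the study may differ in these details, and the short training sequences are toy examples rather than Neurospora data.

```python
import math

ALPHABET = "ACGU"


def build_pssm(aligned, background=None, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length aligned RNA sequences.
    `background` maps each nucleotide to its expected frequency
    (uniform by default, which is an assumption of this sketch)."""
    if background is None:
        background = {base: 0.25 for base in ALPHABET}
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    pssm = []
    for pos in range(length):
        column = [seq[pos] for seq in aligned]
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total)
                            / background[base])
            for base in ALPHABET
        })
    return pssm


def score(pssm, seq):
    """Sum the per-position log-odds values for `seq`."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match the PSSM length")
    return sum(column[base] for column, base in zip(pssm, seq))


# Toy training set of hexamers mentioned in the text.
pssm = build_pssm(["AAUAAA", "AAUGAA", "AAUAAA", "AUUAAA"])
print(score(pssm, "AAUAAA") > score(pssm, "GCGCGC"))  # True
```

A sequence resembling the training set receives a higher score than an unrelated one, which is the sense in which a high PAS score indicates similarity to the consensus.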

Strong genome-wide correlations between codon usage and premature transcription termination

To understand the role of codon usage in PCPA, we examined the genome-wide correlations between gene codon usage and transcription termination events within Neurospora ORFs. Based on the 2P-seq results, we selected 2957 genes (RPM >10) that have ORF-pA sites and calculated the normalized ratio between the numbers of termination events in the ORFs and in the 3’ UTRs. The ratios were less than 10% for 95% of the genes with ORF-pA, likely either because the non-canonical poly(A) signals within ORFs are less efficient in promoting premature cleavage and polyadenylation (Berg et al., 2012; Guo et al., 2011) or because the prematurely terminated RNAs are unstable (van Hoof et al., 2002; Frischmeyer et al., 2002; Doma and Parker, 2007; Vanacova and Stefl, 2007). We also calculated the codon bias index (CBI) and codon adaptation index (CAI) for every protein-coding gene in Neurospora (Bennetzen and Hall, 1982; Sharp and Li, 1987). The normalized values of ORF to 3’ UTR termination events showed a strong negative correlation with both CBI and CAI (Figure 5A and B). These results suggest that codon usage, by affecting the formation of potential poly(A) signals, plays an important role in PCPA in Neurospora. Neurospora genes show a strong preference for C/G at the wobble positions; thus, genes with more rare codons should have higher AU content and potentially a higher chance of forming poly(A) signals that trigger premature termination.
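For readers unfamiliar with the CAI, the following Python sketch follows the general definition of Sharp and Li (1987): each codon receives a relative adaptiveness equal to its count in a reference gene set divided by the count of the most used synonymous codon, and the CAI of a gene is the geometric mean of these values over its codons. The reference counts, the two-amino-acid codon table, and the function names are hypothetical; the floor of 0.5 for unobserved codons is a common workaround to avoid taking the logarithm of zero.

```python
import math


def relative_adaptiveness(ref_counts, groups, floor=0.5):
    """Compute w = count / max count within each synonymous codon group.
    Codons absent from the reference set are given `floor` as their count."""
    w = {}
    for codons in groups:
        counts = {c: max(ref_counts.get(c, 0), floor) for c in codons}
        top = max(counts.values())
        for codon, n in counts.items():
            w[codon] = n / top
    return w


def cai(orf, w):
    """Geometric mean of w over the codons of `orf`.
    Codons without a w value (e.g. Met, Trp, stop) are skipped."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference counts for two amino acids (Asp and Lys),
# chosen to mimic a C/G preference at the wobble position.
groups = [("GAC", "GAU"), ("AAG", "AAA")]
w = relative_adaptiveness({"GAC": 80, "GAU": 20, "AAG": 90, "AAA": 10}, groups)
print(round(cai("GACAAG", w), 3), round(cai("GAUAAA", w), 3))  # 1.0 0.167
```

In this toy table the AU-ending codons carry low w values, so a gene built from them has a low CAI and, at the same time, a higher A/U content, which is the link to poly(A) signal formation drawn in the text.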

Figure 5 with 3 supplements
Strong genome-wide correlations between codon usage and premature transcription termination events.

(A) Scatter plot of normalized ORF/3’ UTR termination events (log10) vs. CBI. r = −0.64, p<2.2 × 10−16, n = 2957. (B) Scatter analysis showing the correlation of normalized ORF/3’ UTR termination events with CAI. Pearson’s r = −0.56, p<2.2 × 10−16, n = 2957. (C) Northern blot analyses showing that premature transcription termination was abolished after codon optimization of NCU09435. gfp-NCU09435-wt and gfp-NCU09435-opt were targeted to the his-3 locus, and an RNA probe specific for gfp was used. The asterisks indicate non-specific bands. (D) Northern blot analyses showing that premature transcription termination was observed after codon de-optimization of NCU02034. gfp-NCU02034-wt and gfp-NCU02034-deopt were targeted to the his-3 locus, and an RNA probe specific for gfp was used. (E) Scatter plot of normalized codon usage frequency (NCUF) (log2) with relative synonymous codon adaptiveness (RSCA) of all codons with at least two synonymous codons. r = −0.55, p=3.8 × 10−6, n = 59. (F) The correlation of normalized codon usage frequency (NCUF) with relative synonymous codon adaptiveness (RSCA) within each synonymous codon group with at least two synonymous codons. NCUF values of every codon within the −10 to −30 regions upstream of all identified ORF-pA sites were calculated. (G) A graph showing the ranking of all codon pairs by normalized codon pair frequency (NCPF). Codon pairs are ranked based on their NCPF values.

https://doi.org/10.7554/eLife.33569.010

To confirm this conclusion, we tested whether a premature termination event within an ORF can be abolished by optimizing surrounding rare codons or created by introducing rare codons within an ORF. NCU09435, which has a prominent termination site at the 5’ end of its ORF (Figure 4A), was selected as a reporter gene. The NCU09435 ORF was inserted downstream of gfp to create the gfp-NCU09435-wt construct. The 19 non-optimal codons surrounding the premature termination site of NCU09435 were replaced by optimal codons to create the gfp-NCU09435-opt construct (Figure 5—figure supplement 1A). Codon optimization led to not only the loss of the putative PAS motif (AAUCAA) but also mutation of its immediate surrounding sequence. Northern blot analysis showed that the codon optimization completely abolished the premature termination product of the gfp-NCU09435 fusion gene (Figure 5C), indicating the importance of the PAS motif and its surrounding sequence for premature transcription termination.

NCU02034 has no detectable premature termination events even though it has a potential PAS motif (AAUAAA) within its ORF. The 16 optimal codons surrounding the potential PAS motif in gfp-NCU02034-wt were replaced by rare synonymous codons to create the gfp-NCU02034-deopt construct (Figure 5—figure supplement 1B). Using this reporter, we showed that codon deoptimization abolished the expression of the full-length transcript, and transcription was prematurely terminated around the deoptimized region (Figure 5D). These results demonstrate the importance of codon usage and cis-elements around the PAS motif in promoting premature transcription termination.

To further understand how codon usage bias influences premature transcription termination, we determined the codon usage frequency of every codon within the −10 to −30 regions upstream of all identified ORF-pA sites. The frequency of each codon was then normalized to the frequency of the same codon from randomly chosen regions from the same gene set to create the normalized codon usage frequency (NCUF). For a given codon, a positive NCUF value means that this codon occurs more frequently within the PAS region relative to other regions of the gene; a negative value indicates that a codon occurs less frequently. Overall, there is a strong negative correlation (r = −0.55) between NCUF and the relative synonymous codon adaptiveness (RSCA) (Roth et al., 2012) of all codon groups with at least two synonymous codons (Figure 5E). Moreover, as shown in Figure 5F, amino acids encoded by A/T-rich codons, such as isoleucine, lysine, asparagine, and tyrosine, are over-represented within the region upstream of poly(A) sites in ORFs. These results indicate that rare codons are preferentially used upstream of the poly(A) sites within ORFs.
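The normalization described above can be sketched as follows. The sketch assumes that NCUF is the log2 ratio of a codon's relative frequency in the PAS-upstream windows to its relative frequency in background windows (consistent with the log2 scale in Figure 5E and the positive/negative interpretation in the text), and that all windows are supplied in frame; the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter

def codon_frequencies(regions):
    """Relative frequency of each codon across a list of in-frame sequences."""
    counts = Counter()
    for seq in regions:
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def ncuf(pas_regions, background_regions):
    """log2(frequency near ORF-pA sites / frequency in background regions).

    Codons missing from either set are skipped (an illustrative choice).
    """
    pas = codon_frequencies(pas_regions)
    background = codon_frequencies(background_regions)
    return {c: math.log2(pas[c] / background[c]) for c in pas if c in background}

# Toy windows (hypothetical, not the Neurospora data)
values = ncuf(["AAUAAA", "AAUGCC"], ["GCCGCC", "AAUGCC"])
```

A positive value marks a codon that is enriched upstream of ORF-pA sites and a negative value marks a depleted one.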

In addition to codon usage bias, pairs of synonymous codons are also not used equally in the genome, a phenomenon called codon pair bias (Gutman and Hatfield, 1989; Moura et al., 2007). Since PAS motifs are hexamers, we examined codon pair usage in the region surrounding the ORF-pA sites. We determined codon pair frequencies in 12106 ORF-PAS sequences (−30 to −10 nt) that belong to 4557 genes and normalized the values with background codon pair frequencies randomly chosen from the same gene set to generate normalized codon pair frequency (NCPF). For a given codon pair, a positive value of NCPF means that this codon pair is over-represented. Remarkably, when codon pairs are ranked based on NCPF, the five most enriched codon pairs are all made of two of the least preferred codons (Figure 5G). For example, AAUAGA is the most enriched codon pair and is 25-fold enriched in this region relative to the genome frequency and is one of the most often observed PAS motifs in 3’ UTRs and ORFs (Figure 4D and F). These results support our hypothesis that the use of rare codons leads to the formation of potential PAS motifs and transcription termination, whereas the use of preferred codons in Neurospora genes suppresses premature transcription termination events to ensure expression of full-length mRNAs by depleting the formation of potential PAS motifs.
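Because a codon pair is itself a hexamer, the link between synonymous codon choice and PAS motif formation can be made concrete with a small enumeration. The sketch below uses a toy three-amino-acid code table and only the two canonical hexamers AAUAAA and AUUAAA; both restrictions, and the function name `pas_forming_pairs`, are assumptions for illustration.

```python
from itertools import product

# Toy subset of the genetic code (assumption: three amino acids only)
SYNONYMS = {
    "N": ["AAU", "AAC"],
    "K": ["AAA", "AAG"],
    "I": ["AUU", "AUC", "AUA"],
}

# Illustrative motif set; the ORF-PAS motifs in the text are more degenerate
PAS_MOTIFS = {"AAUAAA", "AUUAAA"}

def pas_forming_pairs(aa1, aa2):
    """List the synonymous codon pairs for aa1-aa2 that spell a PAS hexamer."""
    return [
        first + second
        for first, second in product(SYNONYMS[aa1], SYNONYMS[aa2])
        if first + second in PAS_MOTIFS
    ]

asn_lys = pas_forming_pairs("N", "K")
ile_lys = pas_forming_pairs("I", "K")
lys_lys = pas_forming_pairs("K", "K")
```

For Asn-Lys and Ile-Lys, only the encodings built from A/U-ending codons spell a PAS hexamer, whereas the C/G-ending alternatives preferred in Neurospora do not.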

To examine whether the presence of a single PAS motif is sufficient to terminate transcription, we performed the following analysis. We searched for all the DNA fragments containing a putative PAS motif in ORFs and 3’ UTRs of all Neurospora genes and divided them into two groups: ‘true PAS’, which have at least one identified pA site within the 5–35 nt downstream region, and ‘false PAS’, which do not have a pA site (see Materials and methods for details). In the 3’ UTR, 17404 true PAS motifs and 20643 false PAS motifs were found. We compared the nucleotide composition between these two groups and found that the AU contents surrounding the true PAS motifs were higher than those of the false PAS motifs (Figure 5—figure supplement 2A). In the coding region, 4468 true PAS and 80086 false PAS were found. The ratio of true PAS to false PAS in ORF (true PAS/false PAS=0.086) is much lower than that in the 3’ UTR (1.03), indicating that a single PAS motif alone is not sufficient to trigger PCPA in the ORFs. Similar to the 3’ UTR, AU contents surrounding true PAS motifs were also higher than those of false PAS motifs (Figure 5—figure supplement 2B). These results indicate that cis-elements surrounding PAS are also important for transcription termination in the 3’ UTR and ORF. Consistent with this conclusion, a false PAS in NCU02034 can be turned into a true PAS by deoptimizing the codons surrounding the PAS motif (Figure 5D).
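The true/false PAS grouping can be sketched as a simple interval lookup. The sketch assumes single-strand coordinates on one transcript and measures the 5–35 nt window from the 3’ end of the motif; the actual coordinate convention is described in Materials and methods, and the function name `classify_pas` is hypothetical.

```python
import bisect

def classify_pas(motif_ends, pa_sites, min_dist=5, max_dist=35):
    """Label each motif as true (pA site 5-35 nt downstream) or false.

    motif_ends: positions of the last nucleotide of each putative PAS motif.
    pa_sites:   positions of identified poly(A) sites on the same strand.
    Assumption: distances are counted from the motif's 3' end.
    """
    sites = sorted(pa_sites)
    labels = {}
    for end in motif_ends:
        # First pA site at or beyond the start of the downstream window
        idx = bisect.bisect_left(sites, end + min_dist)
        labels[end] = idx < len(sites) and sites[idx] <= end + max_dist
    return labels

# Toy coordinates (hypothetical)
labels = classify_pas(motif_ends=[100, 200, 300], pa_sites=[120, 202, 336])
```

Here the motif ending at position 100 is a true PAS, while the other two are false because their nearest pA sites fall outside the window.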

We then compared the codon usage surrounding true PAS or false PAS. We determined the codon usage frequency of every codon surrounding true PAS and then normalized it by that of the same codon surrounding false PAS to create the relative codon usage frequency (RCUF) (see Materials and methods for details). As shown in Figure 5—figure supplement 3A, there was a strong negative correlation (r = −0.66) between RCUF and RSCA of all codon groups with at least two synonymous codons. Moreover, a negative correlation was found between RCUF and RSCA within every codon group (Figure 5—figure supplement 3B). These results indicate that rare codons are preferentially used in the regions surrounding true PAS motifs and are critical for promoting premature transcription termination. Together, these results suggest that codon usage affects premature transcription termination in Neurospora by creating potential PAS motifs and their surrounding cis-elements.

Transcription termination events in Schizosaccharomyces pombe, an organism with A/U-biased codon usage

The mechanism of transcription termination is largely conserved in eukaryotic organisms and requires similar A/U-rich motifs despite distinct codon usage biases of different genomes. Our data indicate that codon usage bias influences transcription termination in Neurospora, which has a C/G-biased genome. To evaluate how A/U-biased organisms prevent frequent premature transcription termination, we evaluated data from the fission yeast Schizosaccharomyces pombe. S. pombe has a strong A/U bias for every codon family, and its gene expression levels are correlated with gene codon usage bias (Hiraoka et al., 2009; Forsburg, 1994). Using the high-quality poly(A)-seq data recently generated for S. pombe (Lemay et al., 2016), we identified 7429 ORF-pA sites in 2894 genes. As shown in Figure 6A, there is an A-rich region in the region −30 to −10 nucleotides upstream and two equally prominent U-rich regions flanking the poly(A) sites in the 3’ UTR. The nucleotide composition of the poly(A) sites within ORFs has a very similar profile with a prominent downstream U-rich domain, which is missing in Neurospora. This result suggests that the downstream U-rich region is required for transcription termination, a conclusion that is supported by previous experimental studies (Dichtl and Keller, 2001; Graber et al., 1999; van Helden et al., 2000). Comparison of the poly(A) sites in 3’ UTRs with those within ORFs showed that the ORF-pA sites have significantly lower PAS scores than those in 3’ UTR (Figure 6B).

Transcription termination events in Schizosaccharomyces pombe.

(A) Nucleotide sequence composition surrounding the poly(A) sites located in 3’ UTR (left) and in ORF (right) in S. pombe. (B) Genome-wide PAS scores for 3’UTR-PAS and ORF-PAS. (C) Scatter analysis showing the correlation of normalized ORF/3’ UTR termination with CBI. r = 0.07, p=0.04, n = 1014. (D) Scatter plot of normalized ORF/3’ UTR termination vs CAI. r = −0.06, p=0.03, n = 1014.

https://doi.org/10.7554/eLife.33569.014

Compared to S. pombe, the U-rich region downstream of poly(A) sites in Neurospora 3’ UTRs is much less prominent (Figure 4C), and the downstream U-rich region is absent for poly(A) sites within ORFs in Neurospora (Figure 4E). This suggests that the U-rich region just downstream of the poly(A) site is not an essential element for transcription termination in ORFs in Neurospora. Therefore, the nucleotide sequence requirement for transcription termination in ORFs in S. pombe appears to be more stringent than that in Neurospora. This provides an explanation for how an A/U-biased organism such as S. pombe can have highly expressed A/U-rich genes without prominent PCPA in ORFs. Consistent with this notion, in S. pombe the normalized values of ORF to 3’ UTR termination events have little or no correlation with CBI or CAI values (Figure 6C and D). Thus, fungi with C/G and A/U codon usage biases appear to use different mechanisms to adapt the transcription termination process to allow optimal gene transcription.

A conserved role for codon usage in transcription termination in mouse

As in Neurospora, the codon usage of mammalian genomes is also C/G-biased. Although PAS signals in mammals are also AU-rich sequences, additional cis-elements, including downstream GU-rich and G-rich elements, are also required for efficient polyadenylation and transcription termination (Tian and Graber, 2012). In addition, protein factors or complexes involved in polyadenylation are only partially conserved between fungi and mammals (Tian and Graber, 2012; Shi and Manley, 2015). To examine whether premature transcription termination is also affected by codon usage bias in mammals, we analyzed the recently available high-quality poly(A)-seq data from mouse (Yang et al., 2016). By combining the data from four replicates of C2C12 myoblast control samples, we identified 2564 poly(A) sites within the ORFs of 1429 genes. We compared the nucleotide profile of ORF-PAS to that of 3’ UTR-PAS and found that the upstream U-rich region (−15 to −1) and the downstream GU-rich region (+5 to +20) largely disappeared and the AU content decreased in ORF-PASs, suggesting that non-canonical poly(A) signals are used for the premature transcription termination in ORFs in mouse (Figure 7A). Consistent with this result, we found that ORF-PASs have significantly lower PAS scores than 3’ UTR-PASs (Figure 7B). Moreover, although AAUAAA and AUUAAA are the top two most frequently used PAS motifs in ORF-PAS, their frequencies are much lower than those in the 3’ UTR (Figure 7—figure supplement 1A and B). As in other organisms (Tian et al., 2007; Liu et al., 2017b), widespread premature transcription termination events in introns were found in mouse with 10923 poly(A) sites in the introns of 8345 genes. Different from ORF-PAS, however, the nucleotide profile of the intronic PAS was almost identical to that of 3’ UTR-PAS (Figure 7—figure supplement 1C) and the frequency of AAUAAA and AUUAAA was only slightly lower than that in the 3’ UTR (Figure 7—figure supplement 1D).
Together, these data suggest that a mechanism similar to that of the 3’ UTR is employed for premature transcription termination in ORFs, with a tendency to use non-canonical poly(A) signals.

Figure 7 with 2 supplements
Premature transcription termination events in ORFs in mouse C2C12 cells.

(A) Nucleotide sequence composition surrounding poly(A) sites in 3’ UTR (left) and in ORFs (right) in mouse C2C12 cells. (B) PAS scores for 3’UTR-PAS and ORF-PAS. (C) Scatter analysis showing the correlation of normalized ORF/3’ UTR termination with CBI. Pearson’s r = −0.21. p=6.29 × 10−13, n = 1153. (D) Scatter plot of normalized ORF/3’ UTR termination vs CAI. r = −0.27, p<2.2 × 10−16, n = 1153. (E) The correlation of normalized codon usage frequency (NCUF) with relative synonymous codon adaptiveness (RSCA) within each synonymous codon group with at least two codons. NCUF values were calculated for −10 to −30 nt regions upstream of identified poly(A) sites in ORFs.

https://doi.org/10.7554/eLife.33569.015

We examined the correlations between gene codon usage and transcription termination events within mouse ORFs. We found that there was a negative correlation between normalized ORF/3’ UTR termination and CBI (Figure 7C) or CAI (Figure 7D), suggesting that premature transcription termination is also affected by codon usage in mouse. Although C or G is also preferred at the wobble positions in mouse, its codon usage bias is not as strong as that observed in Neurospora (Zhou et al., 2015). Moreover, unlike in Neurospora, ORFs in mouse often consist of small exons, which are separated by large introns. These facts may explain why the negative correlations between the normalized ORF/3’ UTR termination and codon usage bias in mouse are not as strong as those in Neurospora (Figure 5A and B). Next, we calculated the normalized codon usage frequency for the −10 to −30 regions upstream of all ORF-pA sites in mouse. As in Neurospora, a negative correlation between NCUF and RSCA was observed (Figure 7—figure supplement 2A). Moreover, the negative correlation between NCUF and RSCA was found within most synonymous codon groups, except for histidine (Figure 7E). In addition, PAS motifs (AAUAAA and AUUAAA) were among the top three most enriched codon pairs and are over 16-fold enriched in this region relative to the genome frequency (Figure 7—figure supplement 2B). These results indicate that codon preference surrounding ORF-PAS is substantially different from the rest of ORFs. As in Neurospora, the nucleotide U content surrounding the true PAS motifs of 3’ UTR and ORF was higher than that of the false PAS in mouse (Figure 7—figure supplement 2C-D). Together, these analyses suggest that codon usage biases co-evolved with transcription termination machinery to limit premature transcription termination in C/G-biased organisms from Neurospora to mouse.

Discussion

Codon usage is an important determinant of protein expression levels in different organisms. The effects of codon usage bias were thought to be mainly due to its influence on protein translation. We previously showed that codon usage affects chromatin structures and is a major determinant of mRNA levels by affecting gene transcription (Zhou et al., 2016), indicating that codon usage can impact gene expression beyond translation. In this study, we demonstrated that premature cleavage and polyadenylation in coding regions is affected by codon usage biases.

Our conclusion is supported by several lines of evidence. First, replacing wild-type codons with rare codons in parts of the circadian clock gene frq abolished expression of the full-length mRNA transcript due to premature transcription termination. Second, by identifying the rare codons important for premature transcription termination, we showed that rare codons created potential poly(A) signals, including PAS motifs and other upstream and downstream cis-elements. Introduction of a single rare codon in the correct context caused premature transcription termination and had a significant effect on frq gene expression. In contrast, the use of optimal codons can suppress premature transcription termination and promote the expression of the full-length transcript. Third, genome-wide identification of mRNA poly(A) sites uncovered 12106 premature termination sites in the ORFs of 4557 genes, indicating that premature transcription termination within ORFs is a wide-spread phenomenon in Neurospora. Importantly, we discovered a strong genome-wide negative correlation between gene codon usage and premature transcription termination events in Neurospora genes. Fourth, the regions around the premature termination sites are highly enriched in rare codons and rare codon pairs. Fifth, by comparing nucleotide profile and codon usage bias between true PAS and false PAS, we found that rare codons create not only PAS motifs but also other surrounding cis-elements important for transcription termination. Since gene codon usage strongly correlates with mRNA and protein expression levels in Neurospora (Zhou et al., 2016), these results suggest that differential effects of synonymous codons on transcription and premature transcription termination contribute to the determination of gene expression levels.
Consistent with our conclusion, a previous study in yeast has shown that codon usage affects stability of antisense transcripts by generation or depletion of the Nrd1 and Nab3 binding sites (Cakiroglu et al., 2016).

The codon usage of the Neurospora genome is strongly C/G biased at the wobble positions. Transcription termination in eukaryotes relies on multiple A/U rich cis-elements around the poly(A) sites (Tian and Graber, 2012; Shi and Manley, 2015; Proudfoot, 2011; Proudfoot, 2016; Tian and Manley, 2017; Kuehner et al., 2011). Clusters of rare codons in Neurospora appear to form the A/U rich poly(A) signal that can be recognized by the transcription termination machinery. Our analyses of poly(A)-seq results in mouse cells, which also have a C/G-biased codon usage, suggest that codon usage bias also plays a role in suppressing premature transcription termination within ORFs. Therefore, codon usage bias is a conserved mechanism for C/G-biased genomes to promote gene expression by suppressing PTT. Consistent with this conclusion, it has been previously shown that codon optimization is required for high-level expression of heterologous genes in Aspergillus oryzae, probably by preventing premature transcription termination in an exon (Tokuoka et al., 2008).

Although a mechanism similar to that of the 3’ UTR is employed, non-canonical poly(A) signals are used for premature cleavage and polyadenylation within coding regions. First, the nucleotide profile of ORF-PAS is different from that of 3’ UTR-PAS. In Neurospora, the downstream U-rich region is missing (Figure 4E), whereas the upstream U-rich region and the downstream GU-rich region largely disappeared in mouse (Figure 7A). Second, compared to the PAS motifs in the 3’ UTR, the non-canonical poly(A) signals within ORFs are less efficient at terminating transcription, resulting in the low ratio of ORF to 3’ UTR termination events observed both in Neurospora and mouse (Figure 5A and B and Figure 7C and D). The use of non-canonical poly(A) signals may also affect the fate of prematurely terminated transcripts. It has been shown that prematurely terminated transcripts are rapidly degraded in the cytosol and in the nucleus (van Hoof et al., 2002; Frischmeyer et al., 2002; Doma and Parker, 2007; Vanacova and Stefl, 2007). In Neurospora and mouse, the ratio of true PAS motifs in coding regions is much lower than that in the 3’ UTR, suggesting that PCPA in coding regions is mostly suppressed by optimal codons surrounding PAS signals. Even though the non-canonical poly(A) signals are less efficient, a gene with poor codon usage can have multiple such sites, which can have a major impact on mRNA levels.

By analyzing the mRNA termination sites in S. pombe, which has a strong A/U-biased codon usage, we also identified many premature transcription termination events in ORFs. In contrast to Neurospora, there is little or no correlation between codon usage and premature transcription termination events in S. pombe, suggesting that codon usage does not contribute significantly to premature transcription termination in organisms with A/U-biased genomes. Comparison of the nucleotide composition around transcription termination sites in 3’ UTRs and in ORFs in S. pombe showed that they share a U-rich motif downstream of the cleavage sites, which has been shown to be important for transcription termination (Dichtl and Keller, 2001; Graber et al., 1999; van Helden et al., 2000). Such a U-rich element is largely missing in Neurospora. These results suggest that C/G- and A/U-biased genomes have co-adapted with their transcription termination machinery, using different mechanisms to prevent premature transcription termination. C/G-biased organisms such as Neurospora use C/G-biased codons to prevent the formation of poly(A) signals, thus suppressing premature transcription termination, whereas A/U-biased organisms rely on more stringent sequence requirements for poly(A) signals.

Materials and methods

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| gene (Neurospora crassa) | frequency (frq) | NA | NCBI Gene ID:3876095 | |
| gene (Neurospora crassa) | NCU09435 | NA | NCBI Gene ID:3874734 | |
| gene (Neurospora crassa) | NCU00931 | NA | NCBI Gene ID:3880910 | |
| strain (Neurospora crassa) | 4200 | PMID:155773 | | Strain maintained in Yi Liu's lab |
| strain (Neurospora crassa) | 303–3 (bd, frq10, his-3) | PMID:8052643 | | |
| strain (Neurospora crassa) | 301–6 (bd, his-3, A) | PMCID: PMC180927 | | |
| antibody | anti-FRQ | PMID:9150146 | | Rabbit polyclonal; 1:50 for western blot |
| antibody | anti-WC-2 | PMID: 11226160 | | Rabbit polyclonal; 1:500 for ChIP |
| antibody | Anti-RNA polymerase II CTD repeat YSPTSPS (phospho S2) antibody | abcam | ab5095 | Rabbit polyclonal; 1:500 for ChIP |
| antibody | Anti-H3K9me3 | Active Motif | catalog no:39161 | Rabbit polyclonal; 1:500 for ChIP |
| recombinant DNA reagent | pKAJ120 | PMID:8052643 | | deoptimized frq gene; see Figure 1—figure supplement 1 |
| recombinant DNA reagent | frq-deopt1 | this paper | | deoptimized frq gene; see Figure 1—figure supplement 1 |
| recombinant DNA reagent | frq-deopt2 | this paper | | deoptimized frq gene; see Materials and methods |
| recombinant DNA reagent | frq-deopt3 | this paper | | deoptimized frq gene; see Materials and methods |
| recombinant DNA reagent | frq-deopt4 | this paper | | deoptimized frq gene; see Materials and methods |
| recombinant DNA reagent | frq-deopt5 | this paper | | deoptimized frq gene; see Materials and methods |
| recombinant DNA reagent | frq-deopt6 | this paper | | deoptimized frq gene; see Materials and methods |
| recombinant DNA reagent | frq-deopt7 | this paper | | deoptimized frq gene; see Materials and methods |
| recombinant DNA reagent | frq-deopt4* | this paper | | deoptimized frq gene; see Materials and methods |
| recombinant DNA reagent | gfp-NCU09435-wt | this paper | | wild-type NCU09435 gene in frame with gfp |
| recombinant DNA reagent | gfp-NCU09435-opt | this paper | | optimized NCU09435 gene; see Figure 5—figure supplement 1 |
| recombinant DNA reagent | gfp-NCU02034-wt | this paper | | wild-type NCU02034 gene in frame with gfp |
| commercial assay or kit | SuperScript III Reverse Transcriptase | Thermo Fisher (Waltham, MA) | catalog no:18080093 | For 3' RACE and making 2P-seq library |
| commercial assay or kit | TURBO DNA-free Kit | Thermo Fisher (Waltham, MA) | catalog no: AM1907 | |
| commercial assay or kit | TOPO TA Cloning Kit, Dual Promoter for in vitro Transcription | Thermo Fisher (Waltham, MA) | catalog no: 452640 | |
| commercial assay or kit | Direct-zol RNA miniprep plus | Zymo research | catalog no: R2072 | |
| commercial assay or kit | CircLigase II ssDNA Ligase | Epicentre | catalog no: CL9021K | |
| software, algorithm | TopHat | http://ccb.jhu.edu/software/tophat/index.shtml | RRID:SCR_013035 | |
| software, algorithm | samtools | http://samtools.sourceforge.net/ | RRID:SCR_002105 | |
| software, algorithm | BEDTools | http://bedtools.readthedocs.io/en/latest/ | RRID:SCR_006646 | |
| software, algorithm | codonW | http://codonw.sourceforge.net/ | | |
| software, algorithm | Source code | this paper | | scripts to analyze 2P-seq and 3’READS, comprising eight steps: read processing, mapping, filtering and downstream analyses that create plots and figures |
| software, algorithm | raw sequencing data | this paper | PRJNA419320 | 2P-seq data, including two repeats from nuclear RNA extracts |
| software, algorithm | raw sequencing data | PMID:27401558 | GSE75753 | mouse poly(A)-seq data |
| software, algorithm | raw sequencing data | PMID:26765774 | GSE72574 | yeast poly(A)-seq data |

Strains and culture conditions

In this study, FGSC 4200 (a) was used as the wild-type strain for 2P-seq. The 301–6 (bd, his-3, A) and 303–3 (bd, frq10, his-3) strains were the host strains for his-3 targeting constructs. All the strains used in this study are listed in Supplementary file 1.

Culture conditions have been described previously (Aronson et al., 1994b). Neurospora mats were cut into small discs and transferred to flasks with minimal medium (1 × Vogel’s, 2% glucose). After 24 hr, the tissues were harvested. Protein and RNA analyses were performed as previously described (Zhou et al., 2016). For race tube assays, the medium contained 1 × Vogel’s, 0.1% glucose, 0.17% arginine, 50 ng/ml biotin, and 1.5% agar. Strains were inoculated and grown in constant light at 25°C for 24 hr before being transferred to constant darkness (DD) at 25°C. Growth fronts were marked every 24 hr. Calculations of period length were performed as described (Garceau et al., 1997).

Codon deoptimization, plasmid constructs, and Neurospora transformation

frq codon deoptimization was performed for the 5’ end of the ORF (36–489 nt). The nucleotide sequences of the deoptimized frq are shown in Figure 1—figure supplement 1. Sequences surrounding an alternative frq splice site in this region were not mutated. Codons were deoptimized based on the N. crassa codon usage frequency. In the frq-deopt1 construct, 65 codons were deoptimized, whereas 94 codons were changed in the frq-deopt2 construct. The deoptimized regions of frq were synthesized (GenScript) and inserted into SphI and AflII sites of pKAJ120 to generate frq-deopt1 and frq-deopt2. A homologous recombination-based cloning method (In-Fusion HD cloning kit, Clontech) was used to generate the frq-deopt3/4/5 constructs using frq-deopt2 as a template. In frq-deopt6, the 30th codon (GAC) of wild-type frq was mutated to GAT by site-directed mutagenesis. The 28th codon (TCC) of the frq ORF was mutated to AGT in frq-deopt7. AGT and GAT in the frq-deopt4 construct were mutated to TCC and GAC, respectively, to make the frq-deopt4* construct.
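A codon deoptimization step of this kind can be sketched as a synonymous substitution guided by a codon usage table. In the sketch below, every codon in a chosen range is replaced by the least used synonymous codon; this selection rule, the function name `deoptimize`, and the usage values (placeholders, not measured N. crassa frequencies) are assumptions for illustration and are not necessarily how the constructs in this study were designed.

```python
# Placeholder usage fractions for two amino acids (illustrative values only,
# NOT measured N. crassa codon usage frequencies)
USAGE = {
    "D": {"GAC": 0.6, "GAT": 0.4},
    "S": {"TCC": 0.3, "TCT": 0.2, "AGC": 0.2, "TCG": 0.15, "TCA": 0.1, "AGT": 0.05},
}
CODON_TO_AA = {codon: aa for aa, family in USAGE.items() for codon in family}

def deoptimize(orf, start_codon=0, end_codon=None):
    """Replace codons in [start_codon, end_codon) with the rarest synonym.

    Codons not present in the usage table are left unchanged, so the
    encoded protein sequence is preserved.
    """
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    end = len(codons) if end_codon is None else end_codon
    for i in range(start_codon, end):
        aa = CODON_TO_AA.get(codons[i])
        if aa is not None:
            family = USAGE[aa]
            codons[i] = min(family, key=family.get)
    return "".join(codons) + orf[usable:]  # keep any trailing partial codon

whole = deoptimize("TCCATGGAC")
first_only = deoptimize("TCCATGGAC", 0, 1)
```

With these placeholder values, TCC becomes AGT and GAC becomes GAT, the same synonymous changes used in the frq-deopt6 and frq-deopt7 constructs.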

The qa-2 promoter was inserted into pBM61 at NotI and XbaI sites to generate the pBM61.qa-2 construct. The gfp ORF was inserted into pBM61.qa-2 at XbaI and BamHI sites to generate the pBM61.qa-2-gfp construct. The wild-type NCU09435 ORF was inserted into pBM61.qa-2-gfp at BamHI and HindIII sites to generate pBM61.qa-2-gfp-NCU09435-wt. The optimized region of NCU09435 was synthesized (GenScript) and used to replace the corresponding region of pBM61.qa-2-gfp-NCU09435-wt to generate pBM61.qa-2-gfp-NCU09435-opt. The wild-type NCU02034 ORF was inserted into pBM61.qa-2-gfp at BamHI and ApaI sites to generate pBM61.qa-2-gfp-NCU02034-wt. The deoptimized region of NCU02034 was synthesized (GenScript) and used to replace the corresponding region of pBM61.qa-2-gfp-NCU02034-wt to create pBM61.qa-2-gfp-NCU02034-deopt. All resulting Neurospora expression constructs were transformed into the host strains by electroporation as described previously (Bell-Pedersen et al., 1996). Homokaryotic transformants were obtained by microconidia purification and confirmed by PCR.

Protein analyses

Tissue harvest, protein extraction, and western blot analyses were performed as previously described (Garceau et al., 1997; Zhou et al., 2012). For western blots, equal amounts of total protein (50 μg) were loaded in each lane. After electrophoresis, proteins were transferred onto PVDF membrane, and western blot analysis was performed. Anti-FRQ antibody, which was generated by using the full-length FRQ protein as antigen (Garceau et al., 1997), was used to detect FRQ protein levels in this study. For western blots, densitometry analyses were performed using Image J. To accurately quantify protein levels in different strains, a serial dilution was performed when needed.

RNA, strand-specific qRT-PCR, northern blot, and 3’ RACE

RNA extraction, strand-specific qRT-PCR, and northern blot were performed as previously described (Xue et al., 2014). Total RNA was extracted using Trizol and then purified with 2.5 M LiCl. Nuclear RNAs were isolated as described previously (Zhou et al., 2016). Briefly, nuclei were isolated, and nuclear RNAs were extracted using the Direct-zol RNA miniprep plus kit (Zymo Research) according to the manufacturer's instructions.

Northern blot analyses were performed as previously described using [32P] UTP-labeled riboprobes (Xue et al., 2014). Riboprobes were transcribed in vitro from PCR products by T3 or T7 RNA polymerase (Ambion) following the manufacturer’s protocol. The primer sequences used for the template amplification were frq-5’ end (5’-TAATACGACTCACTATAGGG (T7 promoter) GGCAGGGTTACGATTGGATT-3’, 5’-GGGTAGTCGTGTACTTTGTCAG-3’), frq-3’ end (5’-TAATACGACTCACTATAGGG (T7 promoter) CCTTCGTTGGATATCCATCATG-3’, 5’-GAATTCTTGCAGGGAAGCCGG-3’), qrf-5’ end (5’-AATTAACCCTCACTAAAGGG (T3 promoter) GAATTCTTGCAGGGAAGCCGG-3’, 5’-CCTTCGTTGGATATCCATCATG-3’), gfp (5’-TAATACGACTCACTATAGGG (T7 promoter) GAACTCCAGCAGGACCATGTG-3’, 5’-GAACCGCATCGAGCTGAA-3’), NCU09435 5’ end (5’-TAATACGACTCACTATAGGG (T7 promoter) GAGGCAGCGTTAATGTTTGTG-3’, 5’-GTGTCCAGTCAACTGGTTATCA-3’), and NCU00931 5’ end (5’-TAATACGACTCACTATAGGG (T7 promoter) GGCAAGCGCCTTAAATTCTC-3’, 5’-TACTCCCTGTCTTCAGTTCCT-3’). For northern blots, densitometry analyses were performed using Image J.

Strand-specific qRT-PCR was performed as previously described (Xue et al., 2014). cDNA was obtained by reverse transcription using a SuperScript III First-Strand Synthesis System (Invitrogen) according to the manufacturer’s instructions. β-tubulin was used as an internal control. The primer sequences for strand-specific RT reactions were frq (5’-GCTAGCTTCAGCTAGGCATC (adaptor) CGTTGCCTCCAACTCACGTTTCTT-3’), frq pre-mRNA (5’-GCTAGCTTCAGCTAGGCATC (adaptor) TTGAACGGTAGGGAGGAGGAGAG-3’), and β-tubulin (5’-CTCGTTGTCAATGCAGAAGGTC-3’). The RT reaction was performed with a mixture of the gene-specific primer and the β-tubulin primer. The primer sequences for the qPCR step of the RT-qPCR assay were frq (5’-AGCTTCAGCTAGGCATCCGTT-3’, 5’-GCAGTTTGGTTCCGACGTGATG-3’), frq pre-mRNA (5’-AGCTTCAGCTAGGCATCTTGAACG-3’, 5’-ACGGCATCTCATCCATTCTCACCA-3’), and β-tubulin (5’-ATAACTTCGTCTTCGGCCAG-3’, 5’-ACATCGAGAACCTGGTCAAC-3’).

3’ RACE was performed as previously described (Scotto-Lavino et al., 2006). Briefly, 2 μg of total RNA was reverse transcribed using primer Qt (5´-CCAGTGAGCAGAGTGACGAGGACTCGAGCTCAAGCTTTTTTTTTTTTTTTTT-3´). The 3' end was amplified using a common primer Qo (5´-CCAGTGAGCAGAGTGACG-3´) and a gene-specific primer GSP1 (frq, 5´-CCAACTCAAGTGCGTAAGGA-3´; qrf, 5´-GTCTTTCTCCTCTGCGATGTC-3´). The amplification product was diluted 40-fold and used as the template for the second amplification. Another common primer QI (5´-GAGGACTCGAGCTCAAGC-3´) and an inner gene-specific primer GSP2 (frq, 5´-CATGGCGGATAGTGGGGATAA-3´; qrf, 5´-GGTAAGCCATGTACCTTGATCT-3´) were used for the second round of amplification. The second-round amplification product was cloned into TA cloning constructs (Invitrogen) and sequenced.

2P-seq

2P-seq was performed as described previously (Spies et al., 2013) with several modifications. First, nuclear RNA was used instead of total RNA. Second, the primer sequences were redesigned (Supplementary file 2) so that multiple libraries could be sequenced in the same lane on an Illumina sequencer. Third, a lower concentration of RT primer was used during reverse transcription to reduce internal priming (Scotto-Lavino et al., 2006). Briefly, poly(A) RNA was purified using oligo-dT25 beads (Invitrogen) and eluted directly into 25 μl of RNase T1 buffer. RNase T1 digestion was performed for 20 min at 22°C using 0.5 U RNase T1 (biochemistry grade, Ambion). After this partial RNase T1 digestion, reverse transcription was performed by addition of 1 μl of 1 μM RT primer. Single-stranded cDNAs of 200–400 nucleotides were purified on a 6% TBE-urea gel, then circularized with CircLigase II (Epicentre). Circularized cDNA was amplified by high-fidelity PCR for 10 cycles using barcoded PCR primers, then gel purified and sequenced using the Illumina Read 1 primer.

Chromatin immunoprecipitation assay

ChIP assays were performed as described previously with some modifications (Zhou et al., 2013b). The tissues were cultured in 50 mL minimal medium and were fixed by adding 1% formaldehyde for 15 min at room temperature. The tissues were harvested and ground in liquid nitrogen, and 100–200 μl of tissue powder was suspended in 300 μl ChIP lysis buffer. Chromatin was sheared to fragments of about 500 bp by sonication. In each reaction, 500 μg of total lysate and 2 μl of antibody were combined and incubated at 4°C overnight. The antibodies used in this study were: WC-2 (96), pol II S2P (Abcam, ab5095), and H3K9me3 (Active Motif, 39161). The ‘Input’ was 50 μg of total lysate. Protein G-coupled beads (25 μl) were added, and samples were incubated for 2 hr. The beads were washed at 4°C for 5 min with each of the following buffers: ChIP lysis buffer, low salt buffer, high salt buffer, and LNDET buffer, and then twice with TE buffer. We then added 140 μl of 10% chelex beads (Sigma) and heated the samples at 96°C for 20 min. After centrifugation, 100 μl of supernatant was transferred to a new tube. The Input DNA was de-crosslinked and extracted with phenol. Immunoprecipitated DNA was quantified using real-time quantitative PCR. For S2P and H3K9me3, the results were normalized to Input DNA and presented as Input %. For WC-2 ChIP, the results were further normalized to the internal control β-tubulin, and data are presented as relative WC-2 levels. Each experiment was performed independently three times. Primers used in the qPCR step are listed in Supplementary file 2.

Data analyses

The poly(A)-seq data analyses were carried out with a combination of published tools and custom scripts written in Perl and R. All the scripts are available for download at GitHub (Dang, 2018; copy archived at https://github.com/elifesciences-publications/poly-A-seq).

2P-seq raw read processing for N. crassa:

Only raw reads containing at least 10 consecutive adenosines were used. Next, these poly(A) sequences and the adaptor sequences were removed, and any sequences shorter than 20 nt were discarded. Trimmed reads were mapped strand-specifically to the Neurospora crassa genome (v10) using TopHat (v2.1.1). Reads with multiple targets, identified using SAMtools, were discarded (Li et al., 2009). To identify reads due to false priming at internal poly(A) stretches of mRNA, we examined the genomic sequence in the 20 nt downstream of the 3’ end position of each mapped read and discarded reads for which this sequence contained six consecutive adenosines or ≥7 adenosines in a 12-nt sliding window (Beaudoing et al., 2000; Tian et al., 2005). The remaining data were processed with BEDTools (genomecov) (Quinlan and Hall, 2010) to generate two bedgraph files, which reflect the density of reads at each position on the plus or minus strand. The 2P-seq data were deposited in BioProject (accession no. PRJNA419320).
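The internal-priming filter described above can be illustrated with a minimal Python sketch. This is not the authors' Perl/R code; the function name `has_internal_priming` and its parameters are hypothetical, and the input is assumed to be the genomic sequence immediately downstream of a mapped read's 3’ end, already extracted on the correct strand.

```python
def has_internal_priming(downstream: str, run: int = 6,
                         window: int = 12, min_a: int = 7) -> bool:
    """Flag a read whose downstream genomic sequence suggests internal priming.

    downstream: genomic sequence following the read's 3' end (only the first
    20 nt are inspected, as in the text). The thresholds mirror the text:
    six consecutive A's, or >=7 A's in any 12-nt window.
    """
    seq = downstream.upper()[:20]
    # Rule 1: a run of consecutive adenosines.
    if "A" * run in seq:
        return True
    # Rule 2: an A-rich 12-nt sliding window.
    for start in range(0, max(len(seq) - window, 0) + 1):
        if seq[start:start + window].count("A") >= min_a:
            return True
    return False
```

Reads for which the function returns True would be discarded; all others are kept for the bedgraph step.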

Raw read processing for S. pombe and mouse:

Fission yeast (GSE75753) (Lemay et al., 2016) and mouse (GSE72574) (Yang et al., 2016) poly(A)-seq data were previously generated. The raw reads were processed as previously described (Jan et al., 2011), using the following steps: 1) Remove adapter sequences and retain reads with at least one adenosine at the 3’ end. 2) Remove the poly(A) stretch (≥1 A), record the length of the poly(A) stretch for each read, and discard reads shorter than 20 nt. 3) Map the processed reads with TopHat (v2.1.1) to the corresponding reference genome (Ensembl EF2 for S. pombe and UCSC mm10 for mouse, downloaded from iGenomes (Illumina)). 4) Remove reads with multiple targets. 5) To avoid false-positive signals, examine the genomic sequence downstream of the 3’ end of each mapped read and retain only reads with at least one untemplated A. 6) Create bedgraph files.
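Step 5 can be sketched as follows, under an assumption that is not spelled out in the text: the recorded poly(A) length of a read is compared with the run of genomic adenosines immediately downstream of its mapped 3’ end, and any excess is counted as untemplated. The function name `untemplated_a_count` is hypothetical.

```python
def untemplated_a_count(poly_a_len: int, downstream: str) -> int:
    """Estimate how many trimmed A's cannot be explained by the genome.

    poly_a_len: length of the poly(A) stretch removed from the read (step 2).
    downstream: genomic sequence immediately after the read's mapped 3' end.
    Assumption: A's matching the leading genomic A run are templated.
    """
    genomic_run = 0
    for base in downstream.upper():
        if base != "A":
            break
        genomic_run += 1
    return max(poly_a_len - genomic_run, 0)
```

Under this reading, a read would be retained when the returned value is at least 1.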

Poly(A) site or pA site identification in 3’ UTR and ORFs:

We first clustered poly(A) signals located within 24 nt of one another into peaks with BEDTools and recorded the number of reads falling in each peak (command: bedtools merge -s -d 24 -c 4 -o count). We retained only those peaks with at least five reads for further analysis. We next determined the summit of each peak (i.e., the position with the highest signal) and took this position to be the poly(A) site.
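The clustering and summit-calling logic can be sketched in plain Python for a single chromosome and strand. This is an illustration only: the authors used BEDTools, whereas the function `call_poly_a_sites` and its input format (a mapping from 3’ end position to read count) are hypothetical, and the exact distance convention of the BEDTools command may differ from the simple gap rule used here.

```python
def call_poly_a_sites(counts, max_gap=24, min_reads=5):
    """Cluster 3' end positions into peaks and report each peak's summit.

    counts: dict mapping position -> number of reads ending there
            (one chromosome, one strand).
    Returns a list of (summit_position, total_reads_in_peak) tuples.
    """
    clusters, current = [], []
    for pos in sorted(counts):
        # Start a new cluster when the gap to the previous position is too large.
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    sites = []
    for cluster in clusters:
        total = sum(counts[p] for p in cluster)
        if total >= min_reads:
            # Summit = position with the highest signal (first one on ties).
            summit = max(cluster, key=lambda p: counts[p])
            sites.append((summit, total))
    return sites
```

For example, `call_poly_a_sites({100: 1, 110: 6, 120: 2, 200: 3})` keeps one peak with its summit at position 110 and drops the isolated three-read signal at position 200.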

We classified the peaks into two groups: peaks in 3’ UTRs and peaks in ORFs. Because the 3’ UTR annotations in the genomic references (i.e., the GTF files of the respective species) are likely inaccurate, we defined the 3’ UTR region of each gene as extending from the end of the ORF to the annotated 3’ end plus a 1-kbp extension. For a given gene, we analyzed all the peaks within the 3’ UTR region, compared the summits of the peaks, and selected the position with the highest summit as the major poly(A) site of the gene.

For ORFs, we retained the putative poly(A) sites for which the PAS region fully overlapped with exons annotated as ORFs. The range of the PAS region for each species was determined empirically as the region with high AT content around the ORF poly(A) sites. For each species, we performed a first-round test with the PAS region set from −30 to −10 nt upstream of the cleavage site, then analyzed the AT distribution around the cleavage sites in ORFs to identify the actual PAS region. The final ORF PAS regions were −30 to −10 nt for N. crassa and mouse and −25 to −12 nt for S. pombe.

Identification of 6-nucleotide PAS motif:

We followed previously described methods to identify PAS motifs (Spies et al., 2013). Specifically, we focused on the putative PAS regions from either 3’ UTRs or ORFs. (1) We identified the most frequently occurring hexamer within the PAS regions. (2) We calculated the dinucleotide frequencies of the PAS regions, randomly shuffled the dinucleotides to create 1000 sequences, then counted the occurrence of the hexamer from step 1. (3) We tested the frequency of the hexamer from step 1 and retained it if its occurrence was ≥2-fold higher than that in the random sequences (step 2) and if the P-value was <0.05 (binomial probability). (4) We then removed all the PAS sequences containing the hexamer. We repeated steps 1 to 4 until the occurrence of the most common hexamer was <1% in the remaining sequences.
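The iterative procedure in steps 1 to 4 can be sketched as below. This is a simplified illustration, not the published implementation: all function names are hypothetical, the background is built by sampling dinucleotides from the pooled PAS regions in proportion to their frequency, occurrence is counted as the number of sequences containing the hexamer, a zero background is clamped to 1/1000, and the binomial tail is computed directly with the standard library.

```python
import random
from collections import Counter
from math import exp, lgamma, log


def hexamer_hits(seqs, k=6):
    """For each hexamer, count how many sequences contain it at least once."""
    hits = Counter()
    for s in seqs:
        hits.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return hits


def shuffled_background(seqs, n, rng):
    """Build n random sequences by sampling dinucleotides from the input pool."""
    dinucs = [s[i:i + 2] for s in seqs for i in range(0, len(s) - 1, 2)]
    length = len(seqs[0])
    return ["".join(rng.choice(dinucs) for _ in range(length // 2 + 1))[:length]
            for _ in range(n)]


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        total += exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                     + i * log(p) + (n - i) * log(1.0 - p))
    return min(total, 1.0)


def find_pas_motifs(seqs, n_background=1000, min_fold=2.0, alpha=0.05,
                    stop_fraction=0.01, seed=0):
    rng = random.Random(seed)
    remaining, motifs = list(seqs), []
    while remaining:
        hits = hexamer_hits(remaining)
        if not hits:
            break
        motif, count = hits.most_common(1)[0]            # step 1
        observed = count / len(remaining)
        if observed < stop_fraction:                     # stopping rule
            break
        background = shuffled_background(remaining, n_background, rng)  # step 2
        expected = sum(motif in b for b in background) / n_background
        expected = max(expected, 1.0 / n_background)     # assumption: avoid zero
        p_value = binom_sf(count, len(remaining), expected)
        if observed / expected >= min_fold and p_value < alpha:         # step 3
            motifs.append(motif)
        remaining = [s for s in remaining if motif not in s]            # step 4
    return motifs
```

The sketch uses DNA letters; RNA-style motifs such as AAUAAA would need U replaced by T before calling it.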

Calculation of the normalized codon usage frequency (NCUF) in PAS regions within ORFs:

To calculate NCUF for codons and codon pairs, we did the following. For a given gene with poly(A) sites within its ORF, we first extracted the nucleotide sequences of the PAS regions that matched annotated codons (e.g., six codons within −30 to −10 nt upstream of the ORF poly(A) site for N. crassa) and counted all codons and all possible codon pairs. We also randomly selected 10 sequences with the same number of codons from the same ORF and counted all codons and all possible codon pairs. We repeated these steps for all genes with PAS signals in ORFs. We then normalized the frequency of each codon or codon pair from the ORF PAS regions to that from the random regions.
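A minimal sketch of the NCUF calculation is given below. The function name `ncuf` and the input format (an in-frame ORF string plus the index of the first codon of its PAS region) are hypothetical, and codon pairs are taken to be adjacent codons within a window, which is an assumption.

```python
import random
from collections import Counter


def split_codons(orf):
    """Split an in-frame ORF string into codons, ignoring a trailing partial codon."""
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]


def units(window_codons, pairs):
    """Return single codons, or adjacent codon pairs joined as 6-nt strings."""
    if pairs:
        return [a + b for a, b in zip(window_codons, window_codons[1:])]
    return window_codons


def ncuf(genes, window=6, n_random=10, pairs=False, seed=0):
    """genes: iterable of (orf_sequence, index_of_first_PAS_codon) tuples."""
    rng = random.Random(seed)
    pas, background = Counter(), Counter()
    for orf, pas_start in genes:
        cds = split_codons(orf)
        pas.update(units(cds[pas_start:pas_start + window], pairs))
        # Random windows of the same size drawn from the same ORF.
        for _ in range(n_random):
            start = rng.randrange(len(cds) - window + 1)
            background.update(units(cds[start:start + window], pairs))
    pas_total, bg_total = sum(pas.values()), sum(background.values())
    # Codons never seen in the random windows are skipped to avoid division by zero.
    return {c: (pas[c] / pas_total) / (background[c] / bg_total)
            for c in pas if background[c] > 0}
```

Values above 1 indicate codons or codon pairs that are over-represented in ORF PAS regions relative to random regions of the same genes.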

Relative synonymous codon adaptiveness (RSCA):

We first counted all codons from all ORFs in a given genome. For a given codon, its RSCA value was calculated by dividing the count of that codon by the count of the most abundant synonymous codon. Therefore, among the synonymous codons encoding a given amino acid, the most abundant codon has an RSCA value of 1.
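The RSCA definition above can be sketched in Python as follows. The function name `rsca` is hypothetical, the standard genetic code is assumed, and stop codons are treated as one synonymous family purely for convenience.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rsca(orfs):
    """Return {codon: count / count of most abundant synonymous codon}."""
    counts = Counter()
    for orf in orfs:
        seq = orf.upper().replace("U", "T")
        counts.update(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))

    # Group codon counts by the amino acid they encode.
    families = defaultdict(dict)
    for codon, aa in CODON_TABLE.items():
        families[aa][codon] = counts[codon]

    values = {}
    for family in families.values():
        top = max(family.values())
        for codon, n in family.items():
            # Families absent from the input get 0.0 rather than dividing by zero.
            values[codon] = n / top if top else 0.0
    return values
```

For instance, with a single ORF `"AAGAAGAAGAAA"`, AAG receives an RSCA of 1 and AAA an RSCA of 1/3.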

Calculation of codon bias index (CBI):

The ORF sequences from N. crassa and S. pombe were extracted based on the genomic reference sequences. CBI for each gene was calculated with CodonW software (http://codonw.sourceforge.net/). Since CodonW does not support mouse, CBI for each mouse gene was calculated as described (Bennetzen and Hall, 1982): CBI = (Nopt − Nran)/(Ntot − Nran). Nopt, Nran, and Ntot represent the number of optimal, random and total codons in a given gene, respectively. We arbitrarily defined optimal and random codons as those with RSCA ≥0.9 and 0.3 < RSCA < 0.9, respectively, whereas codons with RSCA <0.3 were defined as rare codons.
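As a worked illustration, the CBI formula can be coded directly from the definitions as worded above. The function name `codon_bias_index` is hypothetical, the RSCA table is assumed to be supplied by the caller, and codons missing from that table are simply ignored.

```python
def codon_bias_index(orf, rsca, opt_cutoff=0.9, rare_cutoff=0.3):
    """CBI = (Nopt - Nran) / (Ntot - Nran), using the RSCA cutoffs from the text.

    rsca: dict mapping codon -> RSCA value (hypothetical input format).
    """
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    scored = [rsca[c] for c in codons if c in rsca]
    n_tot = len(scored)
    n_opt = sum(v >= opt_cutoff for v in scored)                # optimal codons
    n_ran = sum(rare_cutoff < v < opt_cutoff for v in scored)   # "random" codons
    if n_tot == n_ran:
        raise ValueError("CBI is undefined when Ntot equals Nran")
    return (n_opt - n_ran) / (n_tot - n_ran)
```

With `rsca = {"AAG": 1.0, "AAA": 0.2, "GCC": 1.0, "GCT": 0.5}` and the ORF `"AAGAAAGCCGCT"`, Nopt = 2, Nran = 1 and Ntot = 4, giving CBI = 1/3.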

Calculation of codon adaptation index (CAI):

We calculated CAI values for all predicted protein-encoding genes in N. crassa, S. pombe, and mouse. We first calculated the frequency of each codon in all annotated ORF sequences. For each amino acid, the relative synonymous codon adaptiveness (RSCA) of each synonymous codon was calculated relative to the most frequent synonymous codon. For each gene, CAI is defined as the geometric mean of the RSCA values of all codons in the ORF.
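The geometric mean in the CAI definition can be sketched as below. The function name `cai` is hypothetical, the RSCA table is assumed to be given, and codons with a missing or zero RSCA value are skipped, which is a choice made for this sketch rather than something stated in the text.

```python
from math import exp, log


def cai(orf, rsca):
    """Geometric mean of RSCA values over the codons of one ORF.

    rsca: dict mapping codon -> RSCA value (hypothetical input format).
    """
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    # Averaging logs avoids numerical underflow for long ORFs.
    logs = [log(rsca[c]) for c in codons if rsca.get(c, 0.0) > 0.0]
    if not logs:
        raise ValueError("no codons with a positive RSCA value")
    return exp(sum(logs) / len(logs))
```

For example, an ORF made of one codon with RSCA 1.0 and one with RSCA 0.25 has a CAI of 0.5.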

Calculation of normalized ORF/3’UTR termination ratio:

For a given gene, the normalized ORF/3’UTR termination ratio reflects the relative frequency of premature transcription termination events in the ORF. A higher value means that transcription of the gene is more likely to terminate within the ORF. The ratio is calculated as follows:

$$\text{Normalized ratio} = \frac{N_{\text{ORF-pA}} \times 1000}{N_{3'\text{UTR-pA}} \times L}$$

$N_{\text{ORF-pA}}$ and $N_{3'\text{UTR-pA}}$ are the numbers of reads (or pA events) in the ORF and the 3’ UTR, respectively. $L$ is the length of the ORF.
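A worked example of the ratio, written as a small Python function for concreteness. The function name and the example numbers are hypothetical and not taken from the data.

```python
def normalized_termination_ratio(n_orf_pa, n_utr_pa, orf_length):
    """Normalized ORF/3'UTR termination ratio as defined in the formula above.

    n_orf_pa:   reads (or pA events) within the ORF
    n_utr_pa:   reads (or pA events) within the 3' UTR
    orf_length: length of the ORF
    """
    if n_utr_pa <= 0 or orf_length <= 0:
        raise ValueError("3' UTR read count and ORF length must be positive")
    return n_orf_pa * 1000 / (n_utr_pa * orf_length)
```

For a hypothetical gene with 10 ORF reads, 200 3’ UTR reads and an ORF length of 2000, the ratio is 10 × 1000 / (200 × 2000) = 0.025.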

Calculation of PAS score:

The PAS score was calculated as previously described (Tian et al., 2007). Specifically, we extracted all sequences surrounding 3’ UTR poly(A) sites (e.g., −30 to +10 nt for Neurospora) to generate a position-specific scoring matrix (PSSM). Each entry in the PSSM was calculated as $M_{ij} = \log_2 (f_{ij}/g_i)$, where $f_{ij}$ is the frequency of nucleotide $i$ at position $j$, and $g_i$ is the genomic frequency of nucleotide $i$. For each PAS sequence, the PAS score is the sum of the PSSM entries selected by the nucleotide found at each position: $S = \sum_j M_{ij}$, where $M_{ij}$ is the score of nucleotide $i$ at position $j$ in the PSSM.
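The PSSM construction and scoring can be sketched as follows. The function names are hypothetical, and the pseudocount is an addition made for this sketch (it is not mentioned in the text) so that nucleotides never observed at a position do not produce log2(0).

```python
from collections import Counter
from math import log2


def build_pssm(seqs, background, pseudocount=1.0):
    """Build a PSSM with entries log2(f_ij / g_i) from equal-length sequences.

    seqs:       aligned sequences around 3' UTR poly(A) sites (DNA letters)
    background: dict of genomic nucleotide frequencies g_i
    pseudocount: assumption of this sketch, added to every count
    """
    length = len(seqs[0])
    total = len(seqs) + 4 * pseudocount
    pssm = []
    for j in range(length):
        column = Counter(s[j] for s in seqs)
        pssm.append({b: log2(((column[b] + pseudocount) / total) / background[b])
                     for b in "ACGT"})
    return pssm


def pas_score(seq, pssm):
    """Sum the PSSM entries selected by the nucleotide at each position."""
    return sum(pssm[j][base] for j, base in enumerate(seq))
```

A candidate PAS region that resembles the 3’ UTR sequences used to build the matrix receives a higher score than one that does not.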

Identification of sequence and codon context surrounding 6nt PAS motif

We selected the top 40 6-nt PAS motifs in 3’ UTRs in Neurospora, which account for 58% of the 3’ UTR pA events of all expressed genes. For mouse, we selected only the top two PAS motifs (AAUAAA and AUUAAA), which account for 77% of 3’ UTR pA events. We first determined the genome-wide locations of these motifs in a strand-specific manner. We then classified the motifs into two major groups (true PAS and false PAS) based on whether the region 5–35 nt downstream of the motif has pA signals on the same strand. Each of these two groups was further divided into two subgroups, based on whether the motifs are located fully inside the coding region (ORF true/false PAS) or in the 3’ UTR (UTR true/false PAS). For each group, 80-nt DNA sequences flanking the motifs were collected and the A/T contents were calculated. Based on the A/T content difference between the ORF true and false PAS groups, we extracted codon sequences of equivalent length flanking the PAS motif. For Neurospora, we extracted 15 codons both upstream and downstream of the PAS motif. For each group, we first counted the occurrences of each codon and then divided this count by the number of all codons in the selected regions to obtain the ratio of each codon. For each codon, the relative codon frequency is the base-two logarithm (log2) of the ratio in the ORF true PAS group divided by that in the ORF false PAS group.
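The final step, the log2 relative codon frequency, can be sketched as below. The function name and the input format (flat lists of codons collected from the flanking regions of each group) are hypothetical, and codons absent from the false PAS group are skipped to avoid division by zero.

```python
from collections import Counter
from math import log2


def relative_codon_frequency(true_codons, false_codons):
    """log2 of (codon ratio in ORF true PAS group / ratio in ORF false PAS group).

    true_codons / false_codons: lists of codons flanking the PAS motifs
    in each group.
    """
    true_counts, false_counts = Counter(true_codons), Counter(false_codons)
    true_total = sum(true_counts.values())
    false_total = sum(false_counts.values())
    return {codon: log2((true_counts[codon] / true_total)
                        / (false_counts[codon] / false_total))
            for codon in true_counts if false_counts[codon] > 0}
```

Positive values mark codons enriched around motifs that are used as poly(A) signals; negative values mark codons enriched around motifs that are not used.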

Data availability

The following data sets were generated
    1. Zhou Z
    (2017) PolyA-seq
    Publicly available at the NCBI BioProject database (Accession no: PRJNA419320).

References

    Akashi H (1994) Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927–935.
    Bennetzen JL, Hall BD (1982) Codon selection in yeast. The Journal of Biological Chemistry 257:3026–3031.
    Roth A, Anisimova M, Cannarozzi GM (2012) Measuring codon usage bias. In: Codon Evolution: Mechanisms and Models, 189–217. New York: Oxford University Press Inc.

Decision letter

  1. Torben Heick Jensen
    Reviewing Editor; Aarhus University, Denmark

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Codon usage biases suppress premature transcription termination to promote gene expression" for consideration by eLife. Your article has been favorably evaluated by Patricia Wittkopp (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors. The following individual involved in review of your submission has agreed to reveal her identity: Judith B Zaugg (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

The presented data are of good quality and the observations are interesting. Yet, concerns are raised, of both analytical and interpretative character that should be addressed before publication in eLife can be considered. The major points below are essential to consider and we strongly suggest that the authors also improve the manuscript based on the additional considerations listed in the individual reviews.

Major points:

1) The title of the article is "Codon usage biases suppress premature transcription termination to promote gene expression". As formulated, it implies that there is actually an active mechanism that "senses" rare codons and feeds back this information to the transcription termination machinery. This would be very intriguing, but it appears that the most likely explanation is that codon bias on the one hand and transcription termination signal on the other hand co-evolved to adapt to a G/C rich genome. In other words, there is no direct cause and effect relationship between codon bias and transcription termination. Despite this likely explanation, the authors, in many instances, strongly indicate that there is a direct mechanism that couples the two phenomena. For example: "These results suggest that codon usage plays an important role in mediating premature transcription termination[…]"; "[…]suggesting that codon usage bias also plays a role in premature transcription termination in mouse"; "[…]codon usage biases are a conserved mechanism that affects premature transcription termination[…]"; "[…]codon usage bias is a conserved mechanism[…]".

These formulations are way too strong. Codon bias and transcription termination signals are correlated but it is not a "mechanism". Thus, these sentences, as well as the title, are misleading (obviously provocative, but also simply wrong) and the title should be changed and the text re-written accordingly.

2) A huge body of literature exists on premature transcription termination, which is ignored in the present manuscript. The findings should be discussed in the context of the relevant literature on (1) the link between codon bias and termination in S. cerevisiae and (2) premature termination at – typically intronic – premature cleavage and polyadenylation (PCPA) sites in mammals.

3) Since these rare codons can make up a canonical polyA site, more analyses on codon pairs would be welcome, and if it holds true that premature termination is not simply due to the formation of a PAS by two rare codons, it would be great if the authors could speculate on any other mechanism, e.g. by genome-wide analysis of ChIP-seq data. If it is not only the formation of a PAS (based on pair-wise analysis), what are the authors proposing as a mechanism? Is there any ChIP-seq data or similar available in Neurospora that shows an enrichment over the rare codons? For sure there is lots of ChIP-seq data available in mouse where the authors could look for factors potentially enriched at rare codons. Or does this coincide with PolII pausing? Moreover, the authors have excluded an effect on chromatin based on looking at one mark in one locus. Is there any genome-wide data available that the authors could use for checking this statement across the naturally occurring rare codons? For sure in mouse there would be enough data for that.

Reviewer #1:

This paper by Zhou et al. reports evidence for a role of codon usage biases affecting premature transcription termination in Neurospora crassa and Mus musculus. This was discovered through codon deoptimisation of the Neurospora frq gene, which leads to the loss of the protein and the full-length mRNA. The authors further show that this phenotype is due to premature transcription termination that generates short polyadenylated transcripts. Through generation of other mutants of the frq gene, Zhou et al. determined this mechanism to depend on multiple sequence elements that they further describe to be low-efficiency non-canonical transcription termination sites, based on genome wide analysis of endogenous 3'UTR and ORF polyadenylation sites. Characterisation of these polyadenylation sites revealed a genome wide correlation between codon usage and premature transcription termination. Finally, a similar genome wide analysis in Schizosaccharomyces pombe and Mus musculus provided information suggesting that such a mechanism is not conserved in A/U-rich organisms such as S. pombe but might be present in G/C-rich ones as well (M. musculus).

Overall, the data shown are convincing and provide a conceptually new, though possibly not very surprising, link between codon bias and polyadenylation-dependent termination. The major caveat is that the findings are not discussed in the context of the relevant literature on (1) the link between codon bias and termination in S. cerevisiae and (2) premature termination at – typically intronic – premature cleavage and polyadenylation (PCPA) sites in mammals.

Additional points:

- (regarding deopt3): "suggesting that these products may be rapidly degraded" and "Because premature transcription termination products are usually rapidly degraded in the cytosol (van Hoof et al., 2002; Frischmeyer et al., 2002)". This point is probably of general relevance and should be discussed more explicitly. That is, the mostly low abundance of ORF PAS usage may well be due to the unstable nature of prematurely terminated transcripts. It is also worth noting that the prematurely terminated transcripts could well be subjected to nuclear decay as implicated in decay of human PCPA terminated RNAs, in addition to the non-stop decay suggested by the authors. This should be referenced together with a more general discussion of yeast and human premature termination systems.

- "The ratios were less than 10% for 95% of the genes with ORF-pA, indicating that only a small portion of transcription terminated in ORFs." The authors should discuss this more carefully. The RNA 3' ends do not mark transcription termination as RNA polymerase is present well beyond the PAS, as also found by the authors themselves. In addition, the low levels of ORF-pA may be partially due to the unstable nature of prematurely terminated transcripts.

- Figure 2E: Is it correctly understood that the authors claim that the antisense gene is also prematurely terminated when the sense frq is codon deoptimized? This warrants further explanation. Where does the premature termination occur, and do the mutations create a PAS on the antisense strand? I wonder whether this is some kind of artefact. If no plausible explanation for the premature termination of the antisense gene can be found, this should also be mentioned.

- The authors find that ORF PAS are generally "weaker". Could this be a technical issue since those sites are generally also less abundant and hence less well defined? On a different note, the experimentally found pA events may be derived from alternative PAS-independent events. This could be discussed.

- In the case of Mus musculus data, it may be interesting to further compare to PAS motifs of intronic PCPA sites.

- Figure 5: The authors show that the AAUAAA element in NCU02034 is not used due to optimal codons in the vicinity. This may allow for a supporting bioinformatic analysis where the codon optimality surrounding all ORF AAUAAA (or perhaps the AAUGAA) motifs is compared for genes, which use those sites, compared to genes where they are not used.

- Figure 1C: despite the confirmation of the absence of protein by the absence of mRNA (Figure 1D and E), some information regarding the anti-FRQ antibody would be needed (these do not seem to be provided either in the previous references). Is this antibody homemade? Which part of the protein does it target?

- Figure 3D: a proper Western blot quantification would require a serial dilution to estimate the protein concentration range in which the signal is linear (such a dilution does not need to be shown but should be mentioned in the Materials and methods).

- "other cis-elements such as U-rich upstream auxiliary element and U-rich downstream element also play important roles. In plants and yeast, the downstream element can be replaced by a U-rich element." These sentences are confusing and should be rephrased.

- Potential typo: "-30 ~ -10".

- Discussion concerning the previous findings that codon usage affects chromatin structure and transcription initiation. For the reader it would be interesting to relate those earlier findings in the context of the premature termination. That is, is it possible that changes in chromatin structure and transcription levels are indirectly due to premature termination and vice versa?

Reviewer #2:

In this manuscript, the authors report interesting observations. Neurospora crassa exhibits a strong GC rich sequence bias, which translates in a strong codon bias with synonymous codons with G or C at the wobble position being preferred. When changing codons of the frq gene to non-optimal codons, which are thus overall more A/T rich, they observed a complete disappearance of not only the expressed FRQ protein but also of the full-length mRNA.

They further show that it results from a different mechanism than the one described by the same group (Zhou et al., 2016), where they showed that a poor codon bias in N. crassa results in a poor transcription efficiency of the gene (and not a decreased stability as one would have expected). Here they show that the effect results from a premature termination event with a short transcript accumulating in place of the mRNA.

Performing the genome-wide mapping of mRNA 3'-ends, they identify a number of poly(A) sites (PAS) within ORFs and show that the occurrence of these are strongly correlated with a poor codon bias, genes enriched with rare codons being much more prone to exhibit transcription termination within ORFs. They show that this is not true for the A/T rich S. pombe but also true, yet to a much lower extent, for the mouse, which also exhibits a G/C enriched genome.

They conclude, "[…]codon biases are conserved mechanisms that affects premature transcription termination events in C/G-biased organisms[…]".

The presented data are of quality and the observations undoubtedly interesting.

Yet, I have some strong concerns that, I think, should be addressed before considering publication in eLife.

My most important concern should be relatively easy to address. It might seem to simply be a question of semantics, but I think it goes well beyond that. The title of the article is "Codon usage biases suppress premature transcription termination to promote gene expression". As formulated, it implies that there is actually an active mechanism that "senses" rare codons and feeds back this information to the transcription termination machinery. This would have been most intriguing and I was very curious to see what kind of mechanism the authors could propose for such an effect. But, in the core of the text, you understand that the most likely explanation is that codon bias on the one hand and transcription termination signal on the other hand coevolved to adapt to the G/C rich genome. In other words, there is no direct cause and effect relationship between codon bias and transcription termination, simply that both phenomena, which are obviously linked to the same DNA sequence, co-evolved with it such that you could not change one without influencing the other. This is indeed the most likely explanation for these observations. Yet, in many instances, the way the text is written strongly implies that there is a direct mechanism that couples the two.

For example: "These results suggest that codon usage plays an important role in mediating premature transcription termination[…]"; "[…]suggesting that codon usage bias also plays a role in premature transcription termination in mouse"; "[…]codon usage biases are a conserved mechanism that affects premature transcription termination[…]"; "[…]codon usage bias is a conserved mechanism[…]".

I strongly disagree with these formulations. Both, codon bias and transcription termination signals are correlated but, no, codon bias is not a "mechanism". These sentences, as well as the title, are very misleading (obviously provocative, but simply wrong) and the text should be re-written accordingly.

Reviewer #3:

The authors present evidence that shows that specific codons can trigger premature transcription termination in the organism Neurospora. They further analyse the sequence content of the 3'UTR polyA sites (PAS) and intragenic sites, which they call ORF-PAS, and find slight differences in the PAS signals. They further show a strong anticorrelation of codon adaptation index and codon bias index with the ratio of ORF/3' termination. They do not find a similar mechanism in other fungi (S. pombe), yet in the mouse genome a similar, albeit less significant, correlation between codon usage and intragenic stop is observed. Another important observation was that the most frequent codon-pair was made up of two of the rarest codons and in fact resembles a canonical poly-A site.

Overall, the authors provide good evidence for a role in codon usage in transcription termination. However since these rare codons can make up a canonical polyA site, I would welcome some more analyses on codon pairs, and if it holds true that the mechanism of premature termination is not simply the formation of a PAS by two rare codons, it would be great if the authors speculate on the mechanism, e.g. by genome-wide analysis of ChIP-seq data (see comments below).

- It looks like the deoptimized codons to some extent act as stop codon, but not always. Also, when looking at pairs it seems it is easy to create a PAS, are the deoptimized codons introduced into the frq gene forming pairs that make up PAS?

- Does the premature stop in NCU09435 and NCU00931 (Figure 4A) happen at non-optimal codons? Similarly, I'm missing a global analysis that counts the number of natural premature stop events per codon (rather than a correlation of indexes with ratios).

- Negative correlation of CAI and CBI with ratio of ORF-PAS vs. 3'PAS: what about the codon, or even codon-pair at the exact location of the premature stop? Are these predominantly non-optimal codons? Or is the mechanism rather additive and PolII needs to encounter a couple of non-optimal codons to stop?

- Figure 5E: would be easier to read if it was on the same axis. Also the red line for K is somehow shifted. How does this correlation look for all AA within the same plot? Is it still visible or is it important to split them by AA?

- Is it just the formation of PAS that makes these rarer codons affect premature stop? I would welcome an analysis that looks at rare codons that do form a canonical PAS vs. those that don't and compare their effect on premature transcription stop. If it is mainly the formation of PAS that drives the termination I think the authors should be more explicit about this and say the mechanism of how rare codons affect premature termination is through forming non-canonical PAS.

- If it is not only the formation of a PAS (based on pair-wise analysis), what are the authors proposing as a mechanism? Is there any ChIP-seq data or similar available in Neurospora that shows an enrichment over the rare codons? For sure there is lots of ChIP-seq data available in mouse where the authors could look for factors potentially enriched at rare codons. Or does this coincide with PolII pausing?

- The authors have excluded an effect on chromatin based on looking at one mark in one locus. Is there any genome-wide data available that the authors could use for checking this statement across the naturally occurring rare codons? For sure in mouse there would be enough data for that.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Codon usage biases co-evolve with transcription termination machinery to suppress premature cleavage and polyadenylation" for further consideration at eLife. Your revised article has been favorably evaluated by Patricia Wittkopp (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

You do not appear to have taken the S. cerevisiae literature on transcription termination into consideration when revising the manuscript. This will have to be done before final acceptance of the manuscript. If you do not deem this concern appropriate, then please provide an argument to this effect.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Codon usage biases co-evolve with transcription termination machinery to suppress premature cleavage and polyadenylation" for further consideration at eLife. Your revised article has been favorably evaluated by Patricia Wittkopp (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

There appears to be a misunderstanding concerning the most recent editorial comment: "You do not appear to have taken the S. cerevisiae literature on transcription termination into consideration when revising the manuscript. This will have to be done before final acceptance of the manuscript. If you do not deem this concern appropriate, then please provide an argument to this effect.".

The referees of the original submission requested a referencing of the widely accepted link between codon bias and premature transcription termination by the Nrd1-Nab3-Sen1 dependent pathway in S. cerevisiae. Even though this is poly(A) site independent, conceptually this phenomenon is similar to the one described in the manuscript and hence deserves mentioning. The key reference is doi: 10.1093/nar/gkw683. Hence, either mention this literature or provide an argument why this is not relevant. Previously added references concerning S. cerevisiae polyadenylation do not appear relevant and may be deleted again.

https://doi.org/10.7554/eLife.33569.024

Author response

[…] Major points:

1) The title of the article is "Codon usage biases suppress premature transcription termination to promote gene expression". As formulated, it implies that there is actually an active mechanism that "senses" rare codons and feeds back this information to the transcription termination machinery. This would be very intriguing, but it appears that the most likely explanation is that codon bias on the one hand and transcription termination signal on the other hand co-evolved to adapt to a G/C rich genome. In other words, there is no direct cause and effect relationship between codon bias and transcription termination. Despite this likely explanation, the authors, in many instances, strongly indicate that there is a direct mechanism that couples the two phenomena. For example: "These results suggest that codon usage plays an important role in mediating premature transcription termination…"; "…suggesting that codon usage bias also plays a role in premature transcription termination in mouse"; "…codon usage biases are a conserved mechanism that affects premature transcription termination…"; "…codon usage bias is a conserved mechanism…".

These formulations are way too strong. Codon bias and transcription termination signals are correlated but it is not a "mechanism". Thus, these sentences, as well as the title, are misleading (obviously provocative, but also simply wrong) and the title should be changed and the text re-written accordingly.

We agreed with this reviewer that codon usage, by affecting the formation of poly(A) signals, affects premature cleavage and polyadenylation. As suggested, we modified the title and the text in the revised manuscript.

2) A huge body of literature exists on premature transcription termination, which is ignored in the present manuscript. The findings should be discussed in the context of the relevant literature on (1) the link between codon bias and termination in S. cerevisiae and (2) premature termination at – typically intronic – premature cleavage and polyadenylation (PCPA) sites in mammals.

As suggested, we have now added more description and citations of previous studies on premature transcription termination in the revised introduction and discussion, especially on premature transcription termination in yeast and PCPA in introns of mammalian cells.

3) Since these rare codons can make up a canonical polyA site, more analyses on codon pairs would be welcome, and if it holds true that premature termination is not simply due to the formation of a PAS by two rare codons, it would be great if the authors could speculate on any other mechanism, e.g. by genome-wide analysis of ChIP-seq data. If it is not only the formation of a PAS (based on pair-wise analysis), what are the authors proposing as a mechanism? Is there any ChIP-seq data or similar available in Neurospora that shows an enrichment over the rare codons? For sure there is lots of ChIP-seq data available in mouse where the authors could look for factors potentially enriched at rare codons. Or does this coincide with PolII pausing? Moreover, the authors have excluded an effect on chromatin based on looking at one mark in one locus. Is there any genome-wide data available that the authors could use for checking this statement across the naturally occurring rare codons? For sure in mouse there would be enough data for that.

From our results and previous studies, it is clear that premature termination is not simply due to the formation of a single PAS motif. Other cis-elements surrounding the PAS motif are also very important for PCPA. Using the frq gene as an example in Neurospora, we showed that a cluster of rare synonymous codons can potentially form the A/U rich poly(A) signal, including PAS motif and other surrounding cis-elements, which could be recognized by transcription termination machinery. On the other hand, PCPA didn’t occur in the wild-type NCU02034 gene even though a canonical PAS motif was present in the ORF (Figure 5—figure supplement 1B). We showed that by introducing rare codons surrounding the PAS signal, transcription was fully terminated in the coding region (Figure 5D).

To further address this issue, we performed genome-wide analyses in Neurospora and mouse by identifying “true PAS” and “false PAS” motifs based on the presence of poly(A) sites downstream of the PAS motifs and determined the nucleotide composition surrounding the two groups of PAS motifs (Figure 5—figure supplement 2 and Figure 7—figure supplement 2). Our results clearly showed that, within ORFs, a single PAS motif is not sufficient to trigger PCPA. In addition, the AU content surrounding true PAS motifs is higher than that surrounding false PAS motifs, indicating that surrounding cis-elements are also required for triggering PCPA. These results are consistent with our finding that the regions surrounding true PAS motifs are enriched for rare codons. Therefore, we propose that rare codons can potentially promote PCPA through the formation of a PAS motif and its surrounding cis-elements.
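The comparison of AU content around the two groups of motifs can be illustrated with a minimal sketch. It is not the authors' pipeline: the flank width, the function names, and the toy sequences are all assumptions made for illustration.

```python
def au_content(seq):
    """Fraction of A or U nucleotides in an RNA string (0.0 if empty)."""
    if not seq:
        return 0.0
    return sum(base in "AU" for base in seq) / len(seq)


def flank_au(seq, motif_start, motif_len=6, flank=30):
    """AU content of the flanks around a motif, excluding the motif itself.

    The default flank width of 30 nt is an assumption, not a value from the paper.
    """
    upstream = seq[max(0, motif_start - flank):motif_start]
    downstream = seq[motif_start + motif_len:motif_start + motif_len + flank]
    return au_content(upstream + downstream)


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Made-up sequences: the same AAUAAA motif in an AU-rich and a GC-rich context.
au_rich = "UUAUUAUUGA" + "AAUAAA" + "UAUUUGAUUA"
gc_rich = "GCCGCCGACG" + "AAUAAA" + "GCCGAGGCCG"

# In a real analysis each group would hold one value per motif occurrence.
true_group = [flank_au(au_rich, 10, flank=10)]
false_group = [flank_au(gc_rich, 10, flank=10)]
difference = mean(true_group) - mean(false_group)
```

A positive `difference` would correspond to the pattern described in the response, in which motifs followed by poly(A) sites sit in more AU-rich surroundings.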

As for the potential effect of chromatin on PCPA, we did the following analysis. We showed that the H3K9me3 levels at frq promoter were comparable in the wt-frq and frq-deopt2 strains (Figure 2—figure supplement 1B), indicating that the loss of full-length frq mRNA in the frq-deopt2 strains is not due to H3K9me3-mediated transcriptional silencing. We have also done polII, H3K4me3, and H3K36me3 ChIP-seq previously in the lab and we found that polII, H3K4me3, and H3K36me3 enrichment positively correlates with codon usage and mRNA levels in Neurospora. We think this is mainly due to the effect of codon usage on transcription, which we are currently investigating to understand the mechanism.

As for the suggested ChIP-seq analysis to look for factors potentially enriched at rare codons, we attempted it but felt that the results could not be easily interpreted because ChIP-seq (with a typical resolution of ~200 bp) does not provide codon-level resolution.

Reviewer #1:

[…] Overall, the data shown are convincing and provide a conceptually new, though possibly not very surprising, link between codon bias and polyadenylation-dependent termination. The major caveat is that the findings are not discussed in the context of the relevant literature on (1) the link between codon bias and termination in S. cerevisiae and (2) premature termination at – typically intronic – premature cleavage and polyadenylation (PCPA) sites in mammals.

We appreciated the positive comments by the reviewers. As suggested, we have now added more description and citations of previous studies on premature transcription termination in the revised introduction and discussion, especially on premature transcription termination in yeast and PCPA in introns of mammalian cells.

Additional points:

- (regarding deopt3): "suggesting that these products may be rapidly degraded" and "Because premature transcription termination products are usually rapidly degraded in the cytosol (van Hoof et al., 2002; Frischmeyer et al., 2002)". This point is probably of general relevance and should be discussed more explicitly. That is, the mostly low abundance of ORF PAS usage may well be due to the unstable nature of prematurely terminated transcripts. It is also worth noting that the prematurely terminated transcripts could well be subjected to nuclear decay as implicated in decay of human PCPA terminated RNAs, in addition to the non-stop decay suggested by the authors. This should be referenced together with a more general discussion of yeast and human premature termination systems.

We agreed that prematurely terminated transcripts may be degraded in the nucleus in addition to the non-stop decay in the cytosol. Studies on premature transcription termination in yeast and human are now cited and described in the revised paper.

- "The ratios were less than 10% for 95% of the genes with ORF-pA, indicating that only a small portion of transcription terminated in ORFs." The authors should discuss this more carefully. The RNA 3' ends do not mark transcription termination as RNA polymerase is present well beyond the PAS, as also found by the authors themselves. In addition, the low levels of ORF-pA may be partially due to the unstable nature of prematurely terminated transcripts.

We agreed that prematurely terminated transcripts may be inherently unstable.

This sentence is revised as: “The ratios were less than 10% for 95% of the genes with ORF-pA, which may be due to that these non-canonical poly(A) signals within ORFs are less efficient in promoting premature cleavage and polyadenylation (Berg et al., 2012; Guo et al., 2011) or that the prematurely terminated RNAs are unstable (van Hoof et al., 2002; Frischmeyer et al., 2002; Doma and Parker, 2007; Vanacova and Stefl, 2007).”
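As a rough illustration of the statistic quoted in the revised sentence, the sketch below computes a per-gene ratio of ORF poly(A) reads to 3' UTR poly(A) reads and the share of genes falling under a threshold. The ratio definition, the function names, and the read counts are assumptions for illustration only, not the authors' data or method.

```python
def orf_to_utr_ratio(orf_reads, utr_reads):
    """Ratio of poly(A) reads in the ORF to those in the 3' UTR.

    Returns None when there are no 3' UTR reads, so the gene can be skipped.
    """
    if utr_reads == 0:
        return None
    return orf_reads / utr_reads


def fraction_below(ratios, threshold=0.10):
    """Share of genes whose ratio is under the threshold (None values ignored)."""
    usable = [r for r in ratios if r is not None]
    return sum(r < threshold for r in usable) / len(usable)


# Made-up read counts per gene: (ORF poly(A) reads, 3' UTR poly(A) reads).
toy_counts = {
    "geneA": (2, 100),
    "geneB": (5, 200),
    "geneC": (30, 60),
    "geneD": (1, 50),
}

ratios = {gene: orf_to_utr_ratio(orf, utr) for gene, (orf, utr) in toy_counts.items()}
share = fraction_below(ratios.values())
```

As the response notes, a low ratio could reflect either weak signals within ORFs or unstable products, so a calculation like this describes abundance and not termination frequency.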

- Figure 2E: Is this understood correct that the authors claim that the antisense gene is also prematurely terminated when the sense frq is codon deoptimized? This warrants further explanation. Where does the premature termination occur and do the mutation create PAS on the antisense? I wonder whether this is some kind of artefact. If no plausible explanation for the premature termination of the antisense gene can be found also this should be mentioned.

The antisense transcript qrf is also prematurely terminated when frq is de-optimized, as shown in Figure 2E and Figure 2—figure supplement 1F. As suggested, the poly(A) sites of qrf in the frq-deopt1 strain were determined by 3’ RACE, as shown in Figure 2—figure supplement 1G. It should be noted that a PAS motif already exists in the wt-frq gene but does not lead to PCPA until the surrounding codons are deoptimized. This is consistent with our finding that other surrounding cis-elements are also required for efficient premature termination.

- The authors find that ORF PAS are generally "weaker". Could this be a technical issue since those sites are generally also less abundant and hence less well defined? On a different note, the experimentally found pA events may be derived from alternative PAS-independent events. This could be discussed.

The reviewer is right that prematurely terminated transcripts are less abundant, which makes their poly(A) sites more difficult to determine. For this reason, we determined the poly(A) sites using nuclear RNA.

We agreed that the experimentally found pA events might be derived from alternative PAS-independent events. As suggested, this is now discussed in the revised manuscript when we described the identification of poly(A) sites.

- In the case of Mus musculus data, it may be interesting to further compare to PAS motifs of intronic PCPA sites.

As suggested, the nucleotide profile and PAS motif of intronic PCPA sites are now described and are shown in Figure 7—figure supplement 1.

- Figure 5: The authors show that the AAUAAA element in NCU02034 is not used due to optimal codons in the vicinity. This may allow for a supporting bioinformatic analysis where the codon optimality surrounding all ORF AAUAAA (or perhaps the AAUGAA) motifs is compared for genes, which use those sites, compared to genes where they are not used.

Thank you for your suggestion. As suggested, we did the following analyses.

In the ORFs and 3’ UTRs of all Neurospora genes, we searched for all sequences that match a putative PAS motif (the 40 most frequently used PAS motifs in Neurospora and the top two in mouse) and divided them into two groups: “true PAS”, which have at least one pA site within the region 5-35 nt downstream, and “false PAS”, which have no pA site in that region. We compared the nucleotide composition of these two groups and found that the AU content around the true PASs was higher than that around the false PASs, including two U-rich elements and two A-rich motifs flanking the PAS in Neurospora (Figure 5—figure supplement 2). Similar results were observed in mouse (Figure 7—figure supplement 2).

In addition, we found that more rare codons are used in the regions surrounding true PASs than in those surrounding false PASs. Therefore, we conclude that the cis-elements surrounding PAS motifs are important for PCPA in coding regions.
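The grouping described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and toy data are hypothetical, and measuring the 5-35 nt window from the last nucleotide of the motif is an assumption, since the response does not state the reference point.

```python
from bisect import bisect_left

# Window taken from the response text: a pA site 5-35 nt downstream of the motif.
WINDOW = (5, 35)


def find_motifs(seq, motifs):
    """Return sorted (start, motif) pairs for every occurrence, overlaps included."""
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = seq.find(motif, start + 1)
    return sorted(hits)


def classify_pas(seq, motifs, pa_sites, window=WINDOW):
    """Label each motif hit 'true' if a pA site falls in the downstream window.

    Distances are counted from the last nucleotide of the motif (an assumption).
    """
    sites = sorted(pa_sites)
    labelled = []
    for start, motif in find_motifs(seq, motifs):
        end = start + len(motif) - 1           # 0-based index of the last motif nt
        lo, hi = end + window[0], end + window[1]
        i = bisect_left(sites, lo)             # first pA site at or beyond lo
        is_true = i < len(sites) and sites[i] <= hi
        labelled.append((start, motif, "true" if is_true else "false"))
    return labelled


# Toy transcript with two AAUAAA motifs and one mapped pA site (made-up data).
toy = "GCC" * 4 + "AAUAAA" + "GCU" * 10 + "AAUAAA" + "GCU" * 10
result = classify_pas(toy, ["AAUAAA"], pa_sites=[35])
```

Here the first motif is labelled "true" because the pA site at position 35 lies 18 nt past its last nucleotide, while the second motif has no pA site downstream and is labelled "false".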

- Figure 1C: despite the confirmation of the absence of protein by the absence of mRNA (Figure 1D and E), some information regarding the anti-FRQ antibody would be needed (these do not seem to be provided either in the previous references). Is this antibody homemade? Which part of the protein does it target?

The FRQ antibody was generated by using the full-length FRQ protein as antigen and the paper describing the method is now cited in the revised manuscript.

- Figure 3D: a proper Western blot quantification would require a serial dilution to estimate the protein concentration range in which the signal is linear (such a dilution do not require to be shown but should be mentioned in the Materials and methods).

The description of the Western blot quantification method was rewritten in the Materials and methods.

- "other cis-elements such as U-rich upstream auxiliary element and U-rich downstream element also play important roles. In plants and yeast, the downstream element can be replaced by a U-rich element." These sentences are confusing and should be rephrased.

As suggested, we revised these sentences.

- Potential typo: "-30 ~ -10".

Thank you, the typo has been corrected.

- Discussion concerning the previous findings that codon usage affects chromatin structure and transcription initiation. For the reader it would be interesting to relate those earlier findings in the context of the premature termination. That is, is it possible that changes in chromatin structure and transcription levels are indirectly due to premature termination and vice versa?

We have done H3K4me3 and H3K36me3 ChIP-seq and we found that H3K4me3 and H3K36me3 are positively correlated both with CBI and mRNA levels in Neurospora. We think this is mainly due to the effect of codon usage on transcription, which we are currently working on to understand the underlying mechanisms.

It is possible that PCPA in coding regions is affected by transcription levels, but we currently have no clear evidence to support this.

Reviewer #2:

[…] My most important concern should be relatively easy to address. It might seem to simply be a question of semantics, but I think it goes much more beyond than that. The title of the article is "Codon usage biases suppress premature transcription termination to promote gene expression". As formulated, it implies that there is actually an active mechanism that "senses" rare codons and feeds back this information to the transcription termination machinery. This would have been most intriguing and I was very curious to see what kind of mechanism the authors could propose for such a mechanism. But, in the core of the text, you understand that the most likely explanation is that codon bias on the one hand and transcription termination signal on the other hand coevolved to adapt to the G/C rich genome. In other words, there are no direct cause and effect relationship between codon bias and transcription termination, simply that both phenomenon, which are obviously linked to the same DNA sequence, co-evolved with it such as you could not change one without influencing the other. This is indeed the most likely explanation for these observations. Yet, in many instances, the way the text is written strongly implies that there is a direct mechanism that couples the two.

For examples: "These results suggest that codon usage plays an important role in mediating premature transcription termination[…]"; "[…]suggesting that codon usage bias also plays a role in premature transcription termination in mouse"; "[…]codon usage biases are a conserved mechanism that affects premature transcription termination[…]"; "[…]codon usage bias is a conserved mechanism[…]".

I strongly disagree with these formulations. Both, codon bias and transcription termination signals are correlated but, no, codon bias is not a "mechanism". These sentences, as well as the title, are very misleading (obviously provocative, but simply wrong) and the text should be re-written accordingly.

Thanks for your suggestion. We agreed that codon usage, by affecting the formation of poly(A) signals, affects premature cleavage and polyadenylation. As suggested, we revised the title, the sentences listed here, and other sentences in the paper.

Reviewer #3:

[…] Overall, the authors provide good evidence for a role in codon usage in transcription termination. However since these rare codons can make up a canonical polyA site, I would welcome some more analyses on codon pairs, and if it holds true that the mechanism of premature termination is not simply the formation of a PAS by two rare codons, it would be great if the authors speculate on the mechanism, e.g. by genome-wide analysis of ChIP-seq data (see comments below).

As suggested, we performed more analyses on codon pairs. We found that premature transcription termination is not simply due to the formation of a PAS motif; other cis-elements are also very important for PCPA (Figure 5—figure supplement 2 and Figure 7—figure supplement 2).

- It looks like the deoptimized codons to some extent act as stop codon, but not always. Also, when looking at pairs it seems it is easy to create a PAS, are the deoptimized codons introduced into the frq gene forming pairs that make up PAS?

Codon de-optimization of frq resulted in a potential PAS (AAUAAU in frq-deopt1 and AAUAAA in frq-deopt2) which is located 18-nt upstream of poly(A) sites mapped by 3’ RACE (Figure 2C).

- Does the premature stop in NCU09435 and NCU00931 (Figure 4A) happen at non-optimal codons? Similarly, I'm missing a global analysis that counts the number of natural premature stop events per codon (rather than a correlation of indexes with ratios).

Transcription does not terminate at a single site but rather in a small region downstream of the PAS motif and forms a cluster of poly(A) sites. Therefore, it is difficult to calculate premature termination events at codon-level resolution.

As shown in Figure 4—figure supplement 1, premature termination sites map to a small region of the NCU09435 and NCU00931 ORFs, and both regions appear to be enriched in non-optimal codons.

- Negative correlation of CAI and CBI with ratio of ORF-PAS vs. 3'PAS: what about the codon, or even codon-pair at the exact location of the premature stop? Are these predominantly non-optimal codons? Or is the mechanism rather additive and PolII needs to encounter a couple of non-optimal codons to stop?

As mentioned above, transcription is not terminated at a single site but rather in a small region downstream of the PAS motif, forming a cluster of poly(A) sites. It is therefore difficult to determine at which codon or codon pair termination occurs. It is important to note that it is not a codon or codon pair itself that terminates transcription. Termination relies on the presence of poly(A) signals, which include the PAS motif and its surrounding cis-elements. The use of clusters of rare codons can lead to the formation of potential poly(A) signals that terminate transcription.

- Figure 5E: would be easier to read if it was on the same axis. Also the red line for K is somehow shifted. How does this correlation look for all AA within the same plot? Is it still visible or is it important to split them by AA?

As suggested, this figure was revised in the current manuscript. It now shows scatter plots of the correlation between normalized codon usage frequency (NCUF) and relative synonymous codon adaptiveness for all codons of amino acids with at least two synonymous codons. A negative correlation was observed for both Neurospora (Figure 5E, r = -0.55) and mouse (Figure 7—figure supplement 2, r = -0.36).
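A correlation of this kind can be sketched as below. The sketch assumes the standard definition of relative adaptiveness (a codon's count divided by the count of the most used synonymous codon for the same amino acid) and uses a plain Pearson coefficient. The codon counts and NCUF values are invented, and how the authors computed NCUF is not reproduced here.

```python
from math import sqrt


def relative_adaptiveness(counts_by_aa):
    """w = codon count / count of the most used synonymous codon for that amino acid."""
    w = {}
    for codons in counts_by_aa.values():
        top = max(codons.values())
        for codon, count in codons.items():
            w[codon] = count / top
    return w


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical genome-wide codon counts for two amino acids.
counts = {"Lys": {"AAG": 80, "AAA": 20}, "Asn": {"AAC": 60, "AAU": 15}}
w = relative_adaptiveness(counts)

# Hypothetical NCUF values; in the paper these come from sequences near poly(A) sites.
ncuf = {"AAG": 0.5, "AAA": 2.0, "AAC": 0.6, "AAU": 1.8}

codons = sorted(w)
r = pearson_r([ncuf[c] for c in codons], [w[c] for c in codons])
```

With these toy numbers `r` is strongly negative, which mimics the direction of the reported correlation but says nothing about its real magnitude.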

- Is it just the formation of PAS that makes these rarer codons affect premature stop? I would welcome an analysis that looks at rare codons that do form a canonical PAS vs. those that don't and compare their effect on premature transcription stop. If it is mainly the formation of PAS that drives the termination I think the authors should be more explicit about this and say the mechanism of how rare codons affect premature termination is through forming non-canonical PAS.

We found that premature transcription termination is not simply due to the formation of a PAS motif; other surrounding cis-elements are also very important for PCPA. Clusters of rare codons appear to form the A/U-rich poly(A) signal, including the PAS motif and other surrounding cis-elements, that can be recognized by the transcription termination machinery.

We showed that PCPA didn’t occur in the wild-type NCU02034, even though a PAS motif was found in the wild-type gene (Figure 5—figure supplement 1B). After the codons surrounding the PAS were deoptimized, transcription was fully terminated in the coding region.

In addition, we performed genome-wide analyses in Neurospora and mouse by identifying “true PAS” and “false PAS” motifs based on the presence of poly(A) sites downstream of the PAS motifs and determined the nucleotide composition surrounding the two groups of PAS motifs (Figure 5—figure supplement 2 and Figure 7—figure supplement 2). Our results clearly showed that, within ORFs, a single PAS motif is not sufficient to trigger PCPA. In addition, the AU content surrounding true PAS motifs is higher than that surrounding false PAS motifs, indicating that surrounding cis-elements are also required for triggering PCPA. These results are consistent with our finding that the regions surrounding true PAS motifs are enriched for rare codons. Therefore, we propose that rare codons can potentially promote PCPA through the formation of a PAS motif and its surrounding cis-elements.

- If it is not only the formation of a PAS (based on pair-wise analysis), what are the authors proposing as a mechanism? Is there any ChIP-seq data or similar available in Neurospora that shows an enrichment over the rare codons? For sure there is lots of ChIP-seq data available in mouse where the authors could look for factors potentially enriched at rare codons. Or does this coincide with PolII pausing?

- The authors have excluded an effect on chromatin based on looking at one mark in one locus. Is there any genome-wide data available that the authors could use for checking this statement across the naturally occurring rare codons? For sure in mouse there would be enough data for that.

We have shown that H3K9me3 levels at frq promoter were comparable in the wt-frq and frq-deopt2 strains. This result indicates that the loss of full-length frq mRNA in the frq-deopt2 strains is not due to H3K9me3-mediated transcriptional silencing.

We have done polII, H3K4me3, and H3K36me3 ChIP-seq before and we found that polII, H3K4me3, and H3K36me3 are positively correlated with CBI and mRNA levels in Neurospora. We think this is mainly due to the effect of codon usage on transcription, which we are currently investigating to understand the underlying mechanism.

As for the ChIP-seq analysis to look for factors potentially enriched at rare codons, we attempted it but felt that the results could not be interpreted because ChIP-seq (with a typical resolution of ~200 bp) does not provide codon-level resolution.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

You do not appear to have taken the S. cerevisiae literature on transcription termination into consideration when revising the manuscript. This will have to be done before final acceptance of the manuscript. If you do not deem this concern appropriate, then please provide an argument to this effect.

To address the concern about the lack of S. cerevisiae literature, we have now added more than 10 yeast references to the revised paper:

Mischo and Proudfoot, 2013; Ozsolak et al., 2010; Moqtaderi et al., 2013; Mata, 2013; Schlackow et al., 2013; Vavasseur and Shi, 2014; Liu et al., 2017; Guo and Sherman, 1996; Graber et al., 1999; Graber, McAllister and Smith, 1999; Lemay et al., 2016; West and Proudfoot, 2009; Shalem et al., 2015; Lemay and Bachard, 2015; Larochelle, Hunyadkurti and Bachard, 2017.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

There appears to be a misunderstanding concerning the most recent editorial comment: "You do not appear to have taken the S. cerevisiae literature on transcription termination into consideration when revising the manuscript. This will have to be done before final acceptance of the manuscript. If you do not deem this concern appropriate, then please provide an argument to this effect.".

The referees of the original submission requested a referencing of the widely accepted link between codon bias and premature transcription termination by the Nrd1-Nab3-Sen1 dependent pathway in S. cerevisiae. Even though this is poly(A) site independent, conceptually this phenomenon is similar to the one described in the manuscript and hence deserves mentioning. The key reference is doi: 10.1093/nar/gkw683. Hence, either mention this literature or provide an argument why this is not relevant. Previously added references concerning S. cerevisiae polyadenylation do not appear relevant and may be deleted again.

To address this concern, we have now added the suggested reference (Cakiroglu, Zaugg and Luscombe, 2016) and a description of the study to the revised paper.

https://doi.org/10.7554/eLife.33569.025

Article and author information

Author details

  1. Zhipeng Zhou

    Department of Physiology, The University of Texas Southwestern Medical Center, Dallas, United States
    Contribution
    Conceptualization, Resources, Formal analysis, Validation, Investigation, Methodology, Writing—original draft, Writing—review and editing, Designed the study, performed almost all the Neurospora-related experiments, performed statistical analysis, interpreted the results
    Contributed equally with
    Yunkun Dang
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-8449-7194
  2. Yunkun Dang

    1. State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
    2. Center for Life Science, School of Life Sciences, Yunnan University, Kunming, China
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Investigation, Methodology, Writing—original draft, Writing—review and editing, performed bioinformatic and statistical analyses, interpreted the results
    Contributed equally with
    Zhipeng Zhou
    For correspondence
    ykdang@ynu.edu.cn
    Competing interests
    No competing interests declared
  3. Mian Zhou

    State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
    Contribution
    Formal analysis, Investigation, Methodology, generated the frq-deopt1 and frq-deopt2 strains, interpreted the results
    Competing interests
    No competing interests declared
  4. Haiyan Yuan

    Department of Physiology, The University of Texas Southwestern Medical Center, Dallas, United States
    Contribution
    Resources, Investigation, contributed reagents, interpreted the results
    Competing interests
    No competing interests declared
  5. Yi Liu

    Department of Physiology, The University of Texas Southwestern Medical Center, Dallas, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Funding acquisition, Writing—original draft, Writing—review and editing, Designed the study, interpreted the results
    For correspondence
    yi.liu@utsouthwestern.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-8801-9317

Funding

National Institute of General Medical Sciences (R35GM118118)

  • Yi Liu

Cancer Prevention and Research Institute of Texas (RP160268)

  • Yi Liu

Welch Foundation (I-1560)

  • Yi Liu

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Dr. Noah Spies for providing the 2P-seq protocol and the members of our laboratory for technical assistance and discussion. This work is supported by grants from the National Institutes of Health (R35GM118118), Cancer Prevention and Research Institute of Texas (RP160268), and the Welch Foundation (I-1560) to Yi Liu.

Reviewing Editor

  1. Torben Heick Jensen, Aarhus University, Denmark

Publication history

  1. Received: November 15, 2017
  2. Accepted: March 15, 2018
  3. Accepted Manuscript published: March 16, 2018 (version 1)
  4. Version of Record published: March 26, 2018 (version 2)

Copyright

© 2018, Zhou et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 3,143
    Page views
  • 489
    Downloads
  • 21
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
