Avoidance of stochastic RNA interactions can be harnessed to control protein expression levels in bacteria and archaea

Abstract
eLife digest
Introduction
Results and discussion
Materials and methods
References
Article and author information
Metrics

Abstract

A critical assumption of gene expression analysis is that mRNA abundances broadly correlate with protein abundance, but these two are often imperfectly correlated. Some of the discrepancy can be accounted for by two important mRNA features: codon usage and mRNA secondary structure. We present a new global factor, called mRNA:ncRNA avoidance, and provide evidence that avoidance increases translational efficiency. We also demonstrate a strong selection for the avoidance of stochastic mRNA:ncRNA interactions across prokaryotes, and that these have a greater impact on protein abundance than mRNA structure or codon usage. By generating synonymously variant green fluorescent protein (GFP) mRNAs with different potential for mRNA:ncRNA interactions, we demonstrate that GFP levels correlate well with interaction avoidance. Therefore, taking stochastic mRNA:ncRNA interactions into account enables precise modulation of protein abundance.

https://doi.org/10.7554/eLife.13479.001

eLife digest

Many genes carry information for making proteins. To make a protein, a working copy of the information stored in DNA is first copied into a molecule of messenger RNA. These RNA messages are then interpreted by the ribosome, the molecular machine that makes proteins. Many messages are produced from each gene, and each message can be read multiple times. Thus, it should follow that the number of messages produced dictates the number of proteins made. However, this is not the case and the number of proteins produced cannot be completely predicted from knowing the number of messenger RNAs.

Cells control how much of a given protein they produce through interactions between the messenger RNAs and other regulatory RNAs. The regulatory RNAs bind directly to a message and impede protein production. Because there are millions of RNAs in a cell, these interactions have evolved to be highly specific. Nevertheless, it seems inevitable that messenger RNAs would encounter other RNAs too, which could short-circuit gene regulation and lead to less protein being produced.

Umu et al. have now asked if such short-circuit events are selected against during evolution. Computational tools were used to predict the strength of binding between the RNAs found in the dominant forms of microbial life on Earth: the bacteria and the archaea. This approach revealed that the majority of messenger RNAs bind more weakly to the most common RNA molecules found in cells than would be expected by chance. Weakened binding should prevent the RNA molecules from becoming tangled with each other and ensure that protein levels are not perturbed by unintended interactions between highly expressed messages and other RNAs.

To test this hypothesis further, Umu et al. generated versions of the gene for a green fluorescent protein that differed only in how well their messenger RNAs could avoid interacting with the most abundant RNAs in E. coli cells. Those messengers that were designed to avoid interacting with other RNAs yielded far more protein than those that were not. The findings show that taking this kind of avoidance into account can improve predictions about how much protein will be produced and should therefore make it easier to control protein production in experimental systems.

Finally, the messenger RNAs of some bacteria do not show such clear avoidance. However, these bacteria have a more complex internal cell structure. This finding hints at an alternative means for avoiding short-circuiting events that could be used by more complicated cells, such of those of animals and plants, which also contain much larger numbers of RNAs.

https://doi.org/10.7554/eLife.13479.002

Introduction

It should in principle be possible to predict protein abundance from genomic data. However, protein and mRNA levels are not strongly correlated (de Sousa Abreu et al., 2009; Vogel and Marcotte, 2012; Kwon et al., 2014; Maier et al., 2011; Lu et al., 2007; Taniguchi et al., 2010; Chen et al., 2016), which is a major barrier to precision bioengineering and quantification of protein levels. mRNA secondary structure (Pelletier and Sonenberg, 1987; Chamary and Hurst, 2005), codon usage (Ikemura, 1981; Sharp and Li, 1987; Andersson and Kurland, 1990), and mRNA (and protein) degradation rates (Maier et al., 2011) are commonly invoked to explain this discrepancy (Boël et al., 2016-21). Yet, at best, these features account for only 40% of variation, and in some instances explain very little of the observed variation (Kudla et al., 2009; Maier et al., 2011; Plotkin and Kudla, 2011; Goodman et al., 2013; Chen et al., 2016). Here we show that crosstalk interactions between ncRNAs and mRNAs also impact protein abundance, and that such interactions have a greater effect than either mRNA secondary structure or codon usage. We measured interactions between a set of evolutionarily conserved core mRNAs and ncRNAs from 1700 prokaryotic genomes using minimum free energy (MFE) models. For 97% of species, we find a reduced capacity for interaction between native RNAs relative to controls. Furthermore, by generating synonymously variant GFP mRNAs that differ in their potential to interact with core ncRNAs, we demonstrate that GFP expression levels can be both predicted and controlled. Our results demonstrate that there is strong selection for the avoidance of stochastic mRNA:ncRNA interactions across prokaryotes. Applying this knowledge to mRNA design will enable precise control of protein abundance through the incorporation or exclusion of inhibitory interactions with native ncRNAs.

Results and discussion

To examine if avoidance of stochastic mRNA:ncRNA interactions is a feature of transcriptomes in bacteria and archaea, we estimated the strength of all possible intermolecular RNA interactions using a minimum free energy (MFE) model (Mückstein et al., 2006) using core ncRNAs and mRNAs. In this work the core ncRNAs are six well conserved and highly expressed tRNA, rRNA, RNase P RNA, SRP RNA, tmRNA and 6S RNA families annotated by Rfam (Gardner et al., 2011; Nawrocki et al., 2015), the core mRNAs are 114 well conserved mRNAs found across bacteria, 40 of which are also conserved across archaea (Wu et al., 2013).

If stochastic interactions are selected against, because of the capacity for abundant ncRNAs (Lindgreen et al., 2014; Deutscher, 2006; Giannoukos et al., 2012) to impact translation (Waters and Storz, 2009; Storz et al., 2011), such negative selection would be most comparable between species and readily detected for broadly conserved ncRNAs and mRNAs. Under-representation of interactions has been considered for the specific case of Shine-Dalgarno-like (SD-like) sequences and the ribosome (Li et al., 2012; Woolstenhulme et al., 2015; Borg and Ehrenberg, 2015; Diwan and Agashe, 2016) and between microRNAs and 3' UTRs (Bartel and Chen, 2004; Farh, 2005; Stark et al., 2005; van Dongen et al., 2008). We computed the free energy distribution of interactions between highly conserved mRNA:ncRNA pairs and compared this to a number of negative control interactions, which serve to show the expected distribution of binding energy values (Figure 1A). The initiation of translation has been shown to be the rate limiting step for translation (Tuller and Zur, 2015; Plotkin and Kudla, 2011; Nakahigashi et al., 2014), therefore, we focus our analysis on the first 21 nucleotides of the mRNA coding sequence (CDS). This has the further advantage of reducing computational complexity. We also test a variety of negative control mRNA regions, which are unlikely to play a functional role in RNA:ncRNA interactions. The mRNA controls include (1) di-nucleotide preserving shuffled sequences (Workman, 1999) (orange, Figure 1A), (2) homologous mRNAs from another phylum (with a compatible guanine-cytosine (G+C) content) (purple), (3) downstream regions 100 base pairs (bps) within the CDS (pink), (4) the reverse complement of the 5´ of CDSs (green), and lastly (5) unannotated (intergenic) genomic regions (yellow). Our interaction predictions in a single model strain show that native interactions consistently have higher (i.e. less stable) free energies than expected when compared to the five different mRNA negative controls: that is, there is a reduced capacity for native mRNAs and native ncRNAs to interact. We also compared different energy models and confirm that the MFE shift is a result of intermolecular binding (Figure 1—figure supplement 1A–C). We subsequently deployed the most conservative negative control (i.e. di-nucleotide preserving shuffle) and free energy model (Figure 1—figure supplement 1C) to detect if this shift for less stable binding of mRNA:ncRNA is true of all bacteria and archaea.

Figure 1 with 2 supplements see all

Download asset Open asset

mRNA:ncRNA avoidance is a conserved feature of bacteria and archaea.

(A) Native core mRNA:ncRNA binding energies (green line; mean = −3.21 kcal/mol) are significantly higher than all mRNA negative control binding energies (dashed lines; mean binding energies are -3.62, -5.21, -4.13, -3.86 & -3.92 kcal/mol respectively) in pairwise comparisons (p<2.2 × 10⁻¹⁶ for all pairs, one-tailed Mann-Whitney U test) for *Streptococcus suis* RNAs. (B) The difference between the density distributions of native mRNA:ncRNA binding energies and dinucleotide preserved shuffled mRNA:ncRNA controls as a function of binding energy for different taxonomic phyla. Each coloured curve illustrates the degree of extrinsic avoidance for different bacterial phyla or the archaea. Positive differences indicate an excess in native binding for that energy value, negative differences indicate an excess of interactions in the shuffled controls. The dashed black line shows the expected result if no difference exists between these distributions and the dashed grey lines show empirical differences for shuffled vs shuffled densities from 100 randomly selected bacterial strains. (C) This box and whisker plot shows −log₁₀(P) distributions for each phylum and the archaea, the p-values are derived from a one-tailed Mann-Whitney U test for each genome of native mRNA:ncRNA versus shuffled mRNA:ncRNA binding energies. The black dashed line indicates the significance threshold (p<0.05). (D) A high intrinsic avoidance strain (*Thermodesulfobacterium sp.* OPB45) shows a clear separation between the G+C distribution of mRNAs and ncRNAs (p=9.2 × 10⁻²⁵, two-tailed Mann-Whitney U test), and a low intrinsic avoidance strain (*Mycobacterium sp.* JDM601) has no G+C difference between mRNAs and ncRNAs (p=0.54, two-tailed Mann-Whitney U test). (E) The x-axis shows −log₁₀(P) for our test of extrinsic avoidance using binding energy estimates for both native and shuffled controls, while the y-axis shows −log₁₀(P) for our intrinsic test of avoidance based upon the difference in G+C contents of ncRNAs and mRNAs. Two perpendicular dashed black lines show the threshold of significance for both avoidance metrics. 97% of bacteria and archaea are significant for at least one of these tests of avoidance.

https://doi.org/10.7554/eLife.13479.003

In terms of stoichiometry, the model we use assumes that ncRNA expression levels are vastly in excess of mRNA expression levels (i.e. [ncRNA] >> [mRNA]) (Giannoukos et al., 2012; Deutscher, 2006). This is generally a biologically reasonable assumption when focussing on core genes based upon past analysis and our own work with RNA-seq data from a range of bacteria and archaea (Figure 4) (Lindgreen et al., 2014). Consequently, any potential mRNA interaction regions are saturated with ncRNA, therefore a summative model of interaction energies is a reasonable approximation to the estimated impact of excess hybridization. If modelling ncRNAs that are not so abundant, then a model weighted by expression level may be advantageous, but it is difficult to assess these across all conditions and developmental stages that are evolutionarily relevant. In order to ensure that our analysis is comparable across all bacteria and archaea we have focussed on just the most highly conserved ncRNA and protein-coding genes. Although, many of the ncRNAs are highly structured and are bound by RNA-binding proteins this is not the case during either synthesis and degradation of these products, furthermore, a fraction of the RNA components of these genes will be exposed. Therefore, we expect these will form useful datasets for initial testing of our hypothesis.

In order to assess whether mRNA:ncRNA avoidance is an evolutionarily conserved phenomenon, we calculated intermolecular binding energies for conserved ncRNAs and mRNAs from 1,582 bacterial and 118 archaeal genomes and compared these to a negative control dataset derived using a di-nucleotide frequency preserving shuffling procedure (Workman, 1999). This measures a property that we call the ‘extrinsic avoidance‘ of mRNA:ncRNA interactions, yet this approach may fail to identify genuine avoidance in cases when the G+C content differences between interacting RNAs is extreme. Measuring only extrinsic avoidance (using shuffled mRNAs as negative controls), we found that stochastic mRNA:ncRNA interactions are significantly underrepresented in most (73%) of the prokaryotic phyla (p<0.05, one-tailed Mann-Whitney U test) (Figure 1B,C and Figure 1—figure supplement 2). This indicates that there is selection against stochastic interactions in both bacteria and archaea.

We next sought to establish the degree to which intrinsic G+C features of RNAs lead to avoidance of stochastic interactions (Figure 1D). A similar idea has been proposed which suggests that purine loading in thermophilic bacteria may limit mRNA:mRNA interactions (Lao and Forsdyke, 2000). A test of G+C composition revealed a significant difference (p<0.05, two-tailed Mann-Whitney U test) between mRNAs and ncRNAs for 95% of bacteria and archaea (Figure 1D,E). Therefore, either extrinsic or intrinsic avoidance signals indicate that selection against stochastic interactions and it is near-universal for the prokaryotes (97% of all strains) (Figure 1E and Supplementary file 1A and B).

Our results clearly establish a signature of selection that acts to minimise stochastic mRNA:ncRNA interactions. However, with thousands of potential interacting RNA species in even simple prokaryotic systems (Vivancos et al., 2010; Sharma et al., 2010), the complete avoidance of stochastic interactions is combinatorially unlikely. Therefore, there ought to be a tradeoff between avoidance and optimal expression. To assess this, we examined the relationship between potential stochastic interactions and the variation between mRNA and cognate protein levels for four previously published endogenous mass spectrometry datasets from Escherichia coli (E. coli) and Pseudomonas aeruginosa (P. aeruginosa) (Laurent et al., 2010; Kwon et al., 2014; Lu et al., 2007). We computed Spearman’s correlation coefficients between protein abundances and extrinsic avoidance, 5´ end internal mRNA secondary structure and codon usage. Of the three measures, avoidance is significantly correlated in all four datasets (Spearman’s rho values are between 0.11–0.17 and corresponding p-values are between 0.01 and 1.3 × 10^–12). In contrast, 5´ end mRNA structure significantly correlates in two datasets, and codon usage significantly correlates in all four datasets. This indicates that, despite strong selection against stochastic interactions, such interactions do significantly impact the proteome (Figure 2A and Supplementary file 3). We have also conducted an ‘outlier analysis’ on one of the E. coli datasets (Laurent et al., 2010). We have selected the top and bottom-most expressed genes relative to mRNA expression levels and computed Z-scores for each of codon-usage, internal secondary structure and avoidance measures. We found that avoidance measures show the most extreme shifts downwards for the bottom-most expressed genes and is shifted the highest for the top-most genes (Figure 2—figure supplement 4).

Figure 2 with 5 supplements see all

Download asset Open asset

mRNA attributes have different impacts on protein abundance.

(A) This heatmap summarizes the effect sizes of four mRNA attributes (avoidance of mRNA:ncRNA interaction, 5´ end secondary structure, codon bias and mRNA abundance) on protein expression as Spearman’s correlation coefficients, which are represented in gradient colors, while a starred block shows if the associated correlation is significant (p<0.05). (B) GFP expression correlates with optimized codon selection, measured by CAI (R_s = 0.29, p=0.016). (C) GFP expression correlates with 5 $^{'}$ end secondary structure of mRNAs, measured by 5’ end intramolecular folding energy (R_s = 0.34, p=0.006). (D) GFP expression correlates with avoidance, measured by mRNA:ncRNA binding energy (R_s = 0.56, p=6.9 × 10⁻⁶). (E) Each cartoon illustrates the corresponding hypothesis; (1) optimal codon distribution (corresponding tRNAs are available for translation), (2) low 5´ end RNA structure (high folding energy of 5´ end) and (3) avoidance (fewer crosstalk interactions) lead to faster translation.

https://doi.org/10.7554/eLife.13479.006

We also test how mRNA:ncRNA crosstalk impacts the translation of transformed mRNAs that have not coevolved with the ncRNA repertoire (low avoidance mRNAs are rare in native datasets). We examined two available E. coli-based GFP experimental datasets (Goodman et al., 2013; Kudla et al., 2009), where synonymous mRNAs are generated for a GFP reporter gene. This enables the assessment of the impact of synonymous changes on protein abundance using fluorescence. Avoidance and mRNA secondary structure are both significantly correlated with fluorescence, whereas codon usage is not (Spearman’s rho values are 0.11 and 0.65, the corresponding p-values are 3.17 × 10⁻⁴¹ and 1.69^–20) (Figure 2A). Note that one of the GFP datasets (Goodman et al., 2013) uses native E. coli mRNA 5´ ends for their constructs, whereas the other GFP dataset (Kudla et al., 2009) is randomly generated. We observe that the influence of avoidance on gene expression for randomly sampled synonymous mRNAs is strong (Figure 2—figure supplement 3), while endogenous gene expression is limited. Presumably, due to negative selection pruning low avoidance mRNAs from the gene pool (Figure 2A).

For each of the seven datasets described above we have tested linear models of measures of mRNA levels, codon usage, internal secondary structure and avoidance (Figure 2—figure supplement 3 and Supplementary file 5). Avoidance alone explains around 35% of variance in GFP datasets where extreme mRNA compositions can be explored, whereas in native mass-spec derived datasets 2–3% of the variance is explained by avoidance alone. Codon usage describes 2% to −0.5% of variance in GFP data, and 19% to 0.3% of variance in mass-spec derived datasets. Internal secondary structure 33% to 10% in GFP datasets, and 0.2% to 0% of the variance in mass-spec derived datasets. Using all four measures in combination across the seven datasets between 70% and 42% of variation in protein levels can be explained, removing avoidance from the model reduces these estimates by between 56% and 0.7%. Thus, avoidance is at least as good an explanation of variation in protein abundance as either codon usage and internal mRNA secondary structure.

Our results indicate that crosstalk between mRNAs and ncRNAs can impact protein expression levels. We therefore predict that taking crosstalk into account will enable the design of constructs where protein expression levels can be precisely controlled. To test this, we generated GFP constructs based on the following constraints: codon bias, 5´ end mRNA secondary structure stability and crosstalk avoidance (see Materials and methods). Our constructs are designed to capture the extremes of one variable, while controlling other variables (e.g. high or low avoidance and near-average codon bias and mRNA secondary structure). The G+C content, a known confounding factor, was also strictly controlled for each construct. We selected a commercial service to perform our GFP transformations to avoid possible bias and increase the robustness of our approach (Ioannidis and Khoury, 2011). We predicted that a construct where all three parameters are optimised will result in a higher expression. Consistent with predictions, our optimised construct had maximal expression (Figure 2—figure supplement 1). Of the three parameters, avoidance showed the largest range, suggesting that tuning this parameter permits expression levels to be finely controlled (R_s = 0.56, p=6.9 × 10⁻⁶) (Figure 2B–D and Figure 2—figure supplements 1–4).

For a final confirmation of the avoidance hypothesis, we tested the Thermus thermophilus (T. thermophilus) HB8 SSU ribosomal RNA, which is a component of one of the most complete prokaryotic ribosomal structures available in the PDB (Rozov et al., 2015). We identified the regions of the SSU rRNA that had the least capacity to interact with T. thermophilus core mRNAs and found that these regions were generally not bound to either ribosomal proteins or other ncRNAs, such as the LSU rRNA (p=2.49 × 10⁻¹⁷, Fisher’s exact test) (Figure 3; see ‘Materials and methods’). The influence of internal SD-like regions on translation pausing have been described elsewhere (Li et al., 2012), in addition we note that the anti-SD region on SSU rRNA is one of the RNA avoidance regions (Figure 3A).

Figure 3 with 3 supplements see all

Download asset Open asset

The most under-represented mRNA:rRNA interactions correspond to exterior regions of the ribosome.

(A) In the upper bar, the regions of the *T. thermophilus* SSU rRNA that are under-represented in stable interactions with mRNAs (p<0.05) are highlighted in red. In the lower bar, the inaccessible residues (<3.4 Angstroms from other nucleotides or amino acids in the PDB structure 4WZO). (B) The 3 dimensional structure of the *T. thermophilus* ribosome includes 5S, SSU and LSU rRNA, 48 ribosomal proteins, 4 tRNA and a bound mRNA (PDB ID: 4WZO) (Rozov et al., 2015). We have highlighted the most avoided regions of the SSU rRNA in red (based upon the fewest stable interactions with *T. thermophilus* mRNAs (p<0.05). Two different orientations are shown on the left and right, the upper structure shows just the SSU rRNA and mRNA structures, the lower includes the ribosomal proteins (coloured blue). Bottom right, a view of the ribosome that also includes the LSU rRNA (green) is also shown. There is a significant correspondence between the accessibility of a region of SSU rRNA and the degree to which it is avoided (p=2.5 × 10⁻¹⁷, Fisher’s exact test).

https://doi.org/10.7554/eLife.13479.012

This study focusses on the 5´ ends of the CDS as this region is important for the initiation of translation (Plotkin and Kudla, 2011; Tuller and Zur, 2015) and is a consistent feature of all the genomic, transcriptomic, proteomic and GFP expression datasets that we have evaluated in this work. In smaller-scale tests we have observed similar conserved avoidance signals within the entire CDSs (Figure 3—figure supplement 1) and within the 5´UTRs (Figure 3—figure supplement 2). Furthermore, we predict that similar signals can be observed for mRNA:mRNA and ncRNA:ncRNA avoidance. Although the impacts of these features are challenging to validate, interactions between clustered regularly interspaced short palindromic repeats (CRISPR) spacer sequences (Bhaya et al., 2011) and core ncRNAs are good candidates to test ncRNA:ncRNA avoidance.

In conclusion, our results indicate that the specificity of prokaryotic ncRNAs for target mRNAs is the result of selection both for a functional interaction and against stochastic interactions. Our experimental results support the view that stochastic interactions are selected against, due to deleterious outcomes on expression. We suspect avoidance of crosstalk interactions has several evolutionary consequences. First, as transcriptional outputs become more diverse in evolution, we expect that the probability of stochastic interactions for both new ncRNAs and mRNAs becomes higher. This will impact the emergence of new, high abundance RNAs, since selection for high abundance may be mitigated by deleterious crosstalk events. Second, we predict that stochastic interactions limit the number of simultaneously transcribed RNAs, since the combinatorics of RNA:RNA interactions imply that eventually stochastic interactions cannot be avoided. This may in turn drive selection for forms of spatial or temporal segregation of transcripts. Finally, taking codon usage, mRNA secondary structure and potential mRNA:ncRNA interactions into account allows better prediction of proteome outputs from genomic data, and informs the precise control of protein levels via manipulation of synonymous mRNA sequences (Figure 2—figure supplement 5).

Materials and methods

Here we summarize the data sources, materials and methods corresponding to our manuscript. We performed all statistical analyses in R, and all other computational methods in Python 2.7 or Bash shell scripts. We explicitly cite all the bioinformatics tools and their versions. All tables (Supplementary files 1–5) are available as supporting online material. All of our own sequences, scripts and R workspace images are available on Github including the supplementary files (http://github.com/UCanCompBio/Avoidance). The other datasets are cited in the manuscript (Supplementary file 3).

Evolutionary conservation

Request a detailed protocol

If excessive interactions between messenger RNAs (mRNAs) and non-coding RNAs (ncRNAs) are detrimental to cellular function, then we expect the signature of selection against interactions (avoidance) to be a conserved feature of prokaryotic genomes. In the following, we describe where the data used to test the evolutionary conservation of avoidance was acquisitioned, the models that we use to test avoidance and the negative controls in detail for evolutionary conservation predictions. We also investigate detect regions of avoidance on one of the core ncRNAs, the ribosomal small subunit (SSU) RNA.

Data sources for bacterial genomes

Request a detailed protocol

The bacterial genomes and annotations that we used for investigating mRNA:ncRNA interactions were acquired from the EBI nucleotide archive (2564 sequenced bacterial genomes available on August 2013; http://www.ebi.ac.uk/genomes/bacteria.html). We selected an evolutionarily conserved (core) group of 114 mRNAs from PhyEco (Wu et al., 2013) and an evolutionarily conserved (core) group of ncRNAs (Hoeppner et al., 2012). PhyEco markers are based on a set of profile HMMs that correspond to highly conserved bacterial protein coding genes (these include ribosomal proteins, tRNA synthetases as well as other components of translation machinery, DNA repair and polymerases) (Wu et al., 2013). The HMMer package (version 3.1b1) (Eddy, 2011) was used to extract the mRNAs corresponding to these marker genes from genome files. We removed genome sequences that host fewer than 90% of the marker genes; leaving 1582 bacterial genome sequences and 176,704 core mRNAs that spanned these. We extracted the first to the 21st nucleotide of the core mRNAs. As this region showed the strongest signal in a small-scale analysis (Figure 3—figure supplement 1A), this region has also been shown to have an unusual codon distribution in previous work (Tuller and Zur, 2015; Goodman et al., 2013) as explained in the main text. We obtained ncRNA annotations using the Rfam database (version 11.0) (Gardner et al., 2011) for the well conserved and highly expressed tRNA, rRNA, RNase P RNA, SRP RNA, tmRNA and 6S RNA families (Rfam accessions: RF00001, RF00005, RF00010, RF00011, RF00013, RF00023, RF00169, RF01854, RF00177). The redundant annotations were filtered for overlapping and identical paralogous sequences, leaving 99,281 core ncRNA that spanned 1582 bacterial genomes.

Data sources for archaeal genomes

Request a detailed protocol

We followed a similar pipeline for archaeal genomes as described for bacterial genomes. In total we processed 240 archaeal genomes, and after filtering those that had fewer than 90% of the marker genes, we had 118 archaeal genomes for further analysis (genomes available on August 2013) (http://www.ebi.ac.uk/genomes/archaea.html). These genomes host 12,370 and 10,804 core mRNAs and core ncRNAs respectively.

Test of an (extrinsic) avoidance model

Request a detailed protocol

We used RNAup (version 2.0.7) (Lorenz et al., 2011) to calculate the binding minimum (Gibbs) free energy (MFE) values of mRNA:ncRNA interactions. The RNAup algorithm combines the intramolecular energy necessary to open binding sites with intermolecular energy gained from hybridization (Mückstein et al., 2006). In other words, this approach minimizes the sum of opening intramolecular energies and the intermolecular energy (Figure 1—figure supplement 1C). In our model of avoidance, we test for a reduction in absolute binding MFE relative to negative controls as a measure of avoidance. After testing a variety of negative controls (e.g. dinucleotide preserved shuffled mRNAs, the 5´ end of homologous mRNAs from a different bacterial phylum, 100 nucleotides downstream of designated interaction region, reverse complements, and identically sized intergenic regions), we selected the dinucleotide frequency preserved shuffled sequences as our negative control since this displayed the most conservative interaction MFE distribution (Figure 3—figure supplement 1A–C). In more detail, to serve as a negative control we compute the interaction MFE between each of the core ncRNAs and 200 dinucleotide-preserved shuffled versions of the 5´ end mRNAs. A dinucleotide frequency preserving shuffling procedure is used, as Gibbs free energies are computed over base pair stacks, i.e. a dinucleotide alphabet, therefore this method has been shown to be important in order to minimise incorrect conclusions (Workman, 1999). We tested if the energy difference between native and shuffled interaction distributions is statistically significant using the nonparametric one-tailed Mann-Whitney U test, which returns a single p-value per genome (Figure 1C). If the distribution of native interaction energies for a genome is significantly higher (i.e. fewer stable interactions) than the negative control, this is an indication that the genome has undergone selection for mRNA:ncRNA avoidance. To create the background density difference lines (seen in grey at Figure 1B), we randomly selected 100 bacterial strains and plot differences between the densities of shuffled interactions.

Test of an intrinsic avoidance model

Request a detailed protocol

The energy-based avoidance model that we defined above is opaque to cases of ‘intrinsic avoidance’. These are where the intrinsic properties of mRNA and ncRNA sequences restrict their ability to interact. For an extreme example, if ncRNAs are composed entirely of guanine and cytosine nucleotides, whilst mRNAs are composed entirely of adenine and uracil nucleotides, then these will rarely interact. Therefore, our energy-based avoidance measures for native and shuffled interactions will both be near zero, and thus will not detect a significant energy shift between the native and control sequences. In order to account for some of these issues, we compared the G+C difference between core ncRNAs and core mRNAs. We used a nonparametric two-tailed Mann-Whitney U test to determine if there is a statistically significant G+C difference between the two samples: G+C of ncRNAs vs G+C of 5´ end mRNAs (Figure 1D,E).

Sliding window analysis to detect regions of significance for avoidance on SSU ribosomal RNA

Request a detailed protocol

We hypothesise that heterogeneous signals of avoidance within ncRNA sequences may correspond to the accessibility of different ncRNA regions. For example, are highly avoided regions of abundant ncRNAs more accessible than those that are avoided less? To create an avoidance profile, we tested binding MFEs of native and shuffled interactions throughout the full-length SSU ribosomal RNA of T. thermophilus, using a one tailed Mann-Whitney U tests to evaluate the degree of avoidance for each nucleotide in the SSU rRNA (Figure 3) with a window size of 10 and step size of 1 (Supplementary file 4). We selected the protein data bank (PDB) entry (4WZO) as it is one of the few ribosomal structures with associated protein, mRNA, tRNA and LSU binding data (Rozov et al., 2015). The native interactions are the interactions between T. thermophilus core mRNAs and SSU ribosomal RNA. The shuffled controls are derived from 200 dinucleotide preserved shuffled versions of the RNAs. We created a 2 × 2 contingency table which separates the counts of residues that either host a strong avoidance signal or little avoidance signal (regions with p<0.001, Mann-Whitney U test) and residues that we predict to either be in contact (<3.4 Angstroms between atoms) with ribosomal proteins or ribosomal, transfer or messenger RNAs or not in contact with other molecules (i.e. accessible) (Figure 3). We applied a Fisher’s exact test Fisher (1992) to these groups to and discovered a statistically significant relationship between avoidance and accessibility (p=2.5 × 10⁻¹⁷).

We have applied the same analysis to the other T. thermophilus core ncRNA genes (tRNAs, tmRNA, RNase P RNA and SRP RNA) in order to determine regions of avoidance (Figure 3—figure supplement 3). Since there are more than one tRNAs, we aligned the cellular RNAs to the associated Rfam model (RF00005) (Gardner et al., 2011; Nawrocki et al., 2015) using the cmalign tool (Nawrocki and Eddy, 2013).

Sliding window analysis to detect regions of significance for avoidance on mRNAs

Request a detailed protocol

In order to identify a region of mRNA that is consistent and unique in the datasets that we applied evolutionary and expression analyses to we created an avoidance profile from the previously published GFP mRNAs (Kudla et al., 2009). We calculated binding MFEs using a window size of 21 with a 1 nucleotide step size, and for each region we computed the associated Spearman’s correlation coefficients with p-values. This analysis revealed the significance of the first 21 nucleotides on expression, this is consistent with previous results that identify initiation as the rate limiting step for translation (Tuller and Zur, 2015; Plotkin and Kudla, 2011). It also revealed other statistically significant regions with high correlation correlation coefficient throughout the GFP mRNAs (Figure 3—figure supplement 1A).

Proteomics/Transcriptomics and GFP expression

Request a detailed protocol

We predict that mRNAs with low avoidance values will produce fewer proteins for each mRNA transcript than those with high avoidance. In order to test this, we conducted a meta-analysis of proteomics and transcriptomics data and the relationship between this data and measures of mRNA and ncRNA avoidance. In the following section we describe the origins of the data we have used and the statistical analysis we use to test whether avoidance influences gene expression.

Data sources and statistics for mRNA, protein abundance and GFP expression

Request a detailed protocol

We compiled our data from five protein and mRNA quantification datasets, which consist of three E. coli (Laurent et al., 2010; Goodman et al., 2013; Lu et al., 2007) and two P. aeruginosa (Laurent et al., 2010; Kwon et al., 2014) (Supplementary file 3). We calculated Spearman’s correlation coefficients (and associated p-values) among the protein abundances and 5´ end secondary structure (measured by intermolecular MFE), codon bias (measured by codon adaptation index (CAI)) and avoidance (Figure 2A). We have created single and multiple regression models to determine the explained variances by these parameters (Figure 2—figure supplement 3 and Supplementary file 5). These models show that avoidance explains more variance on average than secondary structure or codon bias. Up to 70 percent of the variation in GFP expression can be explained by including all the parameters and mRNA abundances (Figure 2—figure supplement 3). CAI metric defines how well mRNAs are optimised for codon bias (Sharp and Li, 1987). The CAI values were determined based on codon distribution patterns acquired from the core protein coding genes of E. coli BL21(DE3) (Accession: AM946981.2) (Wu et al., 2013) using Biopython libraries (version 1.6) (Cock et al., 2009). The folding MFE predicts how stable the secondary structure of an RNA can be. The folding MFEs of GFP mRNAs were calculated using the RNAfold algorithm (version 2.0.7) (Lorenz et al., 2011). We restricted folding energy to first 37 nucleotides because the most significant correlation was previously reported for this region (Kudla et al., 2009). We acquired previously published GFP data, associated fluorescence values and mRNA quantifications (Kudla et al., 2009) via personal communication. Our avoidance model showed the highest and most significant correlation with GFP expression in that dataset (R_s = 0.65, p=1.69 × 10⁻²⁰) (Figure 2A and Figure 2—figure supplement 2D,E,F). 5´ end secondary structure (R_s = 0.62, p=5.73 × 10⁻¹⁸) correlates slightly less than avoidance, while CAI does not correlate significantly (R_s = 0.02, p=0.4).

mRNA design

Request a detailed protocol

We have shown that avoidance is a broadly evolutionary conserved phenomenon and that it is significantly correlated with protein abundance relative to mRNA abundance. We now wish to test if avoidance can be used to design mRNA sequences that modulate the abundance of corresponding protein in a predictable fashion. We use a set of GFP mRNA constructs that all maintain the same G+C content, codon adaptation index (CAI) and internal secondary structure but host either very high or very low avoidance values. This procedure was repeated for the CAI and internal secondary structure values while maintaining a constant avoidance. The resulting 13 constructs were synthesised, transformed and expressed by commercial services. In the following paragraphs, we explained how we design our GFP constructs, the experimental set-up and statistical analyses.

Green fluorescence protein (GFP) mRNA design

Request a detailed protocol

We sampled 537,000 synonymous mRNA variants of a GFP mRNA (the 239 AA, 720 nucleotide long, with accession AHK23750, can be encoded by 7.62 × 10¹¹¹ possible unique mRNA variants). In brief, these mRNA variants were scored based upon (1) CAI, (2) mRNA secondary structure in their 5´ end region, and (3) mRNA:ncRNA interaction avoidance in their 5 end region. The genome of E. coli BL21 encodes 52 unique core ncRNAs (Gardner et al., 2011; Nawrocki et al., 2015), to estimate the level of ncRNA avoidance for each GFP mRNA, we sum the binding MFEs. For example, for each GFP mRNA we compute 52 independent binding MFE values for each ncRNA. In short, a higher summed MFE score for a GFP mRNA implies a higher avoidance, while a lower summed MFE score implies a lower avoidance. This approach assumes that the ncRNAs are expressed at much higher levels than GFP mRNAs (i.e. [ncRNA] >> [mRNA]) (Figure 4). Consequently, any potential interaction site on GFP mRNAs are likely to be saturated with ncRNA. Finally, we selected 13 GFP mRNA constructs, while controlling the range of G+C values. These GFP mRNAs were designed to have four different aspects; extreme 5 end secondary structure (2 minimum and 2 maximum folding MFE constructs), extreme codon bias (2 maximum and 2 minimum CAI constructs), extreme interaction avoidance (2 minimum and 2 maximum binding MFE constructs) and an ‘optimal’ construct. The optimal construct was selected for a high CAI, low 5´ end structure and high avoidance. All extreme GFP mRNA constructs have near identical G+C content (between 0.468–0.480) and identical G+C contents at the 5´ end (0.48). Each of the sampled GFP mRNAs is separated from other mRNAs by at least 112 nucleotide substitutions and 122 nucleotide substitutions on average (Figure 2—figure supplement 1).

Figure 4

Download asset Open asset

The median expression of core ncRNA genes (n = 325 data points) in prokaryotic genomes is nearly two orders of magnitude greater than core mRNAs (n = 8086 data points) which proves that ncRNAs constitute most of the cellular RNAs.

To create this plot, we used mean mapped reads per gene length (i.e. mean read depth per position) of each core gene. The expression data are compiled from 5 archaeal and 37 bacterial strains from a previous study (Lindgreen et al., 2014).

https://doi.org/10.7554/eLife.13479.016

Extreme GFP transformations, determining fluorescence levels and RT-qPCR analyses

Request a detailed protocol

Both GFP expression assays and RT-qPCR analyses were performed as part of a commercial service offered by the University of Queensland, Protein Expression Facility and Real-Time PCR Facility. Plasmid DNA from each construct was transformed into an expression strain of E. coli BL21(DE3). Starter cultures were grown in quadruplicate from single colonies in 0.5 mL of TB kanamycin 30 μg/mL media in a 96 deep-well microplate and incubated at 30°C, 400 rpm (3 mm shaking throw). Each starter culture was used to inoculate 1.0 mL of the same media at a ratio of 1:50, each in a single well of a 96 deep-well plate. The cultures were incubated at 30°C, 400 rpm for 1 hr, at this point the cultures were chilled for 5 min then induced into 0.2 mM IPTG and incubated at 20°C. For analysis, culture samples of 100 μL were taken at 1 hr, 2 hr, 3 hr, 4 hr and 22 hr (overnight) hours post-induction (HPI) for fluorescence and optical density analysis. Samples were collected in PetriWell 96-well flat bottom, black upper, lidded microplates (Genetix). Cell density of fluorescence measurements was performed on a Spectramax M5 Microplate Reader using SMP software v 5.2 (Molecular Devices). For fluorescence intensity measurements, samples were collected in the 96-well plate listed above. Samples were analysed by bottom-read, 10 reads per well at an excitation wavelength = 488 nm, emission wavelength = 509 nm with an automatic cut-off at 495 nm and measured as relative fluorescence units (RFU). The raw RFU values were normalised by subtracting the averaged baseline values obtained from untransformed BL21(DE3) at the same time point. All samples at the 22 HPI time point were diluted 1:4 in TB kanamycin 30 μg/mL media before measurement. Total RNA was purified from induced 0.5 mL of BL21(DE3) cultures on MaxwellÂ 16 robot (Promega) using LEV simplyRNA Tissue Kit (Promega). RNA concentrations were assessed on Qubit 3.0 Fluorometer (Thermo Fisher Scientific). cDNA synthesis was done using ProtoScript II First Strand cDNA Synthesis Kit (NEB) according to manufacturer protocol using random primer. The rpsL gene was selected as the reference gene (internal control). RT-qPCR was performed in 384-well plates with a ViiA 7 Real-Time PCR System (Thermo Fisher Scientific) using Life Technology SYBR Green-based PCR assay. The data analysis was performed using Applied Biosystems QuantStudio software (Thermo Fisher Scientific). The total volume of reaction was 10 μL including 0.2 μM of each primer as a final concentration. The following PCR conditions were used: 95°C for 10 min, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. The melting curves were analyzed at 60–95°C after 40 cycles. RNA concentrations were subsequently estimated using the approach (Schmittgen and Livak, 2008). We shared the raw data, oligos and primers in the supplementary files (Supplementary file 2A,B).

Statistical analyses of extreme GFP data

Request a detailed protocol

As described, we designed extreme GFP mRNA constructs, and measured the associated fluorescence. A Kruskal-Wallis test (nonparametric alternative of ANOVA) shows a statistically significant difference between the fluorescence of GFP mRNA groups (p=1.35 × 10^–5) (Figure 2—figure supplement 1). Our pairwise comparison of GFP groups using a Kruskal-Nemenyi test (a nonparametric alternative of the Student’s t-test) for fluorescence difference also reveals a statistically significant difference in fluorescence between high avoidance constructs and low avoidance constructs (p=0.00036). We computed the Spearman’s correlation coefficients (and associated p-values) between GFP expression and each of the following measures; CAI (R_s = 0.29, p=0.016), intramolecular folding energy (R_s = 0.34, p=0.006), avoidance (intermolecular binding energy) (R_s = 0.29, p=6.9 × 10⁻⁶) and mRNA concentration (R_s = 0.73, p=3.2 × 10⁻³) to predict effect size of each predictor. Our avoidance model resulted in the highest correlation with GFP expression (Figure 2B–D).

References

1. Andersson SG
2. Kurland CG
(1990)
Codon preferences in free-living microorganisms

Microbiological Reviews 54:198–210.
- Google Scholar
1. Bartel DP
2. Chen CZ
(2004) Micromanagers of gene expression: the potentially widespread influence of metazoan microRNAs
Nature Reviews Genetics 5:396–400.

https://doi.org/10.1038/nrg1328
- Google Scholar
(2011) CRISPR-Cas systems in bacteria and archaea: versatile small RNAs for adaptive defense and regulation
Annual Review of Genetics 45:273–297.

https://doi.org/10.1146/annurev-genet-110410-132430
- Google Scholar
1. Borg A
2. Ehrenberg M
(2015) Determinants of the rate of mRNA translocation in bacterial protein synthesis
Journal of Molecular Biology 427:1835–1847.

https://doi.org/10.1016/j.jmb.2014.10.027
- Google Scholar
1. Boël G
2. Letso R
3. Neely H
4. Price WN
5. Wong K-H
6. Su M
7. Luff JD
8. Valecha M
9. Everett JK
10. Acton TB
11. Xiao R
12. Montelione GT
13. Aalberts DP
14. Hunt JF
(2016) Codon influence on protein expression in E. coli correlates with mRNA levels
Nature 529:358–363.

https://doi.org/10.1038/nature16509
- Google Scholar
1. Chamary JV
2. Hurst LD
(2005) Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals
Genome Biology 6:R75.

https://doi.org/10.1186/gb-2005-6-9-r75
- Google Scholar
1. Chen WH
2. van Noort V
3. Lluch-Senar M
4. Hennrich ML
5. Wodke JA
6. Yus E
7. Alibés A
8. Roma G
9. Mende DR
10. Pesavento C
11. Typas A
12. Gavin AC
13. Serrano L
14. Bork P
(2016) Integration of multi-omics data of a genome-reduced bacterium: Prevalence of post-transcriptional regulation and its correlation with protein abundances
Nucleic Acids Research 44:1192–1202.

https://doi.org/10.1093/nar/gkw004
- Google Scholar
1. Cock PJ
2. Antao T
3. Chang JT
4. Chapman BA
5. Cox CJ
6. Dalke A
7. Friedberg I
8. Hamelryck T
9. Kauff F
10. Wilczynski B
11. de Hoon MJ
(2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics
Bioinformatics 25:1422–1423.

https://doi.org/10.1093/bioinformatics/btp163
- Google Scholar
(2009) Global signatures of protein and mRNA expression levels
Molecular BioSystems 5:1512–1526.

https://doi.org/10.1039/b908315d
- Google Scholar
1. Deutscher MP
(2006) Degradation of RNA in bacteria: comparison of mRNA and stable RNA
Nucleic Acids Research 34:659–666.

https://doi.org/10.1093/nar/gkj472
- Google Scholar
1. Diwan GD
2. Agashe D
(2016) The frequency ofinternal shine-dalgarno-like motifs in prokaryotes
Genome Biology and Evolution 8:1722–1733.

https://doi.org/10.1093/gbe/evw107
- Google Scholar
1. Eddy SR
(2011) Accelerated Profile HMM Searches
PLoS Computational Biology 7:e1002195.

https://doi.org/10.1371/journal.pcbi.1002195
- Google Scholar
1. Farh KK
2. Grimson A
3. Jan C
4. Lewis BP
5. Johnston WK
6. Lim LP
7. Burge CB
8. Bartel DP
(2005) The widespread impact of mammalian MicroRNAs on mRNA repression and evolution
Science 310:1817–1821.

https://doi.org/10.1126/science.1121158
- Google Scholar
1. Fisher RA
(1922) On the interpretation of χ2 from contingency tables, and the calculation of P
Journal of the Royal Statistical Society 85:87–94.

https://doi.org/10.2307/2340521
- Google Scholar
1. Gardner PP
2. Daub J
3. Tate J
4. Moore BL
5. Osuch IH
6. Griffiths-Jones S
7. Finn RD
8. Nawrocki EP
9. Kolbe DL
10. Eddy SR
11. Bateman A
(2011) Rfam: Wikipedia, clans and the "decimal" release
Nucleic Acids Research 39:D141–D145.

https://doi.org/10.1093/nar/gkq1129
- Google Scholar
1. Giannoukos G
2. Ciulla DM
3. Huang K
4. Haas BJ
5. Izard J
6. Levin JZ
7. Livny J
8. Earl AM
9. Gevers D
10. Ward DV
11. Nusbaum C
12. Birren BW
13. Gnirke A
(2012) Efficient and robust RNA-seq process for cultured bacteria and complex community transcriptomes
Genome Biology 13:R23..

https://doi.org/10.1186/gb-2012-13-3-r23
- Google Scholar
(2013) Causes and effects of N-terminal codon bias in bacterial genes
Science 342:475–479.

https://doi.org/10.1126/science.1241934
- Google Scholar
(2012) Comparative analysis of RNA families reveals distinct repertoires for each domain of life
PLoS Computational Biology 8:e1002752.

https://doi.org/10.1371/journal.pcbi.1002752
- Google Scholar
1. Ikemura T
(1981) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system
Journal of Molecular Biology 151:389–409.

https://doi.org/10.1016/0022-2836(81)90003-6
- Google Scholar
1. Ioannidis JP
2. Khoury MJ
(2011) Improving validation practices in "omics" research
Science 334:1230–1232.

https://doi.org/10.1126/science.1211811
- Google Scholar
(2009) Coding-sequence determinants of gene expression in Escherichia coli
Science 324:255–258.

https://doi.org/10.1126/science.1170160
- Google Scholar
1. Kwon T
2. Huse HK
3. Vogel C
4. Whiteley M
5. Marcotte EM
(2014) Protein-to-mRNA ratios are conserved between Pseudomonas aeruginosa strains
Journal of Proteome Research 13:2370–2380.

https://doi.org/10.1021/pr4011684
- Google Scholar
1. Lao PJ
2. Forsdyke DR
(2000) Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine
Genome Research 10:228–236.

https://doi.org/10.1101/gr.10.2.228
- Google Scholar
1. Laurent JM
2. Vogel C
3. Kwon T
4. Craig SA
5. Boutz DR
6. Huse HK
7. Nozue K
8. Walia H
9. Whiteley M
10. Ronald PC
11. Marcotte EM
(2010) Protein abundances are more conserved than mRNA abundances across diverse taxa
Proteomics 10:4209–4212.

https://doi.org/10.1002/pmic.201000327
- Google Scholar
1. Li GW
2. Oh E
3. Weissman JS
(2012) The anti-Shine-Dalgarno sequence drives translational pausing and codon choice in bacteria
Nature 484:538–541.

https://doi.org/10.1038/nature10965
- Google Scholar
1. Lindgreen S
2. Umu SU
3. Lai AS
4. Eldai H
5. Liu W
6. McGimpsey S
7. Wheeler NE
8. Biggs PJ
9. Thomson NR
10. Barquist L
11. Poole AM
12. Gardner PP
(2014) Robust identification of noncoding RNA from transcriptomes requires phylogenetically-informed sampling
PLoS Computational Biology 10:e1003907.

https://doi.org/10.1371/journal.pcbi.1003907
- Google Scholar
(2011) ViennaRNA Package 2.0
Algorithms for Molecular Biology 6:26.

https://doi.org/10.1186/1748-7188-6-26
- Google Scholar
1. Lu P
2. Vogel C
3. Wang R
4. Yao X
5. Marcotte EM
(2007) Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation
Nature Biotechnology 25:117–124.

https://doi.org/10.1038/nbt1270
- Google Scholar
1. Maier T
2. Schmidt A
3. Güell M
4. Kühner S
5. Gavin AC
6. Aebersold R
7. Serrano L
(2011) Quantification of mRNA and protein and integration with protein turnover in a bacterium
Molecular Systems Biology 7:511.

https://doi.org/10.1038/msb.2011.38
- Google Scholar
(2006) Thermodynamics of RNA-RNA binding
Bioinformatics 22:1177–1182.

https://doi.org/10.1093/bioinformatics/btl024
- Google Scholar
1. Nakahigashi K
2. Takai Y
3. Shiwa Y
4. Wada M
5. Honma M
6. Yoshikawa H
7. Tomita M
8. Kanai A
9. Mori H
(2014) Effect of codon adaptation on codon-level and gene-level translation efficiency in vivo
BMC Genomics 15:1115.

https://doi.org/10.1186/1471-2164-15-1115
- Google Scholar
1. Nawrocki EP
2. Burge SW
3. Bateman A
4. Daub J
5. Eberhardt RY
6. Eddy SR
7. Floden EW
8. Gardner PP
9. Jones TA
10. Tate J
11. Finn RD
(2015) Rfam 12.0: updates to the RNA families database
Nucleic Acids Research 43:D130–D137.

https://doi.org/10.1093/nar/gku1063
- Google Scholar
1. Nawrocki EP
2. Eddy SR
(2013) Infernal 1.1: 100-fold faster RNA homology searches
Bioinformatics 29:2933–2935.

https://doi.org/10.1093/bioinformatics/btt509
- Google Scholar
1. Pain A
2. Ott A
3. Amine H
4. Rochat T
5. Bouloc P
6. Gautheret D
(2015) An assessment of bacterial small RNA target prediction programs
RNA Biology 12:509–513.

https://doi.org/10.1080/15476286.2015.1020269
- Google Scholar
1. Pelletier J
2. Sonenberg N
(1987) The involvement of mRNA secondary structure in protein synthesis
Biochemistry and Cell Biology 65:576–581.

https://doi.org/10.1139/o87-074
- Google Scholar
1. Plotkin JB
2. Kudla G
(2011) Synonymous but not the same: the causes and consequences of codon bias
Nature Reviews Genetics 12:32–42.

https://doi.org/10.1038/nrg2899
- Google Scholar
(2015) Structural insights into the translational infidelity mechanism
Nature Communications 6: 7251.

https://doi.org/10.1038/ncomms8251
- Google Scholar
1. Schmittgen TD
2. Livak KJ
(2008) Analyzing real-time PCR data by the comparative C(T) method
Nature Protocols 3:1101–1108.

https://doi.org/10.1038/nprot.2008.73
- Google Scholar
1. Sharma CM
2. Hoffmann S
3. Darfeuille F
4. Reignier J
5. Findeiss S
6. Sittka A
7. Chabas S
8. Reiche K
9. Hackermüller J
10. Reinhardt R
11. Stadler PF
12. Vogel J
(2010) The primary transcriptome of the major human pathogen Helicobacter pylori
Nature 464:250–255.

https://doi.org/10.1038/nature08756
- Google Scholar
1. Sharp PM
2. Li WH
(1987) The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications
Nucleic Acids Research 15:1281–1295.

https://doi.org/10.1093/nar/15.3.1281
- Google Scholar
1. Stark A
2. Brennecke J
3. Bushati N
4. Russell RB
5. Cohen SM
(2005) Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution
Cell 123:1133–1146.

https://doi.org/10.1016/j.cell.2005.11.023
- Google Scholar
(2011) Regulation by small RNAs in bacteria: expanding frontiers
Molecular Cell 43:880–891.

https://doi.org/10.1016/j.molcel.2011.08.022
- Google Scholar
1. Taniguchi Y
2. Choi PJ
3. Li GW
4. Chen H
5. Babu M
6. Hearn J
7. Emili A
8. Xie XS
9. Li G
(2010) Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells
Science 329:533–538.

https://doi.org/10.1126/science.1188308
- Google Scholar
1. Tuller T
2. Zur H
(2015) Multiple roles of the coding sequence 5' end in gene expression regulation
Nucleic Acids Research 43:13–28.

https://doi.org/10.1093/nar/gku1313
- Google Scholar
(2008) Detecting microRNA binding and siRNA off-target effects from expression data
Nature Methods 5:1023–1025.

https://doi.org/10.1038/nmeth.1267
- Google Scholar
(2010) Strand-specific deep sequencing of the transcriptome
Genome Research 20:989–999.

https://doi.org/10.1101/gr.094318.109
- Google Scholar
1. Vogel C
2. Marcotte EM
(2012) Insights into the regulation of protein abundance from proteomic and transcriptomic analyses
Nature Reviews Genetics 13:227–232.

https://doi.org/10.1038/nrg3185
- Google Scholar
1. Waters LS
2. Storz G
(2009) Regulatory RNAs in bacteria
Cell 136:615–628.

https://doi.org/10.1016/j.cell.2009.01.043
- Google Scholar
(2015) High-precision analysis of translational pausing by ribosome profiling in bacteria lacking EFP
Cell Reports 11:13–21.

https://doi.org/10.1016/j.celrep.2015.03.014
- Google Scholar
1. Workman C
2. Krogh A
(1999) No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution
Nucleic Acids Research 27:4816–4822.

https://doi.org/10.1093/nar/27.24.4816
- Google Scholar
1. Wu D
2. Jospin G
3. Eisen JA
(2013) Systematic identification of gene families for use as "markers" for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups
PLoS One 8:e77033.

https://doi.org/10.1371/journal.pone.0077033
- Google Scholar

Article and author information

Author details

Sinan Uğur Umu
1. School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
2. Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
Contribution
SUU, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents

Competing interests
The authors declare that no competing interests exist.

"This ORCID iD identifies the author of this article:" 0000-0001-8081-7819
Anthony M Poole
1. School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
2. Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
Contribution
AMP, Conception and design, Analysis and interpretation of data, Drafting or revising the article

Competing interests
The authors declare that no competing interests exist.
Renwick CJ Dobson
1. School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
2. Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
3. Department of Biochemistry and Molecular Biology, University of Melbourne, Parkville, Australia
Contribution
RCJD, Conception and design, Drafting or revising the article

Competing interests
The authors declare that no competing interests exist.
Paul P Gardner
1. School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
2. Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
3. BioProtection Research Centre, University of Canterbury, Christchurch, New Zealand
Contribution
PPG, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents

For correspondence
paul.gardner@canterbury.ac.nz

Competing interests
The authors declare that no competing interests exist.

"This ORCID iD identifies the author of this article:" 0000-0002-7808-1213

Funding

Royal Society of New Zealand

Sinan Uğur Umu
Anthony M Poole
Renwick CJ Dobson
Paul P Gardner

University of Canterbury (Biomolecular Interaction Centre, Joint PhD Scholarship)

Sinan Uğur Umu

Royal Society of New Zealand (RutherfordDiscovery Fellowships)

Anthony M Poole
Paul P Gardner

Army Research Office

Renwick CJ Dobson

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Thanks to Grzegorz Kudla for sharing the GFP expression data from Kudla et al. (2009) and Cindy Chang, Emilyn Tan, Michael Nefedov from the RT-PCR and the PEF facilities at the University of Queensland for assistance with generating GFP expression data. We also acknowledge Jeppe Vinther, Lukasz Kielpinski, Anders Krogh and the attendees of the 2012 and 2015 Benasque RNA conference for stimulating discussions.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.