Calculation of translation speed confirms slow initial translation (SIT). Translation speeds were calculated using Ribosome Residence Time (RRT) (Gardin et al. 2014) (Table S1 for RRT values) as a measure of codon-specific translation speed over 5694 S. cerevisiae ORFs. The horizontal line indicates average inverse RRT across all ORFs. The average speed in the first 40 amino acids is about 1.1% slower than in the rest of the gene (p < 0.001).

Codon usage in the Slow Initial Translation (SIT) region. A. Relative usage of each codon in the SIT versus the rest of the gene. The Y-axis shows the usage of each codon in the first 40 amino acids (omitting ATG) divided by its usage in the rest of the gene. Each of the 61 sense codons is displayed along the X-axis, grouped by amino acid. Within each group, codons are ordered, left to right, from least frequent to most frequent over the whole transcriptome (i.e., the leftmost codon in each group is the rarest). Red arrows show the seven slowest codons by RRT, purple arrows show the seven rarest codons by total usage over the transcriptome, and blue arrows show Start and possible Start codons (ATG, ATA, ATC). Ratios above 1 indicate enrichment in the first 40 amino acids. Typically, the rarest codons for each amino acid are enriched. B. Absolute usage of each leucine codon in the SIT. The absolute usage frequency of each leucine codon is shown globally and for the first 40 amino acids. Rare codons are still rare in the SIT, just not as rare as elsewhere. The same pattern holds for the other amino acids.
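The ratio plotted in panel A can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions: "usage" is read as a codon's count divided by the total codons counted in that region, "omitting ATG" is read as dropping only the start codon, stop codons are excluded, and the function names and the `window` parameter are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}

def split_codons(orf):
    """Split an in-frame ORF into codons, ignoring any trailing partial codon."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]

def relative_codon_usage(orfs, window=40):
    """Usage in the first `window` codons (start codon dropped) divided by
    usage in the rest of the gene, pooled over all ORFs (assumed reading)."""
    head, rest = Counter(), Counter()
    for orf in orfs:
        codons = split_codons(orf)
        # Codons 2..window form the initial region; the start codon is omitted.
        head.update(c for c in codons[1:window] if c not in STOP_CODONS)
        rest.update(c for c in codons[window:] if c not in STOP_CODONS)
    head_total = sum(head.values())
    rest_total = sum(rest.values())
    ratios = {}
    for codon, rest_count in rest.items():
        head_freq = head[codon] / head_total
        rest_freq = rest_count / rest_total
        ratios[codon] = head_freq / rest_freq  # > 1 means enriched near the 5' end
    return ratios

# Toy ORF with a 3-codon window so the arithmetic is easy to follow.
toy = ["ATGCTACTATTGTTGTTGCTATAA"]
print(relative_codon_usage(toy, window=3))
```

Codons absent from the rest of the gene are skipped in this sketch, because the ratio is undefined for them.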

The N-termini of proteins can vary in evolution. BLAST of four example S. cerevisiae proteins against proteins in the subphylum “Saccharomycotina” (taxid: 147537) (excluding Saccharomyces, taxid 4930) was performed. Top hits are shown. Red regions indicate homology with an alignment score > 200, while white indicates no detected homology (BLAST default parameters). Even though all hits have high to moderate homology towards the center of the protein, many have little or no homology at the N-terminus.

Conservation of S. cerevisiae proteins over the N-terminal, Middle, and C-terminal 40 amino acids. S. cerevisiae proteins were blasted against proteins of Saccharomycotina (excluding cerevisiae). “Conservation Scores” (Methods and Materials) were calculated for the N-terminal, Middle, and C-terminal 40 amino acids of the S. cerevisiae proteins. Scores range from 0 (no conservation) to 40 (perfect conservation). The frequency of each conservation score (3964 S. cerevisiae proteins) was plotted.

Slow Initial Translation is correlated with poor N-terminal conservation. A. Proteins were grouped by their N-terminal conservation scores (top, middle, and bottom thirds), and then the relative initial translation rate was plotted for each group. More conserved N-termini have faster initial translation. B. Proteins were grouped by their initial translation rate (top, middle, and bottom thirds), and then the N-terminal conservation scores were plotted for each group. Genes with faster initial translation have more conserved N-termini. Relative Initial Translation Speed is the log2 of (average RRT of the first 40 amino acids divided by the average RRT of the rest of the same gene) (Methods and Materials).
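As an illustration of the quantity defined above, the following sketch computes the log2 ratio of mean RRT over the first 40 codons to mean RRT over the rest of the gene. The RRT values in the example are placeholders, not the values from Table S1, and the function name is hypothetical. Under this literal reading of the definition, a positive value means longer ribosome residence (slower translation) at the start; the sign convention used in the plotted figure may differ.

```python
import math

def relative_initial_translation_speed(codons, rrt, window=40):
    """log2(mean RRT of the first `window` codons / mean RRT of the rest).

    `codons` is the ordered list of codons of one gene; `rrt` maps each codon
    to its Ribosome Residence Time. Both inputs are assumed to be prepared
    by the caller.
    """
    head = [rrt[c] for c in codons[:window]]
    rest = [rrt[c] for c in codons[window:]]
    if not head or not rest:
        raise ValueError("gene must be longer than the window")
    mean_head = sum(head) / len(head)
    mean_rest = sum(rest) / len(rest)
    return math.log2(mean_head / mean_rest)

# Placeholder RRT values, chosen only to make the arithmetic obvious.
toy_rrt = {"AAA": 2.0, "GGG": 1.0}
toy_gene = ["AAA", "AAA", "GGG", "GGG", "GGG"]
print(relative_initial_translation_speed(toy_gene, toy_rrt, window=2))
```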

Genes with high levels of expression, and high ribosome densities, generally have rapidly-translated N-termini, and high N-terminal conservation scores. A and B. Genes were grouped by expression level (bottom, middle, and top thirds; genes with fewer than 10 read-counts were omitted to reduce noise) (Lipson et al. 2009). In A, the initial translation rate is shown; in B, the conservation scores are shown. The correlation between speed and transcript abundance fails for the bottom third of genes; possibly these are genes expressed at high levels under other conditions (e.g., meiosis and sporulation). C and D. Genes were grouped by ribosome density (Arava et al. 2003) as a measure of intensity of translation. In C, the initial translation rate is shown; in D, the conservation scores are shown. High ribosome density correlates with high initial translation speed and high conservation score.

Slow Initial Translation inhibits gene expression. Left 3 bars. A synthetic GFP was constructed with a leader amino acid sequence that had little effect on GFP. The leader sequence was recoded to give slow (SIT), medium (MIT) or fast (FIT) translation speed over the first 41 amino acids without changing the amino acid sequence; that is, the SIT, MIT and FIT had identical amino acid sequences but different average RRTs. Each construct (SIT, MIT, FIT) was integrated in single copy at the ADE2 locus, 25 independently transformed strains were picked, GFP fluorescence was measured for each, and the RFP-normalized mean was plotted. Numerical values were: SIT, 1.66; MIT, 1.80; FIT, 2.29. GFP was normalized to RFP expressed from the same reporter molecule, but RFP fluorescence hardly changed amongst the transformants, and non-normalized GFP would have given very similar results. Slower initial translation reduced gene expression. Right 3 bars. As above, but a Putative ribosome Collision Site (PCS) (CGA-CGG) was inserted between the leader and the GFP. Again, slower initial translation reduced gene expression. Values were: SIT:PCS, 0.69; MIT:PCS, 0.74; FIT:PCS, 0.99.

Translation speed at 3’ ends. Translation speeds at the 3’ ends of genes were calculated using Ribosome Residence Time (RRT) (Gardin et al. 2014) (Table S1 for RRT values). The average speed over the last 40 amino acids is about 0.1% slower than in the rest of the gene, a difference that is not statistically significant. The average speed over the last 100 amino acids is about 0.19% slower, a difference that is statistically significant (p = 0.028).

Distribution of translation speeds at 5’ and 3’ ends. The distribution of relative translation speeds over 5694 genes is shown for the first and last 40 amino acids. For the 5’ end, 57.2% of genes have relatively slow initial translation, while for the 3’ end, 50.14% of genes have slow terminal translation.

Codon speed and codon usage are correlated. A. Rare codons are translated slowly. Each dot represents a sense codon. The x-axis displays the translation speed of each codon (modified from Gardin et al. 2014); the y-axis displays the global frequency of usage of each codon. The correlation is 0.64, p < 0.001. B. The first 40 codons are enriched for rare codons. Each dot represents a sense codon. The relative usage of each type of codon in the first 40 codons of genes (i.e., in the SIT) (y-axis) is displayed against global codon usage (x-axis). The correlation is −0.61, p < 0.001. C. The first 40 codons are enriched for slow codons. Each dot represents a sense codon. The relative usage of each type of codon in the first 40 codons of genes (i.e., in the SIT) is displayed against codon translation speed (i.e., 1/RRT, Gardin et al. 2014). The correlation is −0.45, p < 0.001.

Comparison of conservation scores at the N- and C-termini. Grey, N-terminal conservation scores. Red, C-terminal conservation scores.

Slow 3’ Translation is correlated with poor C-terminal conservation. A. Proteins were grouped by their C-terminal conservation scores (top, middle, and bottom thirds), and then the terminal translation rate was plotted for each group. More conserved C-termini have faster terminal translation. B. Proteins were grouped by their C-terminal translation rate (top, middle, and bottom thirds), and then the C-terminal conservation scores were plotted for each group. Genes with faster terminal translation have more conserved C-termini.

Structure of the GFP reporters. These reporters were adapted from Brule and Grayhack, 2016, and Gamble et al., 2016. A leader sequence (purple), originally from HIS3, is appended upstream of GFP. A. For the first three constructs, recoding of some residues within the first 41 codons with synonymous codons gave either a slow (SIT), medium (MIT), or fast (FIT) initial translation rate. Protein sequences were preserved. B. Three analogous reporters were made with a putative ribosome collision site (PCS) at codon positions 68 and 69 (still upstream of GFP sequences). The PCS was the codon pair CGA-CGG, two rare Arg codons.

Example conservation scores. Scores were calculated as described in Materials and Methods, but in this example, only for a subset of Saccharomycotina. “Total Hits” is the number of different proteins from the Saccharomycotina subset giving a BLAST bitscore of at least 50. “Hits in the first 40 amino acids” is the number of proteins (out of the proteins in the “Total Hits” column) that had a BLAST alignment with an alignment score >200 matching any part of the first 40 amino acids of the query sequence (i.e., of PCA1, NSR1, etc.). “Query Start” is the range of amino acid positions in the Query protein where the BLAST alignments started. For instance, for BUD5, the 125 Saccharomycotina homologs had BLAST alignments that started at positions between amino acid 211 and amino acid 420 on S. cerevisiae BUD5; none had an alignment starting within the first 40 amino acids. For SNX41, 65 of the 121 hits had an alignment beginning within the first 40 amino acids of S. cerevisiae SNX41. For RPL12B, all 121 of the Saccharomycotina homologs had BLAST alignments starting at amino acid 1 of S. cerevisiae RPL12B. The “Conservation Score” is the score calculated as described in Materials and Methods. Note that the number of hits varies in part because the genomes of the Saccharomycotina species were not all fully sequenced. Thus, BNA2 likely has fifteen fewer hits than TRP3 because the BNA2 locus was not sequenced in some species. However, the number of hits does not affect the conservation score (see Table S2), as long as the number meets the qualifying minimum.

Calculation of a Conservation Score.

“Query start position in BLAST alignment” is the amino acid residue of the S. cerevisiae query protein where a BLAST alignment (alignment score >200) begins with a protein of Saccharomycotina. “Proportion of hits with this Q-Start Position” is the proportion of qualifying Saccharomycotina hits (i.e., bitscore >50) that have their alignment begin at this position. “Weight” is multiplied by “Proportion”, and the sum is the Conservation Score.
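The weighted sum described above can be sketched as follows. The actual weights are given in the table and are not reproduced here; the weight function below (40 for an alignment starting at position 1, decreasing by one per position and reaching 0 beyond position 40) is purely an assumption, chosen only because it yields the stated range of 0 to 40. The function names are hypothetical.

```python
from collections import Counter

def assumed_weight(q_start, window=40):
    """ASSUMED weight: 40 at position 1, 1 at position 40, 0 afterwards.
    The real weights should be taken from the table, not from this function."""
    return max(0, window + 1 - q_start)

def conservation_score(q_starts, weight=assumed_weight):
    """Sum over query start positions of weight * proportion of qualifying hits.

    `q_starts` holds one query start position per qualifying hit.
    """
    if not q_starts:
        raise ValueError("no qualifying hits")
    counts = Counter(q_starts)
    total = len(q_starts)
    return sum(weight(pos) * n / total for pos, n in counts.items())

# All hits aligned from residue 1: maximal score under the assumed weights.
print(conservation_score([1] * 121))
# Half the hits start at residue 1, half start far downstream.
print(conservation_score([1, 1, 211, 420]))
```

Because each position contributes through its proportion of hits rather than its raw count, the score in this sketch does not depend on the total number of hits, consistent with the note on hit numbers in the preceding legend.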

Ramp gene sequences.