Figures and data

In vitro construction of minimal, near-standard, and standard genetic codes
(A) Anticodons of tRNAs and their corresponding codon assignments in the minimal genetic code (MGC), the near-standard genetic code (near-SGC), and the standard genetic code (SGC). Each codon is colored according to the physicochemical properties of the assigned amino acid: hydrophobic (green), aromatic (yellow), polar uncharged (orange), basic (blue), and acidic (pink). In each box, the anticodons of tRNAs (left) and the corresponding codons (right) are shown. (B) Codon sets used for reporter genes. The 21-codon set contains only codons that are usable in the MGC. Reporter genes composed of these codons were used in D and in the subsequent experiments with non-standard genetic codes. The 32- and 46-codon sets contain codons that are usable in the near-SGC and SGC, respectively; reporter genes composed of these codons were used in D. (C) Schematic of the translation assay. Reporter genes (NanoLuc, 1 nM) composed of the 21-, 32-, or 46-codon set were translated in a customized reconstituted translation system lacking endogenous tRNAs (tRNA-free PURE system; tfPURE) supplemented with in vitro-synthesized tRNAs corresponding to MGC, near-SGC, or SGC (IPEN tRNA at 100 ng/µL; all other tRNAs at 12 ng/µL), and T7 RNA polymerase (0.42 U/µL) at 30 °C for 16 h. (D) NanoLuc activity after incubation. In near-SGC (RV), two tRNAs (tRNAVal and tRNAArg) were increased to 100 ng/µL. Each dot represents an independent experiment (n = 3). Bars indicate mean values, and error bars represent standard deviations. Statistical comparisons in (D) were performed using one-way ANOVA followed by Tukey’s post hoc test on NanoLuc activity; major comparisons are summarized in Table S8.

Reassignment experiments to test the availability of 10 vacant codons for Ala, Ser, and Leu
(A) Schematic illustration of reassignment experiments. Translation with the original MGC and NanoLuc template is shown at the top for comparison. An example of Ala reassignment to the UUG codon is shown at the bottom. In this example, two Ala codons in the NanoLuc sequence were replaced with one type of vacant codon (e.g., UUG), generating a 21 + 1 (UUG-Ala) codon set. Similar reassignment experiments were performed for three amino acids (Ala, Ser, and Leu) and nine vacant codons. Specifically, two Ala codons (Ala16 and Ala120), three Ser codons (Ser31, Ser49, and Ser150), or four Leu codons (Leu32, Leu67, Leu144, and Leu170) were replaced. (B) NanoLuc translation results for each codon reassignment experiment. Translation reactions were performed in tfPURE supplemented with a 21-tRNA mixture (600 ng/µL), one tRNA variant (12 ng/µL each), and each NanoLuc template (1 nM) containing two to four copies of the codon under test (21 + 1 NNN-Ala/Ser/Leu codons). Reactions were incubated at 30 °C for 16 h, after which NanoLuc activity was measured. As a control, translation reactions lacking the additional tRNA variant were conducted (21 code, gray bars) and compared with reactions containing the additional tRNA (21 + 1 code, pink bars). Additional controls included translation without any tRNA (no tRNA) and translation using MGC with NanoLuc templates encoded by the original 21 codons (21 codons), both shown for comparison. Each dot represents three technical replicates, and error bars represent standard deviations. For each template, NanoLuc activity in the 21-code and corresponding 21 + 1-code conditions was compared using Welch’s t-test on luminescence. Statistical results are summarized in Table S9.
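As a reading aid for the comparison in (B), the following is a minimal sketch of Welch's unequal-variance t-test in plain Python. The luminescence values are hypothetical placeholders, not data from this study, and the function name `welch_t` is introduced here for illustration only. The sketch returns the t statistic and the Welch-Satterthwaite degrees of freedom; converting these to a p-value would require a t-distribution routine from a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical luminescence readings (arbitrary units), three replicates each
code_21 = [1.0e3, 1.2e3, 0.9e3]          # without the additional tRNA
code_21_plus_1 = [8.0e5, 9.5e5, 7.2e5]   # with the additional tRNA

t, df = welch_t(code_21_plus_1, code_21)
print(f"t = {t:.2f}, df = {df:.2f}")
```

With three replicates per group, the degrees of freedom fall between 2 and 4, so large differences in mean are needed to reach significance.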

Distribution of mutational costs of reassigned genetic codes
(A) Calculation method of mutational costs for each genetic code based on three physicochemical properties of amino acids. The average change in each of the three physicochemical properties of amino acids upon single-nucleotide substitutions from the 21 codons was calculated (see Methods for details). In the reassigned genetic codes analyzed here, one of three amino acids (Ala, Ser, or Leu) was assigned to each of the nine vacant codons shown in gray, and the costs were calculated for all possible reassignment combinations. (B) Representative comparison of the near-SGC and PRmax code. Codon assignment schemes are shown on the left, and heatmap representations of the assigned amino acid values for polar requirement, molecular volume, and hydropathy index are shown on the right. The corresponding CostPR, CostMV, and CostHI values are indicated above each heatmap. (C) Distributions of mutational costs for each physicochemical property of amino acids. Dashed lines indicate the cost values of 10 genetic codes selected for experimental construction. Red dashed lines indicate the minimum and maximum cost values for each cost definition, and orange lines indicate the cost values of near-SGC. (D, E, F) Genetic codes exhibiting the minimum and maximum mutational costs based on PR (D), MV (E), and HI (F). The physicochemical values of amino acids assigned to each codon are shown as heatmaps.
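The calculation outlined in (A) can be illustrated with a short sketch. It makes several assumptions that are not taken from the Methods: the cost is computed as the mean absolute change in a property, substitutions that land on an unassigned codon are skipped, and the codon table is a two-codon toy example rather than the real MGC. The hydropathy values are the Kyte-Doolittle index for a small subset of amino acids. The names `neighbors` and `mutational_cost` are hypothetical.

```python
from itertools import product

BASES = "UCAG"

# Kyte-Doolittle hydropathy index for a few amino acids (illustrative subset)
HYDROPATHY = {"Ala": 1.8, "Ser": -0.8, "Leu": 3.8, "Gly": -0.4, "Asp": -3.5}


def neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for i, base in enumerate(codon):
        for b in BASES:
            if b != base:
                yield codon[:i] + b + codon[i + 1:]


def mutational_cost(code, source_codons, prop):
    """Mean absolute property change over single-nucleotide substitutions.

    Assumption: substitutions that land on an unassigned codon are skipped.
    """
    changes = [
        abs(prop[code[c]] - prop[code[n]])
        for c in source_codons
        for n in neighbors(c)
        if n in code
    ]
    return sum(changes) / len(changes)


# Toy example, NOT the real MGC: two fixed codons and two vacant codons
fixed = {"GGU": "Gly", "GAU": "Asp"}
vacant = ["GCU", "GUU"]

# Assign Ala, Ser, or Leu to every vacant codon and score each resulting code
costs = {}
for combo in product(["Ala", "Ser", "Leu"], repeat=len(vacant)):
    code = {**fixed, **dict(zip(vacant, combo))}
    costs[combo] = mutational_cost(code, fixed, HYDROPATHY)

best = min(costs, key=costs.get)
worst = max(costs, key=costs.get)
print(len(costs), best, worst)

# With nine vacant codons and three candidate amino acids per codon,
# the same enumeration covers 3 ** 9 codes.
print(3 ** 9)
```

The same loop structure scales to nine vacant codons, where three choices per codon give 3^9 = 19,683 reassignment combinations.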

Translation of random mutagenesis libraries with near-SGC
(A) Schematic overview of the protein activity assay using a random mutation library. Reporter genes composed of the 21 codons were subjected to random mutagenesis by error-prone PCR at different Mn²⁺ concentrations to generate DNA libraries, as shown in Fig. S7. These libraries (5 nM) were translated using near-SGC, consisting of a 32-tRNA mixture (tRNAIPEN, tRNAValCAC, and tRNAArgCCU at 100 ng/µL; all other tRNAs at 12 ng/µL) in tfPURE, including T7 RNA polymerase (1.7 U/µL) at 30 °C for 16 h, and each protein activity was measured. (B, C, D) Dependence of β-galactosidase (GAL) (B), firefly luciferase (Luc) (C), and mStayGold (mSG) (D) activity on mutation rate. Note that the vertical axis of panel C (Luc) is on a log scale. Each dot represents the results of three technical replicates, and error bars represent standard deviations. The lower x-axis indicates the estimated number of mutations per gene, calculated by multiplying the mutation rate per base by the coding sequence length of each reporter gene. Spearman’s rank correlation coefficients were ρ = −0.90 for GAL, ρ = −1.00 for Luc, and ρ = −1.00 for mSG.
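The two quantities described for panels B to D, the estimated number of mutations per gene and Spearman's rank correlation, can be sketched as follows. All numbers are hypothetical placeholders (including the coding sequence length), and the helper names are introduced here for illustration. Spearman's ρ is computed as the Pearson correlation of average ranks, which is the standard definition and handles ties.

```python
def mutations_per_gene(rate_per_base, cds_length_nt):
    """Expected mutations per gene: per-base mutation rate times coding length."""
    return rate_per_base * cds_length_nt


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical mutation rates (per base) and relative activities
rates = [0.5e-3, 1.0e-3, 2.0e-3, 4.0e-3]
activity = [0.9, 0.6, 0.3, 0.05]
cds_length_nt = 1000  # hypothetical coding sequence length in nucleotides

print([mutations_per_gene(r, cds_length_nt) for r in rates])
print(spearman(rates, activity))
```

A coefficient of −1.00 means the activity decreases at every step as the mutation rate rises, regardless of the shape of the decline.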

Translation of mutagenized DNA libraries with non-SGCs
(A) Schematic of the experiment for comparing protein activities translated with different genetic codes. Random libraries prepared at low and high mutation rates were translated using either the 10 non-SGCs or the near-SGC (RV). Translation conditions were identical to those described in Fig. 4. (B, D, F) Protein activities of products translated with each genetic code using low- and high-mutation DNA libraries. Activities are shown for β-galactosidase (GAL; B, mutation rate = 2.6 × 10⁻³ per base), firefly luciferase (Luc; D, mutation rate = 2.7 × 10⁻³ per base), and mStayGold (mSG; F, mutation rate = 4.8 × 10⁻³ per base). Quantification of protein synthesis levels for GAL is shown in Fig. S9. (C, E, G) Ratios of protein activity of high-mutation libraries to those of low-mutation libraries, plotted against the corresponding theoretical mutational costs. Data are shown for GAL (C), Luc (E), and mSG (G). Mean values of three technical replicates are shown with standard deviations for GAL. For GAL activity in (B), two-way ANOVA was performed using genetic code and mutation level as factors. Significant main effects of genetic code and mutation level were detected (both p < 0.0001), whereas their interaction was not significant. For (C), (E), and (G), Spearman’s rank correlation analysis was performed between each mutational cost metric and the high-/low-mutation activity ratio. Statistical details are summarized in Table S10.