Introduction

Tea is one of the most popular natural non-alcoholic beverages worldwide; about 2 billion cups are consumed daily (1). The popularity of tea stems from its favorable flavor and numerous health benefits (2-4). Both are conferred by abundant secondary metabolites, including catechins, caffeine, theanine, and volatiles (5). Theanine (γ-glutamylethylamide) is a unique non-protein amino acid and the most abundant free amino acid in tea plants (Camellia sinensis). It accounts for more than 50% of the total free amino acids and approximately 1-2% of the dry weight of the new shoots of tea plants (6). Theanine confers the umami taste of tea infusion and balances the astringency and bitterness caused by catechins and caffeine (7). It also has many health-promoting functions, such as reducing stress and anxiety, improving mood and cognition, and protecting the nervous system (6, 8-13). Therefore, theanine content is highly correlated with green tea quality (14).

Theanine is synthesized from ethylamine (EA) and glutamate (Glu) by theanine synthetase (15, 16). Importantly, the high level of theanine biosynthesis depends on the high availability of EA in tea plants (17). The evolution of EA biosynthesis in tea plants has therefore largely shaped tea quality. EA is produced by the decarboxylation of alanine, catalyzed by alanine decarboxylase (CsAlaDC) (18) (Figure 1A). Indeed, the expression level and catalytic activity of CsAlaDC largely determine the level of theanine accumulation in tea plants (19). CsAlaDC was a novel gene: no alanine decarboxylase had been reported in other plant species before it was identified in tea plants. Previous studies indicated that AlaDC originated from the serine decarboxylase gene (SerDC) by gene duplication in tea plants (20). However, CsAlaDC and CsSerDC specifically catalyze the decarboxylation of alanine and serine, respectively, although they share highly conserved amino acid sequences (20) (Figure 1A, 1C). In addition, CsAlaDC exhibits much lower enzymatic activity than CsSerDC. The structural basis and the key amino acids underlying the evolution of the substrate specificity and enzymatic activity of CsAlaDC remain unknown.

Figure 1. Metabolic pathways and phylogenetic analysis of SerDC and AlaDC in plants.

(A) The decarboxylation of serine and alanine in plants. SerDC, serine decarboxylase; AlaDC, alanine decarboxylase. (B) Phylogenetic analysis of the SerDC homologous proteins in Algae, Pteridophyta, Bryophyta, Dicots and Monocots. The phylogenetic tree was constructed using the neighbor-joining method in MEGA 7.0, and bootstrap values based on 1000 replicates are displayed at branch points. Proteins are represented by their GenBank accession numbers. (C) Multiple alignment of the amino acid sequences of AlaDC and SerDCs. Primary (100%), secondary (80%), and tertiary (60%) levels of conservation of similar amino acid residues are shaded in deep blue, light blue and cheer red, respectively. The multiple alignment was constructed in MEGA 7.0 using the ClustalW method, and GeneDoc was used to visualize the alignment. “*” indicates amino acid residues mutated only in CsAlaDC.

SerDC belongs to the Group II pyridoxal-5′-phosphate (PLP)-dependent decarboxylase superfamily (21). These Group II amino acid decarboxylases produce many bioactive secondary metabolites and signaling molecules (22-24). In plants, SerDC catalyzes the biosynthesis of ethanolamine from serine, and was first functionally characterized in Arabidopsis thaliana (25) (Figure 1A). Ethanolamine is an important metabolite in the synthesis of phosphatidylethanolamine and phosphatidylcholine, two main phospholipids involved in maintaining the structure of eukaryotic cell membranes (26-28). Furthermore, a previous study has shown that SerDC is essential for embryogenesis in Arabidopsis (29). SerDC is therefore of vital importance for plant growth and development, but the mechanisms of its substrate recognition and catalytic activity also remain unclear.

In this work, we sought to understand the mechanism of functional divergence of plant AlaDC and SerDC by analyzing their structural characteristics. We obtained the X-ray crystal structures of CsAlaDC and AtSerDC. These structures revealed a distinctive zinc finger structure in both CsAlaDC and AtSerDC, which has not been identified in any other previously characterized Group II PLP-dependent amino acid decarboxylase. By comparing the substrate binding pockets, we identified Phe106 of CsAlaDC and Tyr111 of AtSerDC as the crucial sites for substrate specificity. Through structure-guided mutation screening, we identified the amino acids that repress catalytic activity and discovered that CsAlaDCL110F/P114A exhibited a 2.3-fold increase in catalytic activity compared with CsAlaDC and enhanced theanine production in vitro.

Results

Enzymatic properties of CsAlaDC, AtSerDC, and CsSerDC

CsAlaDC originated from CsSerDC by gene duplication and neofunctionalization in tea plants, but the two enzymes catalyze different metabolic processes (Figure 1A). A phylogenetic tree was constructed to further reveal the evolutionary relationship between CsAlaDC and its homologs in Algae, Pteridophyta, Bryophyta, Monocots and Dicots (Figure 1B). Phylogenetic analysis indicated that CsAlaDC is homologous to SerDCs and is most closely related to those of Dicots. As expected, CsSerDC was closest to AtSerDC, which implies that they share similar functions. However, CsAlaDC is relatively distant from CsSerDC. Next, we performed a multiple sequence alignment of the amino acid sequences of the 6 SerDCs and CsAlaDC (Figure 1C). The results showed that the amino acid sequences of CsAlaDC and the SerDCs are highly conserved, but CsAlaDC carries amino acid mutations in conserved motifs relative to the SerDCs.

Next, to verify substrate specificity and enzyme activity, we performed enzyme activity assays and enzyme kinetics analysis for CsAlaDC, CsSerDC and AtSerDC. The 5′-truncated CsAlaDC and AtSerDC were cloned into the pET22b and pET28a expression vectors to generate the recombinant plasmids pET22b-CsAlaDC and pET28a-AtSerDC, respectively. Additionally, full-length CsSerDC was inserted into the pET28a expression vector to generate the recombinant plasmid pET28a-CsSerDC. Subsequently, the recombinant proteins were expressed in Escherichia coli and purified via nickel affinity chromatography. The purified proteins were examined by SDS-PAGE (Figure 2A). To investigate the substrate specificity of CsAlaDC, AtSerDC, and CsSerDC, enzyme activity assays were conducted using Ala and Ser as substrates, followed by product identification via UPLC analysis. The results showed that CsAlaDC efficiently catalyzes the decarboxylation of alanine to generate EA, whereas its activity towards serine is comparatively low. Conversely, AtSerDC and CsSerDC selectively catalyzed serine decarboxylation to yield ethanolamine but did not exhibit activity towards alanine (Figure 2B). These findings confirmed that CsAlaDC and the SerDCs do not have promiscuous decarboxylase activity on Ala and Ser.

Figure 2. Purification and characterization of CsAlaDC, AtSerDC, and CsSerDC.

(A) Identification of CsAlaDC, AtSerDC, and CsSerDC using SDS-PAGE. (B) Detection of enzyme activities of CsAlaDC, AtSerDC, and CsSerDC by UPLC. (C) Reaction rates of substrates with different concentrations catalyzed by CsAlaDC, AtSerDC, and CsSerDC.

The kinetic properties of CsAlaDC, AtSerDC, and CsSerDC were determined using their corresponding substrates (Figure 2C), and the results are presented in Table 1. The Km of CsAlaDC was 1.215 mmol/L, similar to the values for AtSerDC and CsSerDC of 1.522 mmol/L and 2.364 mmol/L, respectively. However, AtSerDC has a Vmax of 7.053 μmol/L/s, approximately 4.1-fold that of CsAlaDC (1.709 μmol/L/s), while CsSerDC has a Vmax of 9.031 μmol/L/s, which is 5.3-fold that of CsAlaDC. These findings indicated that the catalytic efficiency of CsAlaDC is considerably lower than that of both CsSerDC and AtSerDC. Because CsSerDC and AtSerDC have similar catalytic activity, we chose the more representative AtSerDC for further analysis.
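The comparison above can be reproduced with a short calculation. The following is a minimal sketch, assuming simple Michaelis-Menten kinetics and using only the Km and Vmax values quoted in the text; the function names and the 5 mmol/L substrate concentration are illustrative choices, not part of the original analysis. Each enzyme is evaluated with its own substrate (Ala for CsAlaDC, Ser for the SerDCs).

```python
# Km (mmol/L) and Vmax (umol/L/s) as quoted in the text above.
KINETICS = {
    "CsAlaDC": {"km": 1.215, "vmax": 1.709},
    "AtSerDC": {"km": 1.522, "vmax": 7.053},
    "CsSerDC": {"km": 2.364, "vmax": 9.031},
}


def mm_rate(substrate, km, vmax):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S]).

    substrate and km must share the same concentration unit (mmol/L here);
    the result has the unit of vmax.
    """
    if substrate < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate / (km + substrate)


def vmax_ratio(enzyme, reference="CsAlaDC"):
    """Fold difference in Vmax between an enzyme and a reference enzyme."""
    return KINETICS[enzyme]["vmax"] / KINETICS[reference]["vmax"]


if __name__ == "__main__":
    substrate = 5.0  # mmol/L, an arbitrary illustrative concentration
    for name, p in KINETICS.items():
        rate = mm_rate(substrate, p["km"], p["vmax"])
        print(f"{name}: v = {rate:.3f} umol/L/s at {substrate} mmol/L, "
              f"Vmax ratio vs CsAlaDC = {vmax_ratio(name):.2f}")
```

Because the three Km values are of similar magnitude, the difference in rate between the enzymes at any given substrate concentration is driven mainly by Vmax in this model.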

Table 1. Kinetic parameters of CsAlaDC, AtSerDC and CsSerDC

The overall structures of CsAlaDC and AtSerDC

To better understand CsAlaDC and AtSerDC, we conducted structural analyses of these two proteins. After optimizing the crystallization conditions, we determined the crystal structures of CsAlaDC, the CsAlaDC-EA complex, and AtSerDC at resolutions of 2.50 Å, 2.60 Å and 2.85 Å, respectively (Supplementary table 1).

CsAlaDC and AtSerDC are both homodimers whose two subunits adopt an asymmetrical arrangement (Figure 3A and 3D). Each monomer is divided into three distinct structural domains: an N-terminal domain (N-terminus to residue 104 in CsAlaDC, N-terminus to residue 109 in AtSerDC), a large domain (residues 105-364 in CsAlaDC, 110-369 in AtSerDC), and a small C-terminal domain (residue 365 to the C-terminus in CsAlaDC, residue 370 to the C-terminus in AtSerDC), colored light pink, khaki, and sky blue, respectively. Compared to the full-length proteins, the N-termini of CsAlaDC and AtSerDC are truncated by 60 and 65 amino acid residues, respectively. The truncated N-terminus contains a long α-helix, which is connected to the large domain through a long loop. The large domain contains a seven-stranded mixed β-sheet surrounded by eight α-helices, while the C-terminal domain is composed of a three-stranded antiparallel β-sheet and three α-helices. The catalytic site is positioned within a shallow crevice at the interface between the two subunits of the dimer. Cofactor binding involves amino acid residues from both subunits: one monomer accommodates both PLP and substrate, while the other monomer also contributes to cofactor binding and enzymatic function (Figure 3A and 3D).

Figure 3. Crystal structures of CsAlaDC and AtSerDC.

(A) Dimer structure of CsAlaDC. The N-terminal domain, large domain, and C-terminal domain of chain A are shown in light pink, khaki and sky blue, respectively. Chain B is shown in spring green. The PLP molecule is shown as a sphere model. The zinc finger structure at the C-terminus of CsAlaDC is indicated by the red box. The gray spheres represent zinc ions, while the red dotted lines depict the coordination bonds formed by zinc ions with cysteine and histidine. (B) The 2Fo-Fc electron density maps of K309-PLP-EA (contoured at the 1σ level). PLP is shown in violet, K309 in spring green, and EA in light blue. (C) Active center of the CsAlaDC-EA complex, with hydrogen bonds denoted by black dotted lines. “*” denotes the amino acids on adjacent subunits. (D) Dimer structure of AtSerDC. The N-terminal domain, large domain, and C-terminal domain of chain A are shown in light pink, khaki and sky blue, respectively. Chain B is shown in cyan. The PLP molecule is shown as a sphere model. The zinc finger structure at the C-terminus of AtSerDC is indicated by the red box. The gray spheres represent zinc ions, while the red dotted lines depict the coordination bonds formed by zinc ions with cysteine and histidine. (E) Active center of AtSerDC, with hydrogen bonds denoted by black dotted lines. “*” denotes the amino acids on adjacent subunits. (F) Superposition of the monomers of CsAlaDC and AtSerDC. CsAlaDC is depicted in spring green, while AtSerDC is shown in cyan. The conserved catalytic loop is indicated by the red box. (G) Superposition of the active center residues of CsAlaDC apo and the CsAlaDC-EA complex. CsAlaDC apo is shown in floral white, while the CsAlaDC-EA complex is shown in spring green. (H) Relative activity of wild-type CsAlaDC and its Y336F mutant (left), and of wild-type AtSerDC and its Y341F mutant (right). (I) Effects of DTT on the activity of CsAlaDC and AtSerDC. Three independent experiments were conducted.

Notably, our investigation has revealed the presence of a distinctive zinc finger structure (as depicted in Figure 3A and 3D) located at the C-terminus of CsAlaDC and AtSerDC. This structure is composed of a ring structure spanning 17 amino acid residues, wherein coordination of Zn2+ is facilitated by three Cys residues and one His residue. Importantly, this particular configuration is exclusive to the two proteins under examination and has not been identified in any other Group II PLP-dependent amino acid decarboxylases that have been previously characterized.

The crystal structures of the CsAlaDC-EA complex and AtSerDC were obtained and further analyzed. In the former, PLP is bound to Lys309 via a Schiff base linkage (internal aldimine form), and the pyridine moiety of PLP is positioned between the imidazole ring of His196 and the methyl group of Ala279, parallel to the imidazole ring. The N1 of PLP forms a salt bridge with Asp277, while the O3 forms hydrogen bonds with Thr247 and Lys309. Additionally, the phosphate group of PLP forms hydrogen bonds with His308, Gly169, Thr170, Lys309, and Ser347* (“*” denotes the amino acids on adjacent subunits) (Figure 3C). Similarly, in the latter structure, the pyridine ring of PLP is sandwiched between the imidazole ring of His201 and the methyl group of Ala284. The N1 of PLP forms a salt bridge with Asp282, while the O3 forms a hydrogen bond with Thr252. The phosphate group of PLP forms hydrogen bonds with Lys314, His313, Gly174, Thr175, and Ser352* (Figure 3E).

Factors affecting CsAlaDC and AtSerDC activity

The Group II PLP-dependent amino acid decarboxylases are dimeric enzymes in which each monomer contains an active site located within a shallow cavity (30). A flexible loop from one monomer extends into the active site of the other monomer and plays a crucial role in catalysis. The catalytic loop in CsAlaDC, consisting of amino acid residues 328-341, is well-structured and exhibits clear electron density. However, the corresponding loop in AtSerDC is disordered (Figure 3F). This loop harbors a conserved Tyr residue that is believed to act as the proton donor to the carbanion in the catalytic process (31). CsAlaDC Tyr336 and AtSerDC Tyr341 correspond to that Tyr residue. Superposition of the active site residues of CsAlaDC apo and the CsAlaDC-EA complex showed that the hydroxyl group of Tyr336* in the CsAlaDC-EA complex is rotated by a substantial 60 degrees about the Cα-Cβ bond (Figure 3G). To assess the function of this Tyr, we substituted the corresponding Tyr residues in CsAlaDC and AtSerDC with Phe. The resulting CsAlaDCY336F and AtSerDCY341F mutants were then incubated with L-alanine and L-serine, respectively, and we measured the production of EA or ethanolamine in the reaction mixtures. Our findings indicate that, unlike wild-type CsAlaDC and AtSerDC, these mutants catalyze abortive decarboxylation, as evidenced by the absence of detectable EA and the presence of only a small amount of ethanolamine in the reaction mixtures (Figure 3H). This result suggested that this Tyr is required for the catalytic activity of CsAlaDC and AtSerDC.

CsAlaDC and AtSerDC both possess a C-terminal zinc finger motif, and removing it rendered the proteins insoluble when expressed in E. coli. Therefore, we used dithiothreitol (DTT) to block the zinc finger structure and evaluate its role in catalytic activity. The results showed that 5 mM L-DTT reduced the relative activity of CsAlaDC and AtSerDC to 22.0% and 35.2% of the control, respectively (Figure 3I). These results suggested that this characteristic zinc finger motif is probably critical for the folding of CsAlaDC and AtSerDC.

Identification of key amino acids for the substrate specificity

Superposition of the amino acid residues in the substrate binding pockets of CsAlaDC and AtSerDC showed that the residues are identical except at one position: Phe106 in CsAlaDC (Figure 4A), which corresponds to Tyr111 in AtSerDC. This suggests that this residue is related to substrate specificity.

Figure 4. Key amino acid residues for substrate recognition.

(A) Superposition of substrate binding pocket amino acid residues in CsAlaDC and AtSerDC. The amino acid residues of CsAlaDC are shown in spring green and those of AtSerDC in cyan, with the substrate specificity-related amino acid residue highlighted by a red ellipse. (B) Active-site-lining residues from Embryophyta AlaDC were identified. The height of the residue label displays the relative amino acid frequency. (C) Histogram showing the distribution of the number of key motifs. (D) Histogram showing the number of key motifs in different plant orders. (E) Relative enzyme activities of wild-type CsAlaDC and the mutant protein CsAlaDCF106Y against Ala (columns 1 and 2), and of wild-type AtSerDC and various AtSerDC mutant proteins against Ala (columns 3-9). The percentages show the activity of each protein relative to wild-type CsAlaDC (taken as 100%). (F) Relative enzyme activities of wild-type AtSerDC and AtSerDC mutant proteins against Ser (columns 1-7), and of wild-type CsAlaDC and the mutant protein CsAlaDCF106Y against Ser (columns 8 and 9). The percentages show the activity of each protein relative to wild-type AtSerDC (taken as 100%). Three independent experiments were conducted. (G) The EA contents of AtSerDC and its mutant AtSerDCY111F in N. benthamiana. (H) The EA contents of CsAlaDC and its mutant CsAlaDCF106Y in N. benthamiana. Significant differences (P<0.05) are labeled with different letters according to Duncan’s multiple range test.

To gain more insight into the role of this amino acid residue in substrate specificity, we used the amino acid sequence of AtSerDC to identify all potential serine decarboxylases in Embryophyta and constructed phylogenetic trees (Supplementary figure 1). We further identified conserved motifs in these SerDC homologs in Embryophyta (Figure 4B). Importantly, Phe106 of CsAlaDC and Tyr111 of AtSerDC are located in a conserved motif (Figure 4B). In this motif, the first two amino acids, Y and P, are 100% conserved, but the third amino acid is variable and can be Y, T, F, V, A, I, or other amino acids (Figure 4B and C); interestingly, Phe106 of CsAlaDC and Tyr111 of AtSerDC occupy this variable position. Considering these first three amino acids, the YPY motif was identified in 83.7% of the SerDC homologs, while the YPF motif was found in 2.3% of them (Figure 4C). These analyses provided further insight into the role of Phe106 of CsAlaDC and Tyr111 of AtSerDC in substrate specificity.
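The motif tally described above can be illustrated with a small script. This is a minimal sketch, not the procedure used in the study: it assumes each homolog has already been reduced to a short sequence window around the conserved motif, and it finds the motif with a simple pattern search rather than by alignment column. The function names and the example windows are hypothetical placeholders, not sequences from the dataset.

```python
import re
from collections import Counter


def third_residue(window):
    """Return the variable residue X of the first YPX motif in a window, or None."""
    match = re.search(r"YP([A-Z])", window)
    return match.group(1) if match else None


def motif_frequencies(windows):
    """Percentage of each YPX variant among the windows that contain the motif."""
    counts = Counter()
    for window in windows:
        residue = third_residue(window)
        if residue is not None:
            counts["YP" + residue] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: 100.0 * n / total for motif, n in counts.items()}


if __name__ == "__main__":
    # Made-up windows for illustration only (not real SerDC homolog sequences).
    example_windows = ["GSYPYAV", "GSYPFAV", "GSYPYGL", "ATYPTAV"]
    for motif, percent in sorted(motif_frequencies(example_windows).items()):
        print(f"{motif}: {percent:.1f}%")
```

For a real dataset, taking the residue from the corresponding column of a multiple sequence alignment would be more robust than a pattern search, since an unrelated "YP" dipeptide elsewhere in a sequence would otherwise be counted.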

To verify the role of Phe106 of CsAlaDC and Tyr111 of AtSerDC in substrate specificity, we mutated Phe106 of CsAlaDC into Tyr (CsAlaDCF106Y) and Tyr111 of AtSerDC into Phe (AtSerDCY111F) and tested enzymatic activity in vitro. The results showed that CsAlaDCF106Y completely lost alanine decarboxylase activity, whereas AtSerDCY111F gained alanine decarboxylase activity (Figure 4E). We also generated other mutations of Tyr111 of AtSerDC, including AtSerDCY111A, AtSerDCY111I, AtSerDCY111L, AtSerDCY111V and AtSerDCY111W. Among these, only AtSerDCY111I exhibited a low level of alanine decarboxylase activity (Figure 4E). These results verified that Phe106 of CsAlaDC is required for alanine decarboxylase activity, and that the corresponding Tyr-to-Phe substitution confers alanine decarboxylase activity on a serine decarboxylase.

On the other hand, both CsAlaDC and CsAlaDCF106Y exhibited a very low level of serine decarboxylase activity (Figure 4F). Moreover, the serine decarboxylase activity of AtSerDCY111F was about 30% of that of AtSerDC, and AtSerDCY111I retained less than 5% of the serine decarboxylase activity of AtSerDC, while the other mutations, including AtSerDCY111A, AtSerDCY111L, AtSerDCY111V and AtSerDCY111W, abolished serine decarboxylase activity (Figure 4F). These results suggested that Tyr111 of AtSerDC is also important for the substrate specificity of AtSerDC.

To further verify in planta that Phe106 of CsAlaDC and Tyr111 of AtSerDC are the key amino acid residues determining substrate recognition, we used a Nicotiana benthamiana transient expression system. A. tumefaciens strain GV3101 (pSoup-p19) carrying each recombinant plasmid was infiltrated into leaves of 5-week-old N. benthamiana plants, and the pCAMBIA1305 empty vector was used as the control (EV). Relative mRNA levels (Supplementary figure 2) indicated that the genes had been successfully overexpressed in tobacco. A high level of EA was produced in the CsAlaDC-expressing tobacco leaves, while no EA was detected in tobacco leaves infiltrated with the mutant CsAlaDCF106Y. In addition, we did not detect EA in the AtSerDC-expressing tobacco leaves, while a high level of EA was detected in tobacco leaves infiltrated with the mutant AtSerDCY111F. As anticipated, EA was not detected in tobacco leaves infiltrated with EV (Figure 4G and H). These results further verified the critical role of Phe106 in the substrate specificity of CsAlaDC.

Key amino acids for the evolution of CsAlaDC enzymatic activity

CsAlaDC and AtSerDC have a high sequence similarity of 74.5% and nearly identical structures, with an RMSD (root mean square deviation) of only 0.77 Å between the monomers. However, the two enzymes catalyze different amino acid decarboxylation reactions, and AtSerDC exhibits significantly higher activity than CsAlaDC (Figure 2B and C). To explain this observation, we propose the following hypothesis: EA, generated via Ala decarboxylation by CsAlaDC, can be toxic and harmful to plants if it accumulates excessively. Thus, during the evolution of plant serine decarboxylase into alanine decarboxylase, the enzyme evolved not only an altered substrate preference but also reduced catalytic activity, keeping EA production within a suitable range. Based on this hypothesis, we reasoned that mutating specific amino acids in CsAlaDC to the corresponding amino acids in SerDCs could enhance its activity, and that the results could provide insights into the evolution of CsAlaDC enzymatic activity.

Dimerization is essential to AlaDC/SerDC activity because the active site is composed of residues from two monomers. To identify the amino acids that repressed the enzymatic activity of CsAlaDC during its evolution from SerDC, we analyzed the crystal structures and compared the amino acids at the dimer interfaces of CsAlaDC and AtSerDC. This analysis revealed that the amino acid residues at positions 66, 97, 110, 114, 116, 117, 122, 315 and 345 differ between CsAlaDC and the SerDCs (Figure 1C). Therefore, we mutated these amino acids of CsAlaDC into those at the corresponding positions of AtSerDC and CsSerDC and performed enzyme activity assays (Figure 5A). The results demonstrated that CsAlaDCL110F and CsAlaDCP114A exhibited significantly enhanced enzyme activity compared to wild-type CsAlaDC, with 2.1-fold and 1.59-fold increases, respectively. Furthermore, the catalytic activity of the CsAlaDCL110F/P114A double mutant was 2.3-fold that of the wild-type protein (Figure 5B). These findings suggested a critical role of Leu110 and Pro114 of CsAlaDC in the evolution of enzymatic activity and provided a basis for improving CsAlaDC activity. It is possible that these amino acid substitutions augment the hydrophobic nature of the protein dimer interface.

Figure 5. Mutations enhance CsAlaDC enzyme activity and theanine synthesis in vitro.

(A) Relative enzyme activities of CsAlaDC mutant proteins against Ala substrate. (B) Relative enzyme activities of CsAlaDCL110F, CsAlaDCP114A, and CsAlaDCL110F/P114A against Ala substrate. (C) Histogram showing the relative content of theanine resulting from different combinations of alanine decarboxylase and theanine synthetase. Three independent experiments were conducted.

In vitro synthesis of theanine

The biosynthetic pathway of theanine in tea plants comprises two sequential enzymatic steps: alanine decarboxylase catalyzes the decarboxylation of alanine to yield EA, which theanine synthetase then condenses with glutamate to synthesize theanine. Through mutation screening, we discovered a CsAlaDC mutant protein (L110F/P114A) that exhibited a 2.3-fold increase in catalytic activity compared to the wild-type protein. We then set up an in vitro theanine synthesis system combining CsAlaDC with either glutamine synthetase (PsGS [Pseudomonas syringae pv. syringae]) or gamma-glutamate methylamine ligase (GMAS [Methylovorus mays]), both of which can synthesize theanine from EA and Glu (32, 33).

The results showed that theanine was successfully synthesized from Ala and glutamate by CsAlaDC in combination with either of the two theanine synthetases (Figure 5C). The theanine content generated by CsAlaDCL110F with GMAS and by CsAlaDCL110F/P114A with GMAS was 4.57-fold and 6.72-fold higher, respectively, than that produced by wild-type CsAlaDC with GMAS. Comparable outcomes were observed with PsGS: the theanine content produced by CsAlaDCL110F with PsGS and by CsAlaDCL110F/P114A with PsGS was enhanced 1.62-fold and 4.33-fold, respectively, compared to the wild-type protein combination (Figure 5C). Hence, the use of CsAlaDCL110F/P114A could effectively enhance the theanine production yield and thus holds potential for the large-scale engineered production of theanine.

Discussion

Comparison of CsAlaDC and AtSerDC with other decarboxylases

The Dali search (34) revealed that AtSerDC and CsAlaDC exhibit structural similarity to several other decarboxylases, including 7CIG (MetDC [Streptomyces sp. 590]), 7ERV (HisDC1 [Photobacterium phosphoreum]), 6KHO (TrpDC [Oryza sativa Japonica Group]), 4E1O (HisDC2 [Homo sapiens]), 6JY1 (AspDC [Methanocaldococcus jannaschii]), and 5GP4 (GluDC [Levilactobacillus brevis]). The monomers of these amino acid decarboxylases all consist of three characteristic domains (Figure 3 and Supplementary figure 4). The C-terminal domain, also known as the small domain, is composed of a three-stranded antiparallel β-sheet and three α-helices and faces the opposite monomer. The large domain contains a seven-stranded β-sheet surrounded by eight α-helices, exhibiting characteristics typical of Type I enzymes. This domain includes a conserved PLP-binding active center and amino acid residues that are conserved across species. The N-terminal domain consists of elongated α-helices and is connected to the large domain via a loop. The helix of one subunit is antiparallel to the corresponding helix in the neighboring subunit, forming a clamp. Therefore, the N-terminal domain may not be an independent folding unit and is likely stable only in dimers.

The monomeric structures of CsAlaDC and AtSerDC exhibit numerous structural features common to other Group II PLP-dependent amino acid decarboxylases, and the amino acid residues that bind the PLP cofactor are conserved across multiple enzymes (Supplementary figure 3 and 4). Across the eight proteins analyzed, thirteen amino acid residues were conserved in all sequences (Glu143, Glu171, Lys201, Gly245, Thr247, Asp253, His275, Asp277, Ala279, Pro286, Ser302, Lys309, and Tyr336 in CsAlaDC). Some of them are situated within the active center and play a role in stabilizing PLP or facilitating catalysis. Others reside outside the active center, and their function remains unclear (Supplementary figure 3). PLP is bound to a conserved lysine residue in the active center through a Schiff base. The carboxyl group of a conserved Asp residue stabilizes the protonated pyridine nitrogen of PLP electrostatically, giving the latter the strong electrophilicity needed to stabilize carbanion intermediates during catalysis. A conserved His residue lies on one side of the PLP ring, and a conserved Ala residue lies on the opposite side. The oxygen atom at position 3 of PLP is stabilized by hydrogen bonding with the Thr residue. The phosphate moiety of PLP is additionally secured to the protein through an extended hydrogen bonding network. The α5 helix within the large domain is evolutionarily conserved in Group II enzymes; it interacts with the PLP phosphate moiety and stabilizes it.

The superimposition of the crystal structures of CsAlaDC and AtSerDC with those of related enzymes suggests that residue Phe106 in CsAlaDC (corresponding to Tyr in AtSerDC and Glu in MetDC) may play a role in determining substrate specificity. This hypothesis was tested via mutagenesis experiments, which revealed that substitution of Phe106 with Tyr rendered CsAlaDC inactive, while substitution of Tyr111 with Phe in AtSerDC enabled alanine decarboxylase activity. Likewise, the substitution of Glu64 with Ala in MetDC reduced enzyme activity by approximately 50-fold compared to the wild-type (35), supporting this conclusion.

Group II amino acid decarboxylases are characterized by highly flexible catalytic loops, which are of great significance for the catalytic mechanism (31). The catalytic loop is located at the dimer interface and, in its closed conformation, extends into the active site of the other monomer. In the structure of CsAlaDC, the catalytic loop is ordered, while in the structure of AtSerDC it is disordered and lacks electron density (Figure 3F). Prior research has established that the conserved Tyr residue within the loop plays a crucial role in catalysis by donating a proton to the carbanion of the quinonoid intermediate that arises following decarboxylation (36, 37). Our experiments demonstrate that substituting the corresponding Tyr with Phe in the loop renders the protein inactive. Furthermore, the structural changes in CsAlaDC upon EA binding provide further evidence supporting this conclusion. Specifically, the structure of the CsAlaDC-EA complex displayed a significant conformational change in Tyr336*, located in the catalytic site (Figure 3G). Compared with CsAlaDC apo, the CsAlaDC-EA complex exhibited a notable 60-degree shift of the p-hydroxyl group of Tyr336* about the Cα-Cβ bond.

The monomeric structures of CsAlaDC and AtSerDC are similar to those of other Group II amino acid decarboxylases, with one conspicuous difference. Unlike other Group II amino acid decarboxylases, CsAlaDC and AtSerDC harbor an evident zinc finger structure at the C-terminus. We propose that this structure could influence enzyme stability. To test this hypothesis, we truncated the zinc finger structure from both proteins and expressed them. After removal of the zinc finger structure, CsAlaDC became insoluble, and AtSerDC showed a similar trait to a certain extent.

Evolution of AlaDC and SerDC

The sequence identity between CsAlaDC and AtSerDC is 74.5%, indicating a high degree of homology, and the two monomers are structurally very similar. Notably, however, they differ in substrate specificity. Using the crystal structures of CsAlaDC and AtSerDC as a basis, we identified the pivotal amino acid residues that govern specificity for serine or alanine. Sequences proximal to these sites are highly conserved in both proteins and form characteristic motifs (Figure 4B). We further analyzed serine decarboxylase-like proteins in Embryophyta that are highly homologous to CsAlaDC and constructed phylogenetic trees to gain insight into their evolutionary relationships. Based on our experimental findings and this evolutionary evidence, we identified a conserved YPX motif at the substrate binding pocket. In most plants, residue X of this motif is Tyr, which corresponds to serine decarboxylase. However, other residues such as Thr, Phe, Val, Ala, and Ile can also occupy this position (Figure 4B and C). Interestingly, serine decarboxylase-like proteins containing a YPF motif, which have the potential to function as alanine decarboxylases, are scattered across the phylogenetic tree, including Asteroideae, Ericales, Chenopodiaceae, Poaceae, and Ranunculales (Supplementary figure 1B), but are absent from some more recent species. This observation implies that the emergence of alanine decarboxylase in these particular species could be attributed to convergent evolution.
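
The motif-based grouping described above can be illustrated with a short Python sketch. It assumes the three-residue YPX motif has already been extracted from each aligned sequence; the labels, function names, and example motifs are ours and purely illustrative, and a YPF motif indicates only a candidate alanine decarboxylase, not a verified activity.

```python
from collections import Counter

def classify_motif(motif):
    """Assign a tentative class from the residue X of a YPX motif."""
    motif = motif.upper()
    if len(motif) != 3 or not motif.startswith("YP"):
        return "no YPX motif"
    x = motif[2]
    if x == "Y":
        return "SerDC-like (YPY)"
    if x == "F":
        return "putative AlaDC (YPF)"
    return "other (YP" + x + ")"

def tally_motifs(motifs):
    """Count how many sequences fall into each tentative class."""
    return Counter(classify_motif(m) for m in motifs)

# Hypothetical motifs, as if read from one alignment column block.
example_motifs = ["YPY", "YPY", "YPF", "YPT", "YPV", "ypf"]
counts = tally_motifs(example_motifs)
```

Keeping the classification separate from the tally makes it easy to add further classes (for example, the YPT motif) without changing the counting step.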

Moreover, we observed an enrichment of serine decarboxylase-like proteins possessing a YPT motif within the Fabales. Prior research has demonstrated that XP_004496485-1, which possesses a YPT motif, catalyzes the decarboxylation and oxidative deamination of Phe, Met, Leu, and Trp (38). Given that the YPT motif is highly conserved and widely distributed in Fabales, serine decarboxylase-like proteins bearing this motif may have developed a unique substrate specificity in Fabales, beyond their conventional decarboxylation functions, as exemplified by the XP_004496485-1 protein. We speculate that these proteins may be capable of catalyzing other reactions, potentially involving non-protein amino acids or other substances as substrates.

Application to improving theanine synthesis

Theanine is an important indicator of green tea quality, and improving theanine synthesis in tea plants is therefore a focus of research. In this study, through crystal structure analysis and mutational verification, we found that the catalytic activity of CsAlaDC L110F/P114A is 2.3 times higher than that of the wild-type protein, resulting in more abundant synthesis of theanine in vitro. This finding suggests that EA synthesis in tea plants could be improved by gene editing, thereby increasing the theanine content of tea plants (Figure 5).

Theanine is in high market demand because of its health effects and medicinal value and its use as a constituent in food, cosmetics, and other fields. To meet this demand, a variety of methods have been used to obtain theanine, chiefly direct extraction, chemical synthesis, biotransformation (microbial fermentation), and plant cell culture. Based on this study, mutagenesis could be used to modify engineered strains and improve theanine synthesis in bacteria, yeast, model plants, and other hosts, promoting their use for industrial theanine production.

Conclusions

In summary, we determined the crystal structures of the CsAlaDC-EA complex and AtSerDC and used structural biology to reveal the substrate selection mechanism of these two decarboxylases. Through structural analysis and enzyme activity experiments, we identified Tyr111 of AtSerDC and Phe106 of CsAlaDC as crucial sites for substrate specificity. Next, through mutation screening, we identified a CsAlaDC mutant protein (L110F/P114A) with catalytic activity 2.3 times higher than that of the wild-type protein. In an in vitro L-theanine biosynthesis system with added PsGS or GMAS, the L-theanine yield of the CsAlaDC L110F/P114A group was 4.3 and 6.7 times higher, respectively, than that of the wild-type control group. In addition, using the key substrate recognition motif of CsAlaDC and AtSerDC, we conducted an evolutionary analysis of protein sequences in Embryophyta with high homology to CsAlaDC, which resulted in the discovery of 13 potential alanine decarboxylases. Furthermore, we discovered serine decarboxylase-like proteins containing special motifs that have evolved in Fabales and may possess unique substrate specificity and catalytic functions. Overall, our study has enhanced the efficiency of the initial stage of theanine synthesis and is anticipated to offer a promising avenue for developing a novel approach to producing L-theanine, thereby facilitating the progress of the tea industry.

Materials and Methods

Plant materials

Tobacco (Nicotiana benthamiana) plants were grown in a controlled chamber under a 16-h light and 8-h dark photoperiod at 25 °C. Leaves of 5-week-old tobacco plants were used for transient transformation mediated by Agrobacterium tumefaciens strain GV3101.

Gene cloning and protein expression

The coding sequences of AtSerDC (amino acid residues 66-482) and CsAlaDC (amino acid residues 61-478) were amplified by PCR and ligated into the Nde I and Xho I restriction sites of the pET-28a and pET-22b vectors, respectively, yielding the recombinant vectors pET-28a-AtSerDC and pET-22b-CsAlaDC. The cDNA encoding CsSerDC was amplified by PCR and ligated into the Nde I and Xho I restriction sites of the pET-28a vector. Gene-specific primers are listed in Supplementary table 2. The recombinant plasmids were transformed into E. coli BL21 (DE3) competent cells. Positive transformants were grown overnight at 37 °C in 5 mL LB medium containing 30 μg/mL kanamycin or 50 μg/mL ampicillin and then subcultured into 800 mL LB medium containing the corresponding antibiotic. When the optical density at 600 nm (OD600) reached 0.6-0.8, protein expression was induced with 0.2 mM isopropyl-β-D-thiogalactoside (IPTG) for 20 h at 16 °C, and cells were harvested by centrifugation at 4,000 rpm for 30 min at 4 °C.

Protein production and crystallization

The cell pellet was suspended in 30 mL of lysis buffer (20 mM HEPES, pH 7.5, 200 mM NaCl, 0.1 mM PLP), disrupted with a high-pressure homogenizer, and then centrifuged at 16,000 rpm for 30 min at 4 °C to remove cell debris. The supernatant was purified on a Ni-agarose resin column followed by size-exclusion chromatography. Before crystallization, purified proteins were concentrated to 10 mg/mL. Crystallization conditions were screened by the sitting-drop vapor diffusion method using the reservoir solutions supplied in commercially available screening kits (Crystal Screen, Crystal Screen 2, PEGRx 1, 2, and SaltRx 1, 2). A droplet made by mixing 1.0 μL of purified AtSerDC or CsAlaDC (10 mg/mL) with an equal volume of reservoir solution was equilibrated against 100 μL of the reservoir solution at 16 °C. Crystals of AtSerDC were obtained using a buffer at pH 8.0 containing 20% (w/v) PEG400 as the precipitant and 0.2 M CaCl2. Crystals of the CsAlaDC-EA complex were obtained at pH 7.5 with 2.6 M sodium acetate. Crystals of CsAlaDC were obtained at pH 6.0 with 3.5 M sodium formate.

Data collection and processing

Crystals were grown by the sitting-drop vapor diffusion method at 16 °C. The volume of the reservoir solution was 100 µL and the drop volume was 2 µL, containing 1 µL of protein sample and 1 µL of reservoir solution. The reservoir solution for AtSerDC contained 0.1 M Tris-HCl (pH 8.0), 0.2 M CaCl2, and 20% PEG 400. The reservoir solution contained 0.1 M HEPES (pH 7.5) and 2.6 M sodium acetate for the CsAlaDC-EA complex, and 0.1 M Bis-Tris (pH 6.0) and 3.5 M sodium formate for CsAlaDC. Crystals grew in 3-5 days at a protein concentration of 10 mg/mL. Diffraction data were collected at the Shanghai Synchrotron Radiation Facility (China). The collected data sets were indexed, integrated, and scaled using the HKL3000 software package. The structure of AtSerDC was solved by molecular replacement using the structure of HisDC from Photobacterium phosphoreum (PDB code: 7ERV) as the search model, and the AtSerDC structure was then used as the molecular replacement search model for both CsAlaDC and the CsAlaDC-EA complex. The structures of AtSerDC, CsAlaDC, and the CsAlaDC-EA complex were determined at resolutions of 2.85 Å, 2.50 Å, and 2.60 Å, respectively. The statistics for data collection and processing are summarized in Supplementary table 1.

Enzyme activity assays

Decarboxylase activity was measured by detecting the products (EA or ethanolamine) on a Waters Acquity ultra-performance liquid chromatography (UPLC) system (20, 39). The 100 μL reaction mixture, containing 20 mM substrate (Ala or Ser), 100 mM potassium phosphate, 0.1 mM PLP, and 0.025 mM purified enzyme, was incubated for 30 min under standard conditions (45 °C and pH 8.0 for CsAlaDC; 40 °C and pH 8.0 for AtSerDC). The reaction was then stopped with 20 μL of 10% trichloroacetic acid.

Theanine production was detected using the same approach. The 100 μL reaction mixture, containing 20 mM Ala, 45 mM Glu, 50 mM HEPES (pH 7.5), 0.1 mM PLP, 30 mM MgCl2, 10 mM ATP, 0.03 mM PsGS or GMAS, and 0.025 mM CsAlaDC, CsAlaDC L110F, or CsAlaDC L110F/P114A, was incubated under standard conditions for 1 h. The reaction was then terminated by placing the reaction vessel in a metal bath at 96 °C for 3 min.

The product was derivatized with 6-aminoquinolyl-N-hydroxy-succinimidyl carbamate (AQC) and subjected to analysis by UPLC. All enzymatic assays were performed in triplicate.
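
As a worked check on the decarboxylase reaction composition, the amount of each component in the 100 μL reaction follows from concentration times volume (mM × μL = nmol). The Python sketch below uses the final concentrations stated above; the stock concentration in the last step is a hypothetical value chosen only to illustrate the pipetting calculation, and the function names are ours.

```python
def amount_nmol(conc_mM, volume_uL):
    """Amount of solute in nmol (1 mM x 1 uL = 1 nmol)."""
    return conc_mM * volume_uL

def stock_volume_uL(final_mM, total_uL, stock_mM):
    """Volume of stock needed to reach final_mM in total_uL (C1*V1 = C2*V2)."""
    return final_mM * total_uL / stock_mM

REACTION_UL = 100.0

# Final concentrations (mM) of the decarboxylase assay as given in the text.
decarboxylase_mix_mM = {
    "substrate": 20.0,            # Ala or Ser
    "potassium phosphate": 100.0,
    "PLP": 0.1,
    "enzyme": 0.025,
}

amounts = {name: amount_nmol(c, REACTION_UL) for name, c in decarboxylase_mix_mM.items()}

# Molar excess of substrate over enzyme in the assay.
substrate_per_enzyme = amounts["substrate"] / amounts["enzyme"]

# Hypothetical 200 mM substrate stock (assumption, not stated in the text).
substrate_stock_uL = stock_volume_uL(20.0, REACTION_UL, 200.0)
```

The same two helpers apply unchanged to the theanine reaction mixture by substituting its component list.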

Site-directed mutagenesis

Site-directed mutagenesis was performed by PCR using the wild-type constructs pET-28a-AtSerDC and pET-22b-CsAlaDC as templates. Dpn I endonuclease was used to digest the parental DNA template. The reaction mixture was used to transform E. coli DH5α competent cells, and plasmids extracted from positive clones were transformed into E. coli BL21 (DE3) for protein expression, purification, and enzymatic activity analysis.

In vivo enzyme activity assay in N. benthamiana

The amplified PCR products were cloned into the plant expression vector pCAMBIA1305, which had been linearized by restriction digestion with Spe I and BamH I. Recombinant colonies were selected on plates containing the appropriate antibiotic and validated by PCR. After validation, the plasmids were electroporated into Agrobacterium tumefaciens strain GV3101 (pSoup-p19). Empty pCAMBIA1305 vector, with an intron containing the GFP gene, was used as the control.

Agrobacterium transient expression assays were performed on 5-week-old N. benthamiana plants. Agrobacterium tumefaciens strain GV3101 (pSoup-p19) carrying the above-described vectors was cultured in Luria-Bertani (LB) medium containing the appropriate antibiotics at 28 °C. When the culture reached OD600 = 0.6-0.8, bacterial cells were collected and resuspended in MMA solution (10 mM MgCl2, 10 mM 2-(N-morpholino)ethanesulfonic acid (MES), pH 5.6). After the OD600 of the resuspended bacterial solution was adjusted to approximately 1.0, acetosyringone (AS) was added to a final concentration of 200 µM, and the solution was incubated at room temperature for at least 3 h in darkness. Next, the cell suspensions were infiltrated into N. benthamiana leaves with a needle-free syringe. The leaves were collected 3 days post infiltration, frozen in liquid nitrogen, and stored at −80 °C. Internal EA in N. benthamiana leaves was extracted as previously described (22), and the extract was then analyzed by gas chromatography-mass spectrometry (GC-MS).

Transcript level analysis in N. benthamiana

Total RNA was isolated from samples using the RNAprep Pure Plant Kit (Tiangen, Beijing, China) according to the manufacturer’s protocol. cDNA was synthesized using the TransScript One-Step gDNA Removal and cDNA Synthesis SuperMix Kit (TransGen Biotech, Beijing, China). The qRT-PCR assays were performed as previously described (40). Primers used for qRT-PCR are listed in Supplementary table 3, and reactions were run on a Bio-Rad CFX96™ Real-Time PCR detection system with CFX Manager Software. Each 20 μL reaction contained 0.4 μL forward and reverse primers (10 μM), 2 μL cDNA (200±5 ng/μL), 10 μL SYBR Green Supermix (Vazyme, Nanjing, China) and 6.2 μL double-distilled water. Reactions were run using a two-step method: 95 °C for 5 min, followed by 40 cycles of 95 °C for 10 s and 60 °C for 30 s. The glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene was used for internal normalization in each qRT-PCR, and the 2^(−ΔΔCT) method (41) was used to calculate relative gene expression. All samples were analyzed in triplicate.
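
For reference, the relative expression calculation by the 2^(−ΔΔCT) method can be sketched in Python as follows. The Ct values are invented for illustration and the function name is ours; the sketch assumes equal amplification efficiency for the target and reference genes, as the method requires.

```python
from statistics import mean

def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression by 2^(-ddCt); each argument is a list of replicate Ct values."""
    # dCt: target normalized to the reference gene (e.g. GAPDH) within each group.
    d_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    # ddCt: sample relative to the control group.
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** (-dd_ct)

# Hypothetical triplicate Ct values (not measured data).
fold_change = relative_expression(
    ct_target_sample=[22.0, 22.0, 22.0],
    ct_ref_sample=[18.0, 18.0, 18.0],
    ct_target_control=[25.0, 25.0, 25.0],
    ct_ref_control=[18.0, 18.0, 18.0],
)
```

In this invented example the target gene has a ΔCT of 4 in the sample and 7 in the control, so ΔΔCT is −3 and the relative expression is 2^3.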

Multiple sequence alignment

MUSCLE was used to generate the protein multiple sequence alignment of amino acid decarboxylases with default settings (42). ESPript 3.x was used to display the multiple sequence alignment (43).

Phylogenetic analysis

CsAlaDC was used as the query to search for homologs in Embryophyta using BLASTP (44) (version 2.13.0). To retrieve homologs of CsAlaDC in distant species, we performed a two-round BLAST search. We first searched for the best hit in each order under Embryophyta. We then searched each species within a given order using that order's best hit as the query, keeping up to 10 hits. Sequences with E-values above 0.001 were removed, as were sequences shorter than 200 or longer than 800 amino acids. Multiple sequence alignment of the filtered sequences was then performed with Clustal Omega (clustalo, version 1.2.4) using default settings (45). The neighbor-joining (NJ) algorithm implemented in ClustalW (version 2.1) was used to build the phylogenetic tree (46). The R packages ggtree (version 3.2.1) and universalmotif (version 1.12.4) were used for tree and motif visualization (47).
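
The hit-filtering step can be sketched in Python as below. This is an illustration under stated assumptions rather than the script used in this study: it treats the E-value threshold of 0.001 as a retention cutoff (hits at or below it are kept), keeps sequences of 200 to 800 amino acids, and uses a hypothetical `Hit` record with invented example values.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """Minimal BLASTP hit record (hypothetical structure for illustration)."""
    seq_id: str
    evalue: float
    length: int  # sequence length in amino acids

def filter_hits(hits, max_evalue=1e-3, min_len=200, max_len=800):
    """Keep hits that pass both the E-value cutoff and the length window."""
    return [
        h for h in hits
        if h.evalue <= max_evalue and min_len <= h.length <= max_len
    ]

# Invented example hits.
example_hits = [
    Hit("seq_a", 1e-120, 480),  # kept
    Hit("seq_b", 0.5, 470),     # removed: weak E-value
    Hit("seq_c", 1e-50, 150),   # removed: too short
    Hit("seq_d", 1e-80, 950),   # removed: too long
    Hit("seq_e", 1e-3, 200),    # kept: at both boundaries
]
kept_ids = [h.seq_id for h in filter_hits(example_hits)]
```

Exposing the thresholds as keyword arguments makes it straightforward to test how sensitive the resulting tree is to the chosen cutoffs.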

Acknowledgements

We thank the team of beamline BL18U1 in the Shanghai Synchrotron Radiation Facility for diffraction data collection.

Funding

This study is supported by grants from the Ministry of Science and Technology of China (2019YFA0904100 to W.G, 2022YFF1003103 to X.W) and the Natural Science Foundation of China (T2221005 to W.G, 32072624 to Z.Z).

Data availability

The structures of AtSerDC, the CsAlaDC-EA complex, and CsAlaDC have been deposited in the Protein Data Bank (PDB) under accession numbers 8JG7, 8JIK, and 8JIJ, respectively.

Supplementary figures

Evolutionary analysis of CsAlaDC in Embryophyta.

(A) Unrooted evolutionary tree of CsAlaDC, showing only the tree topology without distance information. The color of the inner ring corresponds to the different orders, while the leaf nodes of the outer ring are colored according to the motif type of each sequence. (B) Diversity of serine decarboxylase-like proteins in Embryophyta (196 species). Colored dots to the right of the leaf nodes correspond to the respective motifs shown in panel A.

The relative mRNA levels of AtSerDC and its mutant Y111F (A) and of CsAlaDC and its mutant F106Y (B) in N. benthamiana leaves were measured using two primer pairs. WT, wild-type N. benthamiana; EV, empty vector control; NbGAPDH was used as an internal control. Data represent mean ± SD (n=3). Significant differences (P<0.05) are labeled with different letters according to Duncan’s multiple range test.

The multiple sequence alignment of CsAlaDC, AtSerDC, MetDC, HisDC1, TrpDC, HisDC2, TyrDC, and GluDC was generated using MUSCLE and visualized with ESPript 3.x. Amino acid residues conserved in all eight proteins are highlighted with red backgrounds. The magenta box marks key amino acid residues involved in substrate recognition for CsAlaDC and AtSerDC. The Lys residue covalently bound to the PLP cofactor is denoted by a red star, while the green triangle indicates the Tyr residue associated with enzymatic activity. Amino acid residues forming the CsAlaDC substrate binding pocket are marked with blue circles.

Structures of HisDC2, MetDC, TrpDC, AspDC, HisDC1 and GluDC.

(A) Overall structures of HisDC2, MetDC, TrpDC, AspDC, HisDC1 and GluDC. Chain A is shown in khaki and chain B in cyan. (B) Amino acid residues in the substrate binding pockets of HisDC2, MetDC, TrpDC, AspDC, HisDC1 and GluDC. Residues in chain A are shown in khaki, and residues in chain B are shown in cyan.

Supplementary tables

Data collection and refinement statistics

Primers used for gene cloning.

Primers used for real-time PCR.