Chemical structures of lysine lipids. Lys-PG, lysyl-phosphatidylglycerol; Lys-Glc-DAG, lysyl-glucosyl-diacylglycerol; Lys-Glc2-DAG, lysyl-diglucosyl-diacylglycerol.

Summary of different MprF variants expressed in S. mitis and the lysine lipids they produce.

Percentage of amino acid identity and similarity compared to S. agalactiae COH1 MprF; data obtained from BLASTp. The lipids each strain synthesizes are denoted by a checkmark or an x. Lys-PG, lysyl-phosphatidylglycerol; Lys-Glc-DAG, lysyl-glucosyl-diacylglycerol.

Synthesis of lysine lipids (Lys-PG and Lys-Glc-DAG) in S. mitis expressing mprFs from S. agalactiae, S. salivarius, and S. ferus. (a) S. mitis NCTC12261 with empty vector control (pABG5) lacks lysine lipids; (b) S. agalactiae mprF (pGBSMprF) produces both Lys-PG and Lys-Glc-DAG; (c) S. salivarius mprF produces only Lys-PG; (d) S. ferus mprF produces only Lys-Glc-DAG. Left panels: total ion chromatograms (TIC); middle panels: mass spectra at retention times 19.5–21.5 min showing Lys-PG and PC; right panels: mass spectra at retention times 26–30 min showing Lys-Glc-DAG. Note: “*” is an extraction artifact from the chloroform used. DAG, diacylglycerol; MHDAG, monohexosyldiacylglycerol; DHDAG, dihexosyldiacylglycerol; PG, phosphatidylglycerol; Lys-PG, lysyl-phosphatidylglycerol; Lys-Glc-DAG, lysyl-glucosyl-diacylglycerol; PC, phosphatidylcholine.

Example of hidden unit analysis and usage. (a) The structure of PDB:7DUW; the transmembrane flippase domain is colored red, and the cytosolic domain, which we focus on, is boxed in yellow. (b) The scores produced by inputting a sequence into a hidden unit; each score is a single number corresponding to the sum of negatively and positively weighted residues. Scores were computed for the entire training set (histogram in blue), with sequences corresponding to predominantly positively weighted residues highlighted. (c) Hidden unit from an RBM trained on the Pfam DUF2156 domain. The MSA positions 152 and 212 correspond to residues S684 and R742, respectively. (d) Residues (in yellow) in the MprF cytosolic domain that form the binding pocket for Lys-tRNALys (the ligand analogue L-lysine amide is shown in green), from PDB:4V36. LYN, L-lysine amide.
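The score described in panel (b) can be illustrated with a minimal sketch, assuming the hidden unit input is simply the sum, over alignment positions, of the weight assigned to the residue observed at each position. The function name, the toy weights, and the three-residue sequences below are hypothetical and are not taken from the trained RBM.

```python
# Hypothetical toy weights: weights[position][residue] -> contribution.
# A trained RBM would hold a weight for every alignment position and residue symbol.
toy_weights = {
    0: {"S": 1.5, "A": -1.0},
    2: {"R": 2.0, "K": -0.5},
}

def hidden_unit_input(sequence, weights):
    """Sum the weights of the residues observed at each aligned position."""
    total = 0.0
    for position, residue in enumerate(sequence):
        # Positions or residues without a listed weight contribute zero.
        total += weights.get(position, {}).get(residue, 0.0)
    return total

# Two made-up aligned sequences: one carrying positively weighted residues,
# one carrying negatively weighted residues.
score_pos = hidden_unit_input("SGR", toy_weights)
score_neg = hidden_unit_input("AGK", toy_weights)
print(score_pos, score_neg)
```

Scoring every sequence in an alignment this way yields one number per sequence, which is the quantity a histogram such as the one in panel (b) would summarize.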

Schematic of the RBM methodology. An aligned set of protein sequences is first used to learn a hidden unit representation that best describes the statistics of the sequence dataset, given restrictions on the hidden unit representation. The individual hidden units can then be studied to find particular units that allow useful enzyme classification, and the weights of these units can be meaningfully interpreted as statistically co-varying sequence configurations. The classification can also be used to create filtered datasets for training further models.

Proposed set of weights for classifying lipid specificity. (a) Two hidden units found in an RBM trained on the filtered Pfam dataset. The hidden unit residues are highlighted in the PDB:4V34 structure, with arrows pointing to their corresponding residue sets. (b) The outputs of the weights when scoring the sequences used in the training set and sequences from NCBI not used during training (N=23,138). S. agalactiae produces Lys-Glc-DAG and Lys-PG, while B. licheniformis and P. aeruginosa produce only aminoacylated-PG. Q1-Q4 are quadrant labels, which we refer to throughout the paper.

Identification of Lys-Glc2-DAG in E. dispar. (a) Positive ion mass spectrum of Lys-Glc2-DAG species in E. dispar. (b) MS/MS product ions and fragmentation scheme of Lys-Glc2-DAG. (c) Extracted ion chromatograms of LC/MS of lysine lipids (Lys-PG, Lys-Glc-DAG, Lys-Glc2-DAG) and Glc2-DAG separated on an amino HPLC column. Glc2-DAG, diglucosyl-diacylglycerol; Lys-PG, lysyl-phosphatidylglycerol; Lys-Glc-DAG, lysyl-glucosyl-diacylglycerol; Lys-Glc2-DAG, lysyl-diglucosyl-diacylglycerol.

Table of all strains studied, the quadrant they occupy, and the lipids they synthesize.

** indicates trace amounts of Lys-Glc-DAG present in lipid extractions. * indicates heterologous expression of mprF in Streptococcus mitis. Lys-Glc-DAG, lysyl-glucosyl-diacylglycerol; Lys-Glc2-DAG, lysyl-diglucosyl-diacylglycerol; Lys-PG, lysyl-phosphatidylglycerol.

E. faecalis MprF2 and E. faecium MprF1 confer Lys-Glc2-DAG synthesis. (a) OG1RF-WT; (b) OG1RF 10760::Tn lacks lysine lipids; (c) OG1RF 10760::Tn + pABG5 lacks lysine lipids; (d) OG1RF 10760::Tn + pOGMprF2 restores lysine lipids; (e) OG1RF 10760::Tn + pEFMprF1 restores lysine lipids; (f) OG1RF 10760::Tn + pEFMprF2 lacks lysine lipids. Expression of OGMprF2 or EFMprF1 in the OG1RF 10760::Tn mutant restores Lys-Glc2-DAG synthesis. Shown are the extracted ion chromatograms of lysine lipids (Lys-PG, Lys-Glc2-DAG) and Glc2-DAG separated on an amino HPLC column. Note: Lys-Glc-DAG was found in trace amounts or was missing from lipid extractions. Glc2-DAG, diglucosyl-diacylglycerol; Lys-PG, lysyl-phosphatidylglycerol; Lys-Glc2-DAG, lysyl-diglucosyl-diacylglycerol.

The two proposed hidden unit outputs with all of the sequences from Table 2 labeled. Protein IDs of highlighted sequences are listed in Table S1. (#) indicates that the lipid activity was confirmed through heterologous expression.

Assessment of the reliance of the RBM weight on total sequence identity in classification. A sliding-window average is computed: a window of width 1 is slid from the negative end to the positive end of the weight, and at each position sequences whose coordinates fall within the window are sampled. The sampled sequences are compared pairwise with one another, and the Hamming distance across their entire sequence length is averaged. This sampling procedure is repeated 30 times for each window; the light blue shading represents the 95% confidence interval.
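The sliding-window procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the step size, the number of sequences sampled per window, the handling of sparse windows, and the toy data are all hypothetical, and the confidence interval is omitted.

```python
import random
from itertools import combinations

def hamming(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def mean_pairwise_hamming(seqs):
    """Average Hamming distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

def sliding_window_profile(scored, start, stop, step=1.0, width=1.0,
                           n_sample=3, n_repeats=30, seed=0):
    """scored: list of (hidden_unit_score, aligned_sequence) tuples.

    Returns a list of (window_center, mean_distance) tuples.
    n_sample must be at least 2; step and n_sample are assumptions.
    """
    rng = random.Random(seed)
    profile = []
    left = start
    while left + width <= stop:
        in_window = [seq for score, seq in scored if left <= score < left + width]
        if len(in_window) >= n_sample:  # skip windows with too few sequences
            repeats = [mean_pairwise_hamming(rng.sample(in_window, n_sample))
                       for _ in range(n_repeats)]
            profile.append((left + width / 2, sum(repeats) / n_repeats))
        left += step
    return profile

# Made-up scores and sequences, for illustration only.
toy = [(-0.9, "AAAA"), (-0.5, "AAAT"), (-0.1, "AATT"),
       (0.2, "GGGG"), (0.6, "GGGC"), (0.8, "GGCC")]
profile = sliding_window_profile(toy, start=-1.0, stop=1.0)
```

A flat profile would suggest that sequences at different positions along the weight are about equally diverse, whereas a dip would indicate a window dominated by closely related sequences.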

Identification of Lys-Glc2-DAG, which is highly retentive on a silica HPLC column. (a) The total ion chromatogram of normal phase LC/MS of E. dispar lipids separated on a silica HPLC column. (b) The positive ion mass spectrum of Lys-Glc2-DAG eluting at the end of the LC gradient. Glc-DAG, glucosyl-diacylglycerol; Glc2-DAG, diglucosyl-diacylglycerol; PG, phosphatidylglycerol; Lyso-Glc2-DAG, lyso diglucosyl-diacylglycerol; Lys-PG, lysyl-phosphatidylglycerol; Lys-Glc-DAG, lysyl-glucosyl-diacylglycerol; Lyso Lys-PG, lyso lysyl-phosphatidylglycerol; Lys-Glc2-DAG, lysyl-diglucosyl-diacylglycerol.

All Enterococcus sequences analyzed in the course of this study.

Sequence Locus IDs for all sequences listed in Table 2.

Bold denotes confirmed mprF allele.

Fisher’s exact test for determining whether positive values of Weight 2 are predictive of GlcN-DAG specificity. p=0.028.
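For readers unfamiliar with the test, a minimal sketch of a two-sided Fisher's exact test on a 2x2 contingency table is given below. The choice of a two-sided test is an assumption, and the example counts are hypothetical: they are not the counts from this study and are not intended to reproduce the reported p-value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Hypothetical table: rows = positive / non-positive weight value,
# columns = lipid specificity present / absent.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(p_value)
```

With the row and column totals held fixed, the p-value is the total probability of every table that is no more likely than the observed one.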

E. coli strains and plasmids used.

Primers used in this study.

Red indicates sequence complementarity to pABG5.

mprF sequences synthesized by Genewiz.

Red indicates sequence complementarity to pABG5.