Introduction

Protein conjugations are used to fuse fluorophores to proteins and to immobilize proteins on beads, microplates, electron microscopy grids, or surface plasmon resonance chips. They enable the generation of antibody conjugates, of segmentally labeled proteins for nuclear magnetic resonance measurements, and the fusion of proteins to the cell surface. Various methods are employed for these applications, often with an excess of one conjugation partner (e.g., 3x–10x). Each of them comes with its own advantages and disadvantages. N-Hydroxysuccinimide (NHS) labeling of lysine residues and maleimide labeling of cysteine residues are easy to use, but unspecific1. Click chemistry is specific, but requires the introduction of non-biological chemical groups in the fusion partners2,3. Split domain methods, such as SpyTag-SpyCatcher, are also specific, but introduce long sequences between the fusion partners4,5. Split intein methods introduce only a short “ligation scar”, but may suffer from solubility problems and varying efficiency for different substrate pairs6,7. Enzymatic methods are simple and, in some cases, specific, but the reactions are reversible and the resulting fusions therefore remain incomplete8,9. Nonetheless, they find good use, both in industry and in thousands of academic studies.

The most popular enzyme, Staphylococcus aureus Sortase A, catalyzes the reversible fusion of substrates A and B in the form of A-LPXTG (X = any amino acid) and G-B to yield A-LPXTG-B (usually with additional linker sequences) 9,10. Therefore, it shows good specificity for substrate A, but is relatively unspecific towards substrate B. Here, the glycine may be replaced by lysine side chains or by other amines11. Sortase A has a low affinity for its substrates and therefore requires high substrate concentrations for moderate activity (KM(LPETG) = 7330 μM; KM(GGGGG) = 196 μM)12. It also catalyzes the irreversible hydrolysis of both educts and products (kcat, hydrolysis = 0.086/s; kcat, Ligation = 0.28/s)12. Some of these characteristics could be improved by the generation of mutant proteins or by the optimization of reaction conditions13,14. Nevertheless, these properties still constrain its applicability, meaning that Sortase A is primarily used for protein-peptide fusions at high concentrations, and with an excess of peptide.

Several other enzymes share these basic characteristics with Sortase A. These enzymes belong to the cysteine (Sortase, Butelase, Asparaginyl endopeptidase) or serine proteases (Trypsiligase, Subtiligase), and bind their substrates through a (thio-)ester bond, which can react with H2O (hydrolysis, irreversible) or the H2N group of fusion partners (aminolysis, reversible)8,9. Recently, however, an enzyme ligase with entirely different characteristics was discovered15. This enzyme, Connectase, binds its substrate through an amide bond, which is hydrolysis-resistant. It therefore exclusively catalyzes conjugations with the H2N groups of fusion partners. In addition, Connectase acts on a longer recognition sequence, leading to higher substrate specificity and higher catalytic efficiency. This enables entirely different applications, such as the specific labeling of proteins with fluorophores within cell extracts, allowing their in-gel detection with significantly higher sensitivity and signal-to-noise ratio compared to Western blots16.

Yet just like other enzyme ligases, Connectase catalyzes a reversible reaction15. Connectase from M. mazei, for example, binds substrates with the sequence A-ELASKDPGAFDADPLVVEI (Figure 1, step 1). It then forms a covalent intermediate, A-ELASKD-Connectase, with the N-terminal part of this sequence, and cleaves off the C-terminal peptide PGAFDADPLVVEI (Figure 1, steps 2-3). This reaction works both ways, meaning that PGAFDADPLVVEI can react with A-ELASKD-Connectase to restore Connectase and its substrate, A-ELASKDPGAFDADPLVVEI. However, when a second substrate B in the form of PGAFDADPLVVEI-B is added to the reaction, it can be used instead of the peptide PGAFDADPLVVEI to form the fusion product A-ELASKDPGAFDADPLVVEI-B (Figure 1, steps 4-5).
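The reaction sequence described above can be illustrated with a minimal string-based sketch. It only models the bookkeeping of sequences (which part stays on substrate A, which part is released), not kinetics or structure; the function names are hypothetical and chosen for illustration.

```python
from typing import Tuple

RECOGNITION_N = "ELASKD"         # stays attached to substrate A
RECOGNITION_C = "PGAFDADPLVVEI"  # released peptide / N-terminus of substrate B


def form_intermediate(substrate: str) -> Tuple[str, str]:
    """Steps 1-3: split A-ELASKD|PGAFDADPLVVEI into the part bound to
    Connectase and the released C-terminal part."""
    site = substrate.find(RECOGNITION_N + RECOGNITION_C)
    if site < 0:
        raise ValueError("no Connectase recognition sequence found")
    cut = site + len(RECOGNITION_N)
    return substrate[:cut], substrate[cut:]


def resolve_intermediate(bound: str, incoming: str) -> str:
    """Steps 4-5 (or the reverse reaction): an incoming PGAFDADPLVVEI-...
    sequence is joined to the enzyme-bound A-ELASKD part."""
    if not incoming.startswith(RECOGNITION_C):
        raise ValueError("incoming partner lacks the N-terminal recognition sequence")
    return bound + incoming


bound, released = form_intermediate("A-ELASKDPGAFDADPLVVEI")
fusion = resolve_intermediate(bound, "PGAFDADPLVVEI-B")    # forward reaction
restored = resolve_intermediate(bound, released)           # reverse reaction
```

In this sketch, the released peptide and substrate B are interchangeable inputs to the second step, which is the reason for the reversibility discussed in the text.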

The Connectase reaction mechanism.

The structures were predicted with Alphafold 2. For simplicity, they are shown mirror-inverted, so that the Connectase binding channel is visible, and the recognition sequence can be read from left (N-terminus) to right (C-terminus). A and B symbolize peptides or proteins.

When using equimolar quantities of educts A and B, Connectase catalyzes an equilibrium of approximately 50% fusion product A-B and 50% educts. This is because the same amounts of PGAFDADPLVVEI peptide byproduct and PGAFDADPLVVEI-B educt compete for the A-ELASKD-Connectase intermediate (Figure 1, step 3). Consequently, 100% fusion product can be obtained by removing or inactivating the peptide byproduct, for example by specific enzymatic proteolysis. However, the employed protease would have to act only on the PGAFDADPLVVEI peptide and not on the PGAFDADPLVVEI-B educt. In the following paper, we present a solution to this problem and describe a simple method to obtain 100% fusion product from equally concentrated educts in short time and with small amounts of enzyme.
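The ∼50% equilibrium can be made explicit with a short calculation. The sketch below assumes (this is a simplification, not a measured value) that the peptide byproduct and educt B react equally well with the intermediate, i.e., an equilibrium constant of 1 for A + B ⇌ A-B + peptide, and that no byproduct is removed.

```python
def equilibrium_yield(a0: float, b0: float) -> float:
    """Fraction of educt A present as fusion product A-B at equilibrium.

    Assumption: K = [A-B][peptide] / ([A][B]) = 1 and no byproduct removal.
    With x = [A-B] = [peptide]:  x**2 = (a0 - x) * (b0 - x)
    which simplifies to          x = a0 * b0 / (a0 + b0).
    """
    if a0 <= 0 or b0 <= 0:
        raise ValueError("concentrations must be positive")
    product = a0 * b0 / (a0 + b0)
    return product / a0


equimolar = equilibrium_yield(100.0, 100.0)     # 1 eq. A, 1 eq. B
tenfold_excess = equilibrium_yield(100.0, 1000.0)  # 1 eq. A, 10 eq. B
```

Under these assumptions, equimolar educts give a yield of 0.5, and even a ten-fold excess of B would leave roughly 9% of A unconjugated, which illustrates why removal of the byproduct is the more effective route to complete conversion.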

Results

Modification of the Connectase recognition sequence

To facilitate the removal of peptide byproduct from Connectase reactions (see introduction), we studied ways to alter the Connectase recognition sequence. For this, we used Ubiquitin (Ub) with a C-terminal Connectase recognition sequence followed by a Streptavidin tag (Figure 2, Ub-Strep). As a second reaction substrate, we used peptides derived from the Connectase recognition sequence (XGAFDADPLVVEI, where X represents any of the 20 amino acids). The conjugation of these substrates results in a shorter Ubiquitin product (Figure 2, Ub-Peptide), which lacks the Streptavidin tag. Therefore, the conjugation rate with the different peptides could be determined by monitoring Ub-Peptide formation in SDS-PAGE time course analyses (Figure 2).

Mutagenic analysis of the Connectase recognition sequence.

A ubiquitin substrate with a C-terminal Connectase recognition sequence followed by a Streptavidin-tag (Ub-Strep) is fused to peptides consisting of N-terminal Connectase recognition sequence variants (top panel). These peptides differ in the first amino acid, where proline was replaced by each of the 19 other standard proteinogenic amino acids. An SDS-PAGE time course analysis of each reaction (1 eq. Ub-Strep, 1 eq. Peptide, 0.01 eq. Connectase, 22°C) shows the gradual emergence of the fusion product, Ub-Peptide. Based on densitometric analyses, the reaction rate with the different substrates was estimated (lower panel); an exact determination is not possible (see method section). The peptide substrates (XGAFDADPLVVEI) and byproducts (PGAFDADPLVVEI-Strep; top panel) are not visible on the gels.

The experiment with the original recognition sequence peptide (X = Proline) resulted in the rapid formation of an equilibrium with ∼0.5 equivalents (eq.) Ub-Strep and ∼0.5 eq. Ub-Peptide, corresponding to ∼50% product yield from equally abundant educts (1 eq. each; see introduction). While proline substitutions with X = D, E, F, G, H, I, K, L, M, N, R, T, W, or Y drastically reduced ligation rates, substitutions with S, C, and A resulted in moderate or high ligation rates. This was surprising, because the KDPGA sequence is highly conserved in the physiological Connectase target, mtrA (methyltransferase A)15. Due to its unique structure and chemistry, proline was previously considered essential for the reaction. The results show that other amino acids, which possess a β-carbon (present in all amino acids except G) but lack a γ-carbon (absent in G, S, C, A), can also be used in this position.

Discrimination between different recognition sequences

This unexpected finding allowed us to design reactions in which educts and peptide byproducts differ in their N-terminal amino acid. This discrimination criterion could then be used to specifically inactivate the X1GAFDADPLVVEI peptide byproduct (PGAFDADPLVVEI-Strep in Figure 2) without affecting the X2GAFDADPLVVEI-B educt (peptide without B in Figure 2). We hypothesized that, for example, an N-acetyltransferase, which acetylates the N-terminal alanine on the peptide byproduct (X1 = A) while leaving the N-terminal proline on educts (X2 = P) unmodified, might be employed to specifically inactivate the undesired byproduct17. Another possibility is to use chemicals that form ring structures with the amino and sulfhydryl groups of N-terminal cysteines (X1 = C, X2 = A / P)18. Finally, it is possible to use aminopeptidases that act exclusively on the N-terminal residue X1 of the peptide byproduct. Many of these enzymes have no absolute specificity for just one amino acid19. Proline residues, however, are structurally distinct and not modified by many promiscuous enzymes, but instead by a set of proline-specific enzymes20.

Based on these considerations, we decided to search for a proline aminopeptidase21, which removes the N-terminal proline from PGAFDADPLVVEI sequences with suitable efficiency, but is inactive towards all other residues (including X2 = A). Literature research identified Bacillus coagulans proline aminopeptidase (BcPAP) as a candidate. This enzyme had been shown to cleave N-terminal proline from peptides consisting of 2–4 amino acids, while remaining inactive towards other N-terminal amino acids22,23. We could produce it as a soluble monomer (33 kDa) in E. coli (>40 mg from 1 L culture) and tested its suitability for shifting the Connectase reaction equilibrium.
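The intended discrimination between byproduct and educt can be sketched as a simple sequence model. This is a deliberately coarse illustration: the set of accepted N-terminal residues (P, A, S, C) is taken from the mutagenic analysis above and treated as a yes/no property, although the actual ligation rates differ; the function names are hypothetical.

```python
CORE = "GAFDADPLVVEI"                 # positions 2-13 of the recognition sequence
ACCEPTED_FIRST = {"P", "A", "S", "C"}  # simplification: residues giving moderate/high rates


def bcpap(sequence: str) -> str:
    """Model of a proline aminopeptidase: remove an N-terminal proline,
    leave every other N-terminus untouched."""
    return sequence[1:] if sequence.startswith("P") else sequence


def can_react_with_intermediate(sequence: str) -> bool:
    """True if the sequence starts with an accepted residue followed by the core."""
    return sequence[:1] in ACCEPTED_FIRST and sequence[1:1 + len(CORE)] == CORE


byproduct = "PGAFDADPLVVEI"    # released from A-ELASKDPGAFDADPLVVEI
educt_b = "AGAFDADPLVVEI-B"    # fusion partner with X2 = A

byproduct_after = bcpap(byproduct)   # loses its proline
educt_after = bcpap(educt_b)         # unchanged
```

In this model, only the byproduct loses its ability to react with the A-ELASKD-Connectase intermediate, so the back-reaction is suppressed while the forward reaction with educt B remains possible.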

A method for complete protein-protein fusions

We tested the effect of BcPAP in ligation reactions with A-ELASKDPGAFDADPLVVEI (A = LysS (Lysine-tRNA ligase), GST (Glutathione-S-Transferase), Ub (Ubiquitin)) and AGAFDADPLVVEI-B (B = MBP (Maltose Binding Protein), Ub, -(just the peptide)) substrates. The reactions were performed at room temperature (22°C) in a neutral buffer (pH 7.0), with moderate salt concentrations (150 mM NaCl, 50 mM KCl), 100 μM of each substrate (A and B), as well as 0.033 eq. Connectase, and 0.066 eq. BcPAP. They were separated by SDS-PAGE, stained with Coomassie G-250, and imaged with a fluorescence scanner (Excitation 685 nm, Emission 725 nm). This allowed the densitometric quantification of the resulting protein bands with good accuracy.
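For setting up such reactions, the molar equivalents can be converted into enzyme concentrations. The following small helper is a sketch that assumes the equivalents refer to the concentration of each substrate; the function name and the dictionary layout are illustrative only.

```python
from typing import Dict


def enzyme_concentrations(substrate_um: float,
                          connectase_eq: float = 0.033,
                          bcpap_eq: float = 0.066) -> Dict[str, float]:
    """Convert molar equivalents (relative to one substrate) into μM."""
    if substrate_um <= 0:
        raise ValueError("substrate concentration must be positive")
    return {
        "Connectase": substrate_um * connectase_eq,
        "BcPAP": substrate_um * bcpap_eq,
    }


standard = enzyme_concentrations(100.0)  # 100 μM of each substrate
dilute = enzyme_concentrations(10.0)     # 10 μM of each substrate
```

At 100 μM substrate, this corresponds to about 3.3 μM Connectase and 6.6 μM BcPAP.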

In each case (Figure 3A-C, Figure S1A), we observed 98–100% conversion of the less abundant substrate without any reaction side products. In protein-protein ligations (Figure 3A-B, Figure S1A), ∼90% fusion product was obtained after one hour of incubation, and ∼95% after two hours. Protein-peptide ligations (Figure 3C) proceeded even faster. The reaction rate was approximately four times slower at low substrate concentrations (10 μM (Figure 3D) instead of 100 μM (Figure 3A)) and about eight times slower at low temperatures (10°C (Figure 3E) instead of 22°C (Figure 3A)), but still resulted in complete protein-protein fusions.

Complete protein-protein ligations.

Shown are SDS-PAGE time course analyses of ligation reactions using Lysine-tRNA ligase (LysS) and Maltose binding protein (MBP; A, D, E, F), Glutathione-S-Transferase (GST) and MBP (B), or Ubiquitin with Streptavidin tag (Ub-Strep) and AGAFDADPLVVEI peptide (C) as substrates. Each reaction was performed with 1 eq. N-terminal fusion partner (LysS, GST or Ubiquitin; C-terminal ELASKDPGAFDADPLVVEI sequence), 1 eq. C-terminal fusion partner (MBP or peptide; N-terminal AGAFDADPLVVEI sequence), 0.033 eq. Connectase, and 0.066 eq. BcPAP. The substrate concentration was 100 μM (except for D: 10 μM) and the incubation temperature was 22°C (except for E: 10°C). In experiment F, an MBP protein with an additional N-terminal TEV protease recognition sequence (MENLYFQ|AGAFDADPLVVEI-MBP) was used and TEV protease (0.01 eq.) was added to the reaction. A densitometric analysis of the protein bands is shown below each experiment. For the substrates, the values reflect the substrate band density relative to the substrate band in the control sample (0 min); for the products, the values reflect the product band density relative to the total band density (substrates + products). The exact values can be found in Dataset S2.

We generated the AGAFDADPLVVEI-B substrates for these experiments by TEV protease cleavage of MENLYFQ|AGAFDADPLVVEI-B precursors. We chose this approach to avoid the potential acetylation of N-terminal alanine residues during expression of (M)AGAFDADPLVVEI-B substrates (methionine removal by methionine aminopeptidase). The cleavage is efficient as the TEV protease recognition sequence is exposed N-terminally of the unstructured Connectase recognition sequence. Consequently, cleavage and conjugation can be performed in parallel with small amounts of enzyme (Figure 3F). Another way to prevent N-terminal acetylation is to add an extra N-terminal proline (i.e., P|AGAFDADPLVVEI-B), which is removed during the reaction by BcPAP (Figure S1B).

To substantiate these findings, we conducted a liquid chromatography mass spectrometry (LC-MS) analysis. The signal intensities in these experiments allow no protein quantifications, as smaller molecules are often measured with more intense signals. Nevertheless, we report relative intensities in this paper as a qualitative measure. For a 1:1 Ub-MBP conjugation (Figure S2), they amounted to 0.25% (Ub), 0.13% (MBP), and 99.6% (Ub-MBP). The N-terminal proline was almost completely (99.8%) removed from the peptide byproduct, (P)GAFDADPLVVEI. Similar to all other LC-MS tested molecules in this study (i.e., Connectase, BcPAP, MBP, LysS, αHER2 LC/HC, Ubiquitin; Figures 4 and 5), it was not further processed on the N-terminus. This supports the finding19,22,23 that BcPAP acts exclusively on N-terminal proline residues.

Antibody conjugation.

Shown are SDS-PAGE time course (A, B) and LC-MS analyses (C) of αHER2 (human epidermal growth factor receptor 2) antibody conjugations. The αHER2 heavy (HC) and light chains (LC) were produced with a C-terminal Connectase recognition sequence and a Streptavidin tag (HC-Strep, LC-Strep). In the reactions (25 μM αHER2 (100 μM subunits), 0.033 eq. Connectase, 0.066 eq. BcPAP, 22°C), the Streptavidin tag is replaced by Ubiquitin (A; 1 eq.) or a shorter peptide (B; 1 eq.). A densitometric quantification of the product bands relative to the educt bands is shown below the gels. For the calculation, the BcPAP density in the control lane (Cntrl) was subtracted from the combined LC-Strep/BcPAP band. The LC-MS analyses (C) show the assemblies in the unconjugated antibody sample (top panel) and a shift of the detected masses, consistent with a near-complete conjugation to peptide (middle panel) or ubiquitin (lower panel). All detected masses, their abundances, and assignments can be found in the Dataset S2.

Protein cyclization.

Shown are SDS-PAGE time course analyses of a Ubiquitin cyclization reaction. The employed Ubiquitin substrate was produced with both an N-terminal (AGAFDADPLVVEI) and a C-terminal (ELASKDPGAFDADPLVVEI) Connectase recognition sequence. This allows the formation of linear polymers (L1–L4, formed by 1–4 Ubiquitin proteins), which are observed in the early stages of the time course. The N-terminus of a given polymer can be fused to its C-terminus, resulting in cyclic assemblies (C2–C6, formed by 2–6 Ubiquitin proteins), which represent the end product of the reaction. A lower substrate concentration (A, 10 μM) results in smaller assemblies, and a higher substrate concentration (B, 100 μM) results in larger assemblies. The assignment of the gel bands is consistent with LC-MS data (below the gels). The plots were normalized to the most intense Ubiquitin signal (Ub2); the BcPAP peaks are more intense (>100%), despite its relatively low abundance. The molecular masses of the ubiquitin assemblies are 13 kDa (L1), 24 kDa (L2), 34 kDa (L3), 45 kDa (L4), 21 kDa (C2), 32 kDa (C3), 43 kDa (C4), 53 kDa (C5), and 64 kDa (C6).

Finally, we tested the effect of serine-, cysteine-, or metalloprotease inhibitors on the reaction. Connectase is not a protease15 and therefore unaffected by these substances (Figure S3). However, the equilibrium shift associated with BcPAP activity could be suppressed with the serine protease inhibitors PMSF and AEBSF (Figure S3). These results contrast with previous studies, where BcPAP was found to be more susceptible to cysteine protease inhibitors23. They are, however, consistent with the classification of BcPAP as a serine protease24.

Homogeneous antibody conjugates

Antibodies are the most relevant protein conjugation target25. Many applications require their conjugation to spacious molecules (e.g., horseradish peroxidase) and/or to a defined number of molecules. This number, also known as drug : antibody ratio, is used as a benchmark for several specialized techniques. Of these, formyl-glycine insertion26 (Catalent), sugar engineering27 (Mersana), cysteine engineering28 (Genentech), the introduction of unnatural amino acids29 (Sutro), and Sortase-mediated conjugations30 (NBE) all find commercial use. Each of these approaches has its advantages and disadvantages. One problem is that even a relatively high conjugation ratio of 90% on each antibody chain leads to a desired drug : antibody ratio of 4 in only 66% of all HC2LC2 antibodies, and these are hard to separate from the remaining 34%. Side reactions can further complicate the process.
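The 66% figure follows from a simple probability argument, sketched below. The calculation assumes that the four conjugation sites of an HC2LC2 antibody react independently and with the same probability, which is an idealization.

```python
from math import comb
from typing import List


def dar_distribution(p_site: float, sites: int = 4) -> List[float]:
    """Binomial distribution of the number of conjugated sites per antibody,
    assuming independent sites with identical conjugation probability."""
    if not 0.0 <= p_site <= 1.0:
        raise ValueError("p_site must be between 0 and 1")
    return [comb(sites, k) * p_site**k * (1.0 - p_site)**(sites - k)
            for k in range(sites + 1)]


distribution = dar_distribution(0.90)
fully_conjugated = distribution[4]         # 0.9**4, about 0.66
incomplete = 1.0 - fully_conjugated        # about 0.34
```

Under the same assumptions, a per-site conjugation ratio of 98% would correspond to about 92% fully conjugated antibodies, which shows how strongly product homogeneity depends on near-complete conversion at each site.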

To test the established method for antibody conjugation, we added the Connectase recognition sequence plus an additional C-terminal Streptavidin tag to the C-termini of αHER2 light and heavy chains. HEK293 cells were transiently transfected with plasmids encoding for these proteins, and the culture medium was tested for the exported antibodies with the in-gel fluorescence method16,31. In this western blot alternative, Connectase is used to fuse far-red fluorophores to target proteins, which can subsequently be detected on the SDS-gel with a fluorescence imager (Figure S4). From this, we expected a concentration of ∼2 μg antibody per ml of culture medium – a value that could be confirmed after protein purification with a Streptavidin column.

The purified antibodies were labeled with either AGAFDADPLVVEI peptide or AGAFDADPLVVEI-Ubiquitin (1 eq. per antibody subunit; reaction conditions as in Figure 3). SDS-PAGE time course analyses show almost complete labelling reactions after an incubation time of two hours (Figure 4A-B). The heavy chain is labelled faster than the light chain, possibly because the Connectase recognition sequence is more accessible in this case.

The reactions were also analyzed by LC-MS. For this, the antibody carbohydrate side chains were removed enzymatically to reduce sample complexity32. The masses determined in the unconjugated antibody sample were consistent with assemblies of completely intact antibody subunits, except for the C-terminal Strep-tag lysine, which was missing in most subunits. Of the assigned masses, HC2LC2 assemblies constituted 69% of the signal, followed by HC2 (30%), and HC2LC1 (0.4%) assemblies (Figure 4C, top panel).

Next, a 1:1 antibody-peptide conjugation reaction was analyzed with the same method (Figure 4C, middle panel). Here, the HC2LC1 assembly was detected only in completely conjugated form, as HC2LC1-peptide3 (100%). Similarly, HC2 was completely conjugated to HC2-peptide2 (100%). HC2LC2 was found to be conjugated to either two (2%), three (15%), or four (83%) peptides. Taken together, this suggests a total labeling efficiency of 97%, in line with the results shown in Figure 4B.

Finally, a 1:1 antibody-ubiquitin conjugation was analyzed (Figure 4C, lower panel). Here, the HC2LC1 assembly was more abundant and the HC2 assembly less abundant compared to the unconjugated sample. Again, HC2 and HC2LC1 were detected only in completely conjugated form, as HC2-Ub2 (100%) and HC2LC1-Ub3 (100%). HC2LC2 was found to be conjugated to either one (2%), two (1%), three (2%), or four (95%) ubiquitin molecules, suggesting a total labeling efficiency of 98%.
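As an illustration of how such distributions translate into a per-site occupancy, the sketch below averages the number of attached molecules over the HC2LC2 species only. It is not the calculation used for the totals reported above, which presumably also take the fully conjugated HC2 and HC2LC1 assemblies into account, so the values need not coincide exactly; in addition, LC-MS intensities are only a qualitative measure.

```python
from typing import Dict


def site_occupancy(distribution: Dict[int, float], sites: int = 4) -> float:
    """Mean fraction of occupied sites for one assembly type.

    `distribution` maps the number of attached molecules to its relative
    abundance (any consistent unit, e.g., percent of the LC-MS signal).
    """
    total = sum(distribution.values())
    if total <= 0:
        raise ValueError("distribution must contain positive abundances")
    attached = sum(n * share for n, share in distribution.items())
    return attached / (sites * total)


# Relative abundances of HC2LC2 species as stated in the text (percent)
hc2lc2_peptide = {2: 2, 3: 15, 4: 83}
hc2lc2_ubiquitin = {1: 2, 2: 1, 3: 2, 4: 95}

peptide_occupancy = site_occupancy(hc2lc2_peptide)
ubiquitin_occupancy = site_occupancy(hc2lc2_ubiquitin)
```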

These results demonstrate that the method is effective for the rapid generation of near-homogeneous antibody conjugates. The conjugation ratio could potentially be further increased with a small excess of the conjugation partner (here: Ubiquitin or peptide). The desired products can be purified in subsequent steps, for example with Streptavidin columns to remove unconjugated antibodies, or with size exclusion columns, to separate HC2LC2 from other assemblies.

Protein cyclization and polymerization

Protein cyclization is used to enhance protein stability33, while protein polymerization can be useful for engineering biomaterials or affine binders34. Both results may be achieved with proteins carrying both N- and C-terminal Connectase recognition sequences. Circularization may be favored, when the protein is present at a low concentration and when its N- and C-termini are in close proximity. Polymerization might be favored for rod-shaped proteins with distant N- and C-termini at high concentrations.

To test these ideas, we used Ubiquitin (AGAFDADPLVVEI-Ub-ELASKDPGAFDADPLVVEI) as a small globular test substrate at low concentrations (10 μM, Figure 5A). At the start of the reaction (0 – 7.5 min), we observed its polymerization to linear chains of two (Ub-L2) or three (Ub-L3) protomers. These linear assemblies gradually disappeared (7.5 – 120 min), as their N- and C-termini were fused. The resulting circular assemblies (80% Ub2 and 20% Ub3, according to SDS-PAGE densitometry) presented the stable end-products of the reaction.

When the reaction was performed with ten-fold higher substrate concentrations (100 μM, Figure 5B), longer linear ubiquitin polymers were initially formed, which eventually gave rise to larger circular assemblies (37% Ub2, 45% Ub3, 9% Ub4, 6% Ub5, and 2% Ub6). Single ubiquitin molecules were not efficiently circularized in either experiment, suggesting that the N- and C-termini in one ubiquitin molecule are too far apart to be connected, and that additional linker sequences are needed to favor this assembly.

Taken together, the results show that the presented conjugation method enables efficient protein circularizations and that the yields of specific assemblies can be controlled by modifying the reaction conditions (e.g., substrate concentration) and the substrate design (e.g., linker length). As with other cyclization/polymerization methods35, these parameters need to be tested for each individual substrate.

Discussion

In this paper, we have shown that the Connectase recognition sequence can be altered, so that the peptide reaction byproduct can be specifically inactivated by a proline aminopeptidase. This method enables the fast, simple, specific, and complete 1:1 fusion of proteins and/or peptides.

Instead of just fusing two molecules, as for example in chemical ligations, the method replaces one sequence (e.g. PGAFDADPLVVEI-Strep in Figure 4) with the fusion partner. This has several advantages. For example, an affinity tag used for substrate purification can be removed by the conjugation reaction. As the affinity tag remains on unconjugated substrates, it can subsequently be used to remove the peptide byproduct and any unconjugated substrates (e.g., an antibody conjugated to only three molecules instead of four). When the same affinity tag is added to Connectase and BcPAP, this setup enables the complete purification of homogeneous fusion product in a single step.

Another consequence of this replacement mechanism is that complete fusions may be reversed on demand. This is not possible with chemical, split intein, or split domain ligation methods. Yet, the addition of a new X3GAFDADPLVVEI-C fusion partner to an existing A-ELASKDX2GAFDADPLVVEI-B fusion product allows the exchange of molecules B and C. To make this second conjugation complete, a new method for the specific removal of X2 is needed, just like BcPAP was used to remove X1 = Pro in this study. This new method could involve another aminopeptidase19, a specific chemical modification18, or an N-acetyltransferase17 (see the section “Discrimination between different recognition sequences”). The resulting re-conjugation would enable advanced applications, such as the immobilization of a protein on a surface and its release on demand, or the visualization of existing conjugation products by fluorophore (de-)coupling.

These considerations illustrate the versatility of Connectase as a bioengineering tool. As more applications with this enzyme are developed, synergies will emerge. For example, in this paper, the same Connectase recognition sequence could be used to monitor antibody expression levels in a western blot-like application (Figure S4) and for the subsequent antibody conjugation. In future, other Connectase applications, such as protein purification or microplate-based protein quantification, may offer additional functionality for the recognition sequence.

During the preparation of this study, a somewhat related method was published for the enzyme Sortase A36. The authors fused various sequences, such as A-LPETGAHHHHH and GVGSKYG for A-LPETGVGSKYG generation (C-terminal labeling) or YALPETGG and GVGK-B for YALPETGVGK-B generation (N-terminal labeling). The peptide byproducts (i.e., GAHHHHHH or GG), were digested with an aminopeptidase. This aminopeptidase showed a preference for GG or GA over GV and therefore displayed a lower activity towards GVGSKYG or GVGK-B substrates. The method significantly increased the obtained fusion product yields and therefore presents an important development for Sortase-mediated ligations. Despite these similarities, there are important differences compared to the method presented here. As described in the introduction, Sortase irreversibly hydrolyzes both educts and products, shows low specificity for the C-terminal fusion partner, and has low catalytic efficiency8,9. As a result, the method is most efficient for C-terminal protein conjugations with small peptides at high concentrations36. Protein-protein conjugations are relatively inefficient, as they result in high quantities of hydrolysis side products and remain largely incomplete, even after several days of incubation with high enzyme quantities.

The main disadvantage of Connectase lies in its relatively long recognition sequence. The substrates require a 13 amino acid recognition sequence on the N-terminus and a 19 amino acid sequence on the C-terminus (Figure S5). For protein-protein conjugations, we employed additional linker sequences of one (N-terminal R) and five (C-terminal AAAGA) amino acids. This design resulted in efficient conjugations for each substrate tested so far, including 44 constructs derived from 19 different proteins15,16,31. Therefore, the employed linkers can generally be regarded as sufficient, even in cases of poor steric accessibility of the substrate termini. Taken together, this results in a final conjugation “scar” of 19–25 amino acids between the fusion partners. This is comparable to the SpyTag / KTag system (23 amino acids plus linker sequences)34, but longer than the sequences typically employed for Sortase-mediated fusions (7–9 amino acids, usually with additional (GGGGS)x linker between the protein(s) and the ligation site37). A shorter Connectase recognition sequence might be engineered in the future. For now, the method described here is primarily useful in applications where this longer “scar” can be tolerated. In these cases, however, the combination of simplicity, substrate specificity, efficiency, and completeness, along with the possibility to use either biologically or chemically produced substrates, and the absence of side reactions, is unmatched by other chemical or enzymatic methods.
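The stated “scar” length can be reproduced from the sequences given in this paper. The sketch below only counts residues; it makes no statement about the exact position of the linkers relative to the recognition sequence.

```python
SCAR_N = "ELASKD"          # remains from the C-terminal recognition sequence of A
SCAR_C = "AGAFDADPLVVEI"   # N-terminal recognition sequence of B (X2 = A)
LINKERS = ("R", "AAAGA")   # linker sequences used for protein-protein conjugations


def scar_length(with_linkers: bool) -> int:
    """Number of amino acids introduced between the fusion partners."""
    length = len(SCAR_N) + len(SCAR_C)
    if with_linkers:
        length += sum(len(linker) for linker in LINKERS)
    return length


minimal_scar = scar_length(with_linkers=False)  # recognition sequence only
full_scar = scar_length(with_linkers=True)      # including both linkers
```

The two values, 19 and 25, correspond to the lower and upper end of the range given above.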

Methods

Cloning, Expression, and Purification

The sequences of all proteins and peptides used in this study are listed in Dataset S1. The peptide for the in-gel fluorescence assay was synthesized by Intavis, while all other peptides were synthesized by Genecust. Genes were synthesized by Biocat, and cloned into the pET30b(+) vector (restriction sites: NdeI, XhoI) for expression in E. coli or the pcDNA3.1 vector (restriction sites: HindIII, XhoI) for expression in HEK293 cells.

For recombinant expression in E. coli, BL21 gold cells were transformed with the respective plasmids and grown in lysogeny broth medium with 50 μg/l kanamycin at 22°C. Protein expression was induced at an optical density of 0.4 at 600 nm with 500 μM isopropyl-β-D-thiogalactoside. Cells expressing soluble proteins were harvested after 16 h, resuspended in buffer (100 mM Tris-HCl, 1x cOmplete EDTA-free protease inhibitor cocktail (Roche; no inhibitor was added to cells expressing BcPAP), 0.02 g/l DNAse, pH 8.0), lysed by French press, and cleared from cell debris by ultracentrifugation (120000 g, 45 min, 4°C).

For recombinant expression of αHER2 antibodies, HEK293 cells were cultured at 37°C in ten 75 cm2 flasks with Dulbecco’s Modified Eagle Medium (DMEM) supplemented with fetal calf serum. At 70% confluency, they were transfected with plasmids encoding for αHER2 light and heavy chains using Lipofectamine 2000 (Thermo), according to the manufacturer’s instructions (47 μl Lipofectamine, 55 μg of each plasmid). The cells were grown for ten days without splitting, and the medium was exchanged daily. The medium samples were used for αHER2 detection by in-gel fluorescence (Figure S4, described below), then pooled, centrifuged (6000 g, 10 min, 4°C), and filtered (0.45 μm) before protein purification.

For protein purification, His6-tagged proteins (all proteins except for antibody subunits (Figure 4 and S4) and the Ubiquitin construct for cyclization (Figure 5)) were applied to HisTrap HP columns (20 mM Tris-HCl pH 8.0, 250 mM NaCl, 20–250 mM imidazole). Strep-tagged proteins (the antibody subunits and the Ubiquitin construct for cyclization) were instead purified with StrepTrap XT columns (1.8 mM KH2PO4, 10 mM Na2HPO4, 2.7 mM KCl, 138 mM NaCl, 0–50 mM Biotin, pH 7.4). After this initial purification step, proteins with N-terminal TEV recognition sequences (MBP (Figures 3 and S2), Ubiquitin (Figures S1 and 4), and Ubiquitin for cyclization (Figure 5)) were incubated with TEV protease at a 1:100 molar ratio. The reaction was performed overnight in dialysis tubes (dialysis buffer: 20 mM Tris-HCl pH 8.0, 250 mM NaCl) at 4°C. The processed proteins were separated from His6-tagged TEV protease, N-terminal fragments (MHHHHHHENLYFQ), and residual unprocessed proteins by another purification step. For this, the reactions were applied a second time to HisTrap HP columns (as above), and the flow-through was collected. All chromatography steps were performed on an Äkta Purifier FPLC (GE Healthcare) using Unicorn v5.1.0 software. Purified proteins were supplemented with 15% glycerol, flash-frozen in liquid nitrogen, and stored at -80°C.

Biochemical Assays

Unless noted otherwise, all conjugation reactions were performed at 22°C in neutral (pH 7.0) buffer containing 50 mM sodium acetate, 50 mM MES, 50 mM HEPES, 150 mM NaCl, and 50 mM KCl. They were stopped with SDS loading buffer (final concentration: 50 mM Tris-HCl, 2% SDS, 10% glycerol, 25 mM β-mercaptoethanol, 0.01% bromophenol blue, pH 6.8; final protein concentration ∼0.1 g/l) and incubated at 90°C for 10 minutes. The samples were separated using mPAGE 12% Bis-Tris gels (Merck; 5 μl loading volume per sample) with MOPS running buffer (50 mM MOPS, 50 mM Tris, 0.1% SDS, 1 mM EDTA; no pH adjustment). The gels were stained with Coomassie blue (25% ethanol, 25% methanol, 10% acetate, 0.25% Coomassie R-250), and subsequently with Coomassie colloidal solution (20% ethanol, 10% ammonium sulfate, 5.8% phosphoric acid, 5% methanol, 0.12% Coomassie G-250). They were destained with 10% acetic acid and imaged with an Azure Sapphire NIR fluorescence scanner (excitation at 685 nm, emission at 725 nm, 25–50 μm resolution, Intensity 8, highest scanning speed). Densitometric band quantification was performed with Image Studio Lite 5.2. All obtained values and all unprocessed gels can be found in Dataset S2.

For Figure 2, 20 experiments were set up, each with a different XGAFDADPLVVEI peptide (X = any of the 20 amino acids). The reactions contained 20 mM Ub-Strep, 20 mM peptide, and 0.2 mM Connectase. Samples were taken before the addition of Connectase (0 min) and after the indicated times (0.1-96 h). After SDS-PAGE and densitometric analyses, the relative reaction rates in each experiment were estimated. An exact determination was not possible because the reaction is reversible, and the emerging peptide side product (PGAFDADPLVVEI-Strep) competes with the assayed peptide (XGAFDADPLVVEI) for the enzyme binding sites. The estimates were based on the time required to obtain 10%, 20%, and 30% fusion product yield in each reaction.
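The threshold-based estimate can be illustrated with a short sketch. The text does not state how the times to 10%, 20%, and 30% yield were read from the time courses, so the linear interpolation between sampling points, the averaging of time ratios, and all function names and example values below are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def time_to_yield(times_h: Sequence[float],
                  yields_pct: Sequence[float],
                  threshold_pct: float) -> Optional[float]:
    """Return the first time (h) at which the yield reaches threshold_pct.

    Assumption: the yield changes linearly between two sampling points.
    Returns None if the threshold is never reached.
    """
    if yields_pct and yields_pct[0] >= threshold_pct:
        return times_h[0]
    points = list(zip(times_h, yields_pct))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if y0 < threshold_pct <= y1:
            return t0 + (threshold_pct - y0) * (t1 - t0) / (y1 - y0)
    return None


def relative_rate(reference_times: Sequence[Optional[float]],
                  sample_times: Sequence[Optional[float]]) -> Optional[float]:
    """Average the ratio (reference time / sample time) over all thresholds
    that both reactions reached. A value of 0.5 means 'half as fast'."""
    ratios = [ref / t for ref, t in zip(reference_times, sample_times)
              if ref is not None and t is not None and t > 0]
    return sum(ratios) / len(ratios) if ratios else None


# Hypothetical time courses (not taken from Dataset S2).
times = [0.0, 1.0, 2.0, 4.0]
fast = [0.0, 10.0, 20.0, 40.0]   # reference peptide
slow = [0.0, 5.0, 10.0, 20.0]    # peptide under test
thresholds = (10.0, 20.0, 30.0)

fast_times = [time_to_yield(times, fast, th) for th in thresholds]
slow_times = [time_to_yield(times, slow, th) for th in thresholds]
print(relative_rate(fast_times, slow_times))
```

Using several low thresholds rather than a single endpoint keeps the comparison in the early phase of the reaction, where competition by the emerging side product is still limited.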

For Figure 3 and Figure S1, 8 experiments were set up with different substrate pairs (Figure 3A, D, E: LysS/MBP; 3B: GST/MBP; 3C: Ub-Strep/AGAFDADPLVVEI peptide; 3F: LysS/MBP before TEV cleavage (see “Purification”); Figure S1A: LysS/Ub; S1B: LysS/Pro-Ub; protein sequences are listed in Dataset S1). Each substrate was used at 100 μM (10 μM in Figure 3D). The reactions were started by addition of 0.033 molar equivalents (eq.) Connectase and 0.066 eq. BcPAP. For the experiment shown in Figure 3F, 0.01 eq. TEV protease was also added. The reactions were incubated at 22°C (10°C in Figure 3E). Samples were taken before (0 min) and after the addition of Connectase/BcPAP (3.8-960 min). After SDS-PAGE, the substrate band densities were determined relative to the control sample (0 min). The product band densities were determined relative to the total band density (substrates + products).
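The two normalizations described above can be written out as a minimal sketch. The function names and the band densities are hypothetical; the actual values are listed in Dataset S2.

```python
from typing import Sequence


def substrate_remaining(substrate_density: float,
                        substrate_density_t0: float) -> float:
    """Substrate band density relative to the same band in the 0 min sample."""
    if substrate_density_t0 <= 0:
        raise ValueError("control (0 min) density must be positive")
    return substrate_density / substrate_density_t0


def product_fraction(product_density: float,
                     substrate_densities: Sequence[float]) -> float:
    """Product band density relative to the total density in the same lane
    (substrates + products)."""
    total = product_density + sum(substrate_densities)
    if total <= 0:
        raise ValueError("total band density must be positive")
    return product_density / total


# Hypothetical densities (arbitrary units) for one lane of a time course.
print(substrate_remaining(250.0, 1000.0))          # fraction of substrate left
print(product_fraction(1500.0, [250.0, 250.0]))    # fraction converted to product
```

The substrate value depends on a comparison between lanes and is therefore sensitive to loading differences, whereas the product value is computed within one lane.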

For Figure S3, Connectase (2 μM) was used without (A) or with BcPAP (7 μM, B). Both solutions were incubated at 22°C for 30 min with the following compounds: buffer (control reaction), 1 mM ZnCl2, “Complete” protease inhibitor mix (Roche, 1 tablet per 25 ml), AEBSF, ALLN, Antipain, Aprotinin, Bestatin, Chymostatin, E-64, EDTA, Leupeptin, Pepstatin, Phosphoramidon, PMSF. The concentrations of the last compounds are unknown to us, as the supplier (G-biosciences, Protease Inhibitor Set, inhibitors used at “2x” concentration) refused to provide them on request. After the incubation, the enzyme-inhibitor mixture was added to an equal volume of conjugation substrates (20 μM Ub-Strep, 20 μM AGAFDADPLVVEI peptide). After 2 h at 22°C, the reactions were analyzed by SDS-PAGE.

For Figure S4, the cell culture medium samples taken during αHER2 expression (see above) were analyzed. They were centrifuged (1 min, 10000 g) and 2.5 μl of each supernatant was mixed with 300 fmol reference protein (MBP with a C-terminal Connectase recognition sequence). The mixture was incubated with 1 nM Connectase and 10 nM fluorescent peptide substrate (RELASKDPGAFDADPLVVEISEEGE-Cy5.5) for 20 min at 22°C. The reactions were separated by SDS-PAGE and imaged before and after Coomassie staining. The band densities corresponding to the reference protein and to the αHER2 light (LC) and heavy chains (HC) were determined. They were used to estimate the expressed αHER2 quantities over time with [αHER2] = reference protein quantity x ([signal HC] + [signal LC]) / [signal reference protein]. This estimate can be turned into an exact determination of αHER2 quantities by dividing it by an experimentally determined factor, which describes the relative reactivity and brightness of antibody and reference protein bands. In this case, this factor was close to 1, so that estimated and determined values were nearly identical. The determination and the detailed method are described in 31.
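The estimate can be sketched as follows, reading the formula as the reference quantity multiplied by the summed HC and LC signals and divided by the reference signal. The function name, the parameter names, and the example signals are hypothetical; only the 300 fmol reference quantity is taken from the text.

```python
def estimate_antibody_quantity(reference_fmol: float,
                               signal_hc: float,
                               signal_lc: float,
                               signal_reference: float,
                               correction_factor: float = 1.0) -> float:
    """Estimate the αHER2 quantity (fmol) in a lane from in-gel fluorescence.

    correction_factor stands for the experimentally determined factor that
    describes the relative reactivity and brightness of antibody and
    reference bands. The default of 1.0 gives the uncorrected estimate.
    """
    if signal_reference <= 0 or correction_factor <= 0:
        raise ValueError("reference signal and correction factor must be positive")
    estimate = reference_fmol * (signal_hc + signal_lc) / signal_reference
    return estimate / correction_factor


# Hypothetical band signals (arbitrary units) with 300 fmol reference protein.
print(estimate_antibody_quantity(300.0, signal_hc=400.0, signal_lc=200.0,
                                 signal_reference=300.0))
```

Because the reference protein is labeled in the same reaction and run in the same lane, variations in labeling efficiency or loading affect both signals alike and largely cancel in the ratio.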

For Figure 4, 25 μM Strep-tagged αHER2 antibody was mixed with 3.33 μM Connectase, 6.66 μM BcPAP, and either 100 μM AGAFDADPLVVEI-Ubiquitin or 100 μM AGAFDADPLVVEI peptide. The reactions were incubated at 22°C, and samples were taken after the indicated times (0-120 min). After SDS-PAGE, the product band quantities (i.e., LC-Ub, HC-Ub, or LC-peptide, HC-peptide) were determined relative to the educt band quantities (LC-Strep, HC-Strep).

For Figure 5, a ubiquitin variant with an N-terminal (AGAFDADPLVVEI…) and a C-terminal (ELASKDPGAFDADPLVVEI) Connectase recognition sequence was employed. This substrate was used at a concentration of 10 μM (first experiment) and at a concentration of 100 μM (second experiment). Both reactions were conducted with 0.033 eq. Connectase and 0.066 eq. BcPAP at 22°C. Samples were taken after the indicated times and analyzed by SDS-PAGE.
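As a worked example of the molar equivalents used here, the enzyme concentrations for both substrate concentrations can be computed as below. The function name and the returned keys are hypothetical, and the equivalents are taken relative to the substrate concentration as stated above.

```python
def enzyme_concentrations(substrate_um: float,
                          connectase_eq: float = 0.033,
                          bcpap_eq: float = 0.066) -> dict:
    """Convert molar equivalents into absolute concentrations (μM)."""
    return {
        "connectase_uM": substrate_um * connectase_eq,
        "bcpap_uM": substrate_um * bcpap_eq,
    }


# The two substrate concentrations used for the cyclization experiments.
for substrate_um in (10.0, 100.0):
    conc = enzyme_concentrations(substrate_um)
    print(f"{substrate_um:.0f} μM substrate: "
          f"{conc['connectase_uM']:.2f} μM Connectase, "
          f"{conc['bcpap_uM']:.2f} μM BcPAP")
```

Keeping the equivalents constant means that the enzyme-to-substrate ratio is the same in both experiments, so that differences between them can be attributed to the substrate concentration.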

For Figure S5, Ub-Strep (10 μM) was mixed with RELASKDPGAFDADPLVVEI, ELASKDPGAFDADPLVVEI, or LASKDPGAFDADPLVVEI peptides (10 μM) and 0.25 μM Connectase. Reaction samples were taken after the indicated times (0-60 min) and analyzed by SDS-PAGE.

Liquid Chromatography-Mass Spectrometry (LC-MS)

LC-MS analysis was performed at the Natural and Medical Sciences Institute (NMI, Reutlingen, Germany), using established sample preparation and data interpretation protocols32,38. Specifically, the samples were prepared as follows:

For Figure S2, 100 μM Ub-Strep, 100 μM MBP, 3.33 μM Connectase, and 6.66 μM BcPAP were mixed. The reaction was incubated for 4 h at 22°C and then used for LC-MS analysis.

For Figure 4, 25 μM Strep-tagged αHER2 antibody was mixed with 3.33 μM Connectase, 6.66 μM BcPAP, and either 100 μM AGAFDADPLVVEI-Ubiquitin or 100 μM AGAFDADPLVVEI peptide. The reaction was incubated for 4 h at 22°C and then used for LC-MS analysis. Unconjugated αHER2 antibody was used as a control. All samples were deglycosylated with PNGase F (R&D Systems) for 16 h at 37°C.

For Figure 5, a ubiquitin variant with N- and C-terminal Connectase recognition sequences was used at two different concentrations, 10 μM and 100 μM. Both samples were incubated with 0.033 eq. Connectase and 0.066 eq. BcPAP for 4 h at 22°C and then used for LC-MS analysis.

The samples were stored at 4°C for up to 6 hours before they were applied (1.6 μg protein) to an Acquity BEH C4 column, using an UltiMate3000 UHPLC. They were eluted with a 0-50% H2O / acetonitrile gradient in the presence of 0.1% formic acid over 7 minutes. The eluted molecules were analyzed with a MaXis HD UHR q-TOF spectrometer. Mass spectrometer parameters were adapted to the size of the molecule and the chromatography flow rate (by default 0.15 ml/min). Data analysis was performed using Bruker Compass DataAnalysis v6.1 software (Bruker Daltonik, Bremen, Germany). Charge deconvolution of the m/z spectra was performed with the MaxEnt deconvolution algorithm (Bruker Daltonik, Bremen, Germany). Deconvolution artifacts without m/z series were excluded.
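The exclusion criterion can be illustrated with a simplified sketch. It is not the MaxEnt algorithm and does not reproduce the vendor software. It only shows the underlying idea that a genuine protein mass M gives rise to a series of peaks at m/z = (M + z x 1.007276) / z for consecutive charge states z, whereas a deconvolution artifact lacks such a series. The tolerance, the minimum series length, and all names are assumptions.

```python
from typing import Dict, Iterable, Sequence

PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_series(neutral_mass: float, charges: Iterable[int]) -> Dict[int, float]:
    """Expected m/z values of a protonated species for the given charge states."""
    return {z: (neutral_mass + z * PROTON_MASS) / z for z in charges}


def has_mz_series(neutral_mass: float,
                  observed_mz: Sequence[float],
                  charges: Sequence[int],
                  min_consecutive: int = 3,
                  tol: float = 0.5) -> bool:
    """True if at least min_consecutive neighboring charge states of the
    candidate mass are matched by an observed peak within tol (m/z units).
    charges is expected to be a range of consecutive integers."""
    expected = mz_series(neutral_mass, charges)
    hits = {z for z, mz in expected.items()
            if any(abs(mz - obs) <= tol for obs in observed_mz)}
    run = best = 0
    for z in sorted(charges):
        run = run + 1 if z in hits else 0
        best = max(best, run)
    return best >= min_consecutive


# Hypothetical peak list: three charge states of a 10000 Da species plus a
# single unrelated peak that would fit a 12345 Da candidate at z = 7.
charges = range(5, 11)
peaks = [mz_series(10000.0, [z])[z] for z in (6, 7, 8)]
peaks.append(mz_series(12345.0, [7])[7])

print(has_mz_series(10000.0, peaks, charges))   # supported by a series
print(has_mz_series(12345.0, peaks, charges))   # single peak only
```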

Data availability

All primary data are available in Dataset S2. All relevant data are also available from the corresponding author upon request.

Acknowledgements

Liquid chromatography-mass spectrometry was performed by Sandra Maier and Dr. Anne Zeck at the NMI Natural and Medical Sciences Institute at the University of Tübingen, using instrumentation acquired through the Baden-Württemberg Ministry of Economic Affairs, Labor and Tourism (Germany) grant program “Special investment program for climate-neutral business-related research” (WM3-4332-3/6). We thank Andrei Lupas for discussions and continuous support. We thank Valeria Hatskovska for support in handling the eukaryotic cell cultures. This work was supported by institutional funds from the Max Planck Society and by the German Research Foundation (DFG project number 512378754 to A.C.D.F.).

Competing Interests

Max Planck Innovation has filed a provisional patent on the method described in this paper (EP-Patent Application EP24188474.1). The author declares no other competing interests.

Figure S1 (related to Figure 3): Complete protein-protein ligations.

Shown are SDS-PAGE time course analyses of ligation reactions using Lysine-tRNA ligase (LysS; C-terminal ELASKDPGAFDADPLVVEI sequence) and Ubiquitin (Ub; N-terminal AGAFDADPLVVEI (A) or PAGAFDADPLVVEI (B) sequence) as substrates. Both reactions were performed with 1 eq. substrates (100 μM), 0.033 eq. Connectase, and 0.066 eq. BcPAP at 22°C. A densitometric analysis of the protein bands is shown below each experiment. For the substrates, the values reflect the substrate band density relative to the substrate band in the control sample (0 min); for the products, the values reflect the product band density relative to the total band density (substrates + products). The exact values can be found in Dataset S2.

LC-MS analysis of an equimolar Ub-MBP mixture before (A) and after (B, C) conjugation.

A mixture of 100 μM Ub-ELASKDPGAFDADPLVVEI-Strep and 100 μM AGAFDADPLVVEI-MBP was analyzed before (A) and after (B) incubation with 0.033 eq. Connectase and 0.066 eq. BcPAP. The reaction byproduct GAFDADPLVVEI-Strep (C) appeared as an extra peak upon conjugation. The signal intensities in the plots were normalized to the most intense peak. MBP was detected both as a full-length version and as an N-terminally truncated version (MBP Δ1-221) without Connectase recognition sequence. The truncated version was not detected by SDS-PAGE (Figure 3), suggesting a low abundance in the sample. All detected masses can be found in Dataset S2.

Effect of protease inhibitors on BcPAP activity.

Shown is the conjugation of Ub-Strep (educt, 1 eq.) to AGAFDADPLVVEI peptide (1 eq.) in the presence of different protease inhibitors. The reaction catalyzed by Connectase (upper gel) is not inhibited by these substances and results in an equilibrium between Ub-Strep educt and Ub-peptide product (as in Figure 2). In a reaction with Connectase and BcPAP (lower gel), up to 100% Ub-peptide product is formed. Lower product yields indicate an inhibition of BcPAP. This effect is most pronounced for the serine protease inhibitors AEBSF, PMSF, and a commercial AEBSF-containing inhibitor mix (“Complete”). ZnCl2, which had been reported previously as a BcPAP inhibitor23, led to the precipitation of Connectase and BcPAP.

Quantification of αHER2 antibodies in cell culture medium.

Heavy (HC) and light (LC) antibody chains with a C-terminal Connectase recognition sequence were expressed in HEK293 cells. The medium with the exported antibodies was exchanged daily, allowing the monitoring of antibody expression levels by in-gel fluorescence (A). In this western blot alternative, Connectase is used to fuse fluorophores to the target proteins (HC, LC) and a reference protein (Ref). By comparing the intensity of the resulting fluorescent bands, daily antibody expression levels can be estimated (lower panel, see Methods). A Coomassie stain of the same gel (B) shows all proteins in the cell culture medium samples. The experiment is described and discussed in detail in 31. It is also depicted here because it shows the production of antibodies used in Figure 4 and highlights the use of the Connectase recognition sequence on a protein of interest for different applications: protein detection and quantification (this figure), and protein conjugation (Figure 4). This figure is reproduced from 31. It was created by the author. Compared to the original image, the signal ratio text line was removed.

Determination of the minimal Connectase recognition sequence.

Connectase acts on a linker sequence derived from its physiological interaction partner, Methyltransferase A (MtrA). In an initial characterization15, this sequence was identified as RELASKDPGAFDADPLVVEI. It remained unclear whether it could be further shortened from the N-terminal side. The depicted gel shows the Connectase-mediated conjugation of Ub-Strep to RELASKDPGAFDADPLVVEI (peptide 1), ELASKDPGAFDADPLVVEI (peptide 2), or LASKDPGAFDADPLVVEI (peptide 3). The product, Ub-peptide, is formed at a similar rate with peptide 1 (relative ligation rate determined by densitometric analysis: 94%) and peptide 2 (100%), but at a reduced rate with peptide 3 (47%). This suggests that ELASKDPGAFDADPLVVEI is sufficient for efficient conjugation reactions. The protein substrates employed in this paper have the C-terminal RELASKDPGAFDADPLVVEI sequence, with the additional N-terminal arginine serving as a small linker.