PPI-hotspotID: A Method for Detecting Protein-Protein Interaction Hot Spots from the Free Protein Structure

Yao Chi Chen; Karen Sargsyan; Jon D Wright; Yu-Hsien Chen; Yi-Shuian Huang; Carmay Lim

doi:10.7554/eLife.96643.2

eLife assessment

The manuscript presents a machine-learning method to predict protein hotspot residues. The validation is incomplete, and the results of other current methods, such as FTMap, are misinterpreted.

https://doi.org/10.7554/eLife.96643.2.sa2

Strength of evidence

incomplete: Main claims are only partially supported


During the peer-review process, the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Experimental detection of residues critical for protein-protein interactions (PPI) is a time-consuming, costly, and labor-intensive process. Hence, high-throughput PPI-hot spot prediction methods have been developed, but they have been validated using relatively small datasets, which may compromise their predictive reliability. Here, we introduce PPI-hotspot^ID, a novel method for identifying PPI-hot spots using the free protein structure, and validate it on the largest collection of experimentally confirmed PPI-hot spots to date. We explored the possibility of detecting PPI-hot spots using (i) FTMap in the PPI mode, which identifies hot spots on protein-protein interfaces from the free protein structure, and (ii) the interface residues predicted by AlphaFold-Multimer. PPI-hotspot^ID yielded better performance than FTMap and SPOTONE, a webserver for predicting PPI-hot spots given the protein sequence. When combined with the AlphaFold-Multimer-predicted interface residues, PPI-hotspot^ID also yielded better performance than either method alone. Furthermore, we experimentally verified several PPI-hot spots of eukaryotic elongation factor 2 predicted by PPI-hotspot^ID. Notably, PPI-hotspot^ID unveils PPI-hot spots that are not obvious from complex structures, which reveal only interface residues and thus overlook PPI-hot spots in indirect contact with binding partners. Thus, PPI-hotspot^ID serves as a valuable tool for understanding the mechanisms of PPIs and facilitating the design of novel drugs targeting these interactions. A freely accessible web server is available at https://ppihotspotid.limlab.dnsalias.org/ and the source code for PPI-hotspot^ID at https://github.com/wrigjz/ppihotspotid/.

Introduction

Protein-protein interactions (PPIs) play a crucial role in cellular physiology, and their dysregulation is associated with various diseases¹ such as cancer,² infectious diseases, and neurodegenerative diseases.³ Identifying residues critical for PPIs (termed PPI-hot spots) is important for elucidating protein function and designing targeted biomedical interventions.^4,5 Conventionally, PPI-hot spots are defined as residues whose mutation to alanine weakens protein binding by ≥ 2 kcal/mol, i.e., raises the binding free energy by ≥ 2 kcal/mol (ΔΔG^bind ≥ 2 kcal/mol).^6–11 However, because this definition requires measuring the binding free energy change upon mutation to alanine, it limits the number of experimentally determined PPI-hot spots. Hence, PPI-hot spots have been more broadly defined to include residues whose mutations, not necessarily to alanine, significantly impair or disrupt PPIs,^12,13 as detected by experimental methods such as co-immunoprecipitation and yeast two-hybrid screening. As each mutant must be purified and analyzed separately, experimental detection of PPI-hot spots is time-consuming, costly, and labor-intensive.

To enable large-scale detection of PPI-hot spots, high-throughput PPI-hot spot prediction methods have been developed. They generally fall into 2 categories:¹⁴ (1) methods that compute the binding energy/free energy difference between the wild-type protein and a mutant using classical force fields or empirical scoring functions,^11,15–23 and (2) methods that employ classifiers such as nearest neighbor, support vector machines, decision trees, Bayesian/neural networks, random forest, and ensemble machine-learning models using various features, including conservation, secondary structure, solvent-accessible surface area, and atom density.^14,24–36 Most PPI-hot spot prediction methods rely on the protein complex structure, and some are accessible via webservers, e.g., Hotpoint,³⁷ KFC2,³⁸ PredHS,³⁹ and PredHS2.⁴⁰ Fewer methods use only the free protein structure^41–45 or sequence;^34,46–54 among the latter, SPOTONE (hot SPOTs ON protein complexes with Extremely randomized trees) is available as a web server. SPOTONE⁵³ predicts PPI-hot spots from the protein sequence using residue-specific features such as atom type, amino acid (aa) properties, secondary structure propensity, and mass-associated values to train an ensemble of extremely randomized trees.

The PPI-hot spot prediction methods have mostly been trained, validated, and tested on data from the Alanine Scanning Energetics database (ASEdb)⁵⁵ and/or the Structural Kinetic and Energetic database of Mutant Protein Interactions (SKEMPI) 2.0.⁵⁶ However, antibody-antigen interactions have different sequence and structural characteristics compared to non-antibody PPIs.⁵⁷ Therefore, we focus exclusively on non-antibody proteins in this study. The ASEdb contains 96 PPI-hot spots from 26 proteins. SKEMPI 2.0, which includes single-point mutations (not necessarily to alanine) that weaken protein binding by ≥ 2 kcal/mol, has 343 PPI-hot spots from 117 distinct proteins; 40 of these PPI-hot spots are also in ASEdb. Altogether, ASEdb and SKEMPI contain 399 distinct PPI-hot spots (96 + 343 − 40) in 132 proteins. To increase the number of distinct PPI-hot spots, we expanded the definition of PPI-hot spots to include mutations in UniProtKB⁵⁸ that have been manually curated as significantly impairing/disrupting PPIs.¹³ This expanded definition led to the creation of the PPI-Hotspot^DB database, which contains 4,039 experimentally determined PPI-hot spots in 1,893 proteins. To calibrate PPI-hot spot prediction methods using the free protein structure, a benchmark called PPI-Hotspot+PDB^BM was derived from PPI-Hotspot^DB;¹³ it contains nonredundant proteins with free structures and known PPI-hot spots. The proteins in PPI-Hotspot+PDB^BM share < 60% sequence identity, which has been shown to be a reasonable threshold for grouping domains with similar functions.¹³

Our aim is to develop a method for identifying PPI-hot spots in non-antibody proteins using free protein structures. First, we updated the PPI-Hotspot+PDB^BM benchmark and constructed a dataset comprising 158 nonredundant proteins with free structures harboring 414 experimentally known PPI-hot spots and 504 PPI-nonhot spots (see Methods). Using this dataset, we applied an automatic machine-learning (AutoML) framework to detect PPI-hot spots using the aa type as well as structural, energetic, and evolutionary features of each residue in a protein. The resulting prediction model, named PPI-Hotspot^ID, identifies PPI-hot spots using an ensemble of classifiers and only 4 residue features: conservation, aa type, solvent-accessible surface area (SASA), and gas-phase energy (ΔG^gas). We explored the possibility of detecting PPI-hot spots using the FTMap server in the PPI mode, which identifies hot spots on protein-protein interfaces from free protein structures.⁴⁵ These hot spots are identified by consensus sites, i.e., regions that bind multiple probe clusters.^42,45,59 Such regions are deemed to be important for any interaction involving that region of the target, independent of the partner protein.⁴² PPI-hot spots were identified as residues in van der Waals (vdW) contact with probe ligands within the largest consensus site containing the most probe clusters. We also explored the possibility of detecting PPI-hot spots using the interface residues predicted by AlphaFold-Multimer,⁶⁰ which has been shown to outperform current docking methods in predicting protein-protein complexes. Finally, we illustrated the utility of PPI-Hotspot^ID by applying it to detect PPI-hot spots of eukaryotic elongation factor 2 (eEF2), a translation factor essential for peptide elongation, and experimentally verified the predictions.

Results

Evaluating the Performance of PPI-Hot Spot Detection Methods.

The goal of PPI-hotspot^ID is to detect true PPI-hot spots rather than true PPI-nonhot spots in proteins. Hence, we assessed the performance of PPI-hotspot^ID by computing the sensitivity/recall (the fraction of true PPI-hot spots correctly identified),

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (1)$$

the precision (the fraction of predicted PPI-hot spots that are true PPI-hot spots), i.e.,

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

and the F1-score, which combines recall and precision:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3)$$

Since our dataset also contains true PPI-nonhot spots, we calculated the specificity (the fraction of true PPI-nonhot spots correctly identified):

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (4)$$

In eqs 1–4, TP (true positives) or TN (true negatives) is the number of correctly predicted PPI-hot spots or PPI-nonhot spots, and FP (false positives) or FN (false negatives) is the number of wrongly predicted PPI-hot spots or PPI-nonhot spots.
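As a quick check of eqs 1–4, the four metrics can be computed directly from confusion-matrix counts. The following is a minimal Python sketch, not the authors' evaluation code; returning 0.0 when a denominator is zero is our own assumed convention, and the counts in the example are hypothetical.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Recall, precision, F1 and specificity as defined in eqs 1-4.

    Any metric whose denominator is zero is reported as 0.0 (assumed convention).
    """
    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0

    recall = safe_div(tp, tp + fn)                             # eq 1 (sensitivity)
    precision = safe_div(tp, tp + fp)                          # eq 2
    f1 = safe_div(2 * precision * recall, precision + recall)  # eq 3
    specificity = safe_div(tn, tn + fp)                        # eq 4
    return {"recall": recall, "precision": precision,
            "f1": f1, "specificity": specificity}


# Hypothetical counts, for illustration only (not taken from Table 1).
print(classification_metrics(tp=30, fp=10, tn=90, fn=20))
```

With these counts the function gives recall 0.6, precision 0.75, specificity 0.9, and F1 ≈ 0.667.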

Performance of PPI-hotspot^ID vs. FTMap and SPOTONE.

We compared the performance of PPI-Hotspot^ID, FTMap,⁴⁵ and SPOTONE⁵³ using a dataset containing 414 true PPI-hot spots and 504 nonhot spots (see Methods). Note that the 414 true PPI-hot spots represent only 2% of the total number of residues (21,722) in the 158 proteins. PPI-Hotspot^ID (given the free protein structure) and SPOTONE⁵³ (given the protein sequence) predict PPI-hot spots based on a probability threshold (> 0.5). FTMap, in the PPI mode, detects PPI-hot spots as consensus sites/regions on the protein surface that bind multiple probe clusters;⁵⁹ residues in vdW contact with probe molecules within the largest consensus site were compared with the PPI-Hotspot^ID and SPOTONE predictions. Residues not classified as PPI-hot spots by a given method were considered PPI-nonhot spots. Table 1 summarizes the results for our dataset, with the F1 score in parentheses representing the mean validation F1 score. Compared to FTMap and SPOTONE, PPI-Hotspot^ID detected a much higher fraction of true PPI-hot spots (sensitivity 0.67 vs. 0.07 and 0.10, respectively) and achieved a much higher F1 score (0.71 vs. 0.13 and 0.17).
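The evaluation protocol above (threshold probabilities at > 0.5, treat unpredicted residues as nonhot spots, and score only residues with an experimental label) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the residue IDs and probabilities are hypothetical, and for FTMap `predicted_hot` would simply be the set of residues in vdW contact with probes in the largest consensus site.

```python
def predicted_hot_spots(probabilities: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Residues whose predicted PPI-hot spot probability is strictly above the threshold."""
    return {res for res, p in probabilities.items() if p > threshold}


def confusion_counts(predicted_hot: set[str], labels: dict[str, bool]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) over experimentally labeled residues only.

    labels maps residue ID -> True (PPI-hot spot) or False (PPI-nonhot spot).
    A labeled residue that is not predicted hot counts as a predicted nonhot spot;
    residues without a label are ignored.
    """
    tp = fp = tn = fn = 0
    for res, is_hot in labels.items():
        predicted = res in predicted_hot
        if predicted and is_hot:
            tp += 1
        elif predicted:
            fp += 1
        elif is_hot:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Hypothetical residue IDs and probabilities.
probs = {"A:10": 0.81, "A:11": 0.55, "A:12": 0.40, "A:30": 0.70, "A:31": 0.10}
labels = {"A:10": True, "A:11": True, "A:12": True, "A:31": False}  # A:30 is unlabeled
print(confusion_counts(predicted_hot_spots(probs), labels))  # → (2, 0, 1, 1)
```

Restricting the counts to labeled residues matters here because the 918 labeled residues are a small fraction of all 21,722 residues; an unlabeled residue predicted as hot (A:30 above) is neither a true nor a false positive.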

Table 1. Performance of PPI-Hotspot^ID vs. FTMap and SPOTONE^a

Performance of AlphaFold-Multimer, PPI-Hotspot^ID, and their combination for 48 “unsolved” complex structures

To elucidate the differences between the PPI-hot spots predicted by PPI-Hotspot^ID and those predicted by FTMap or SPOTONE, we compared their respective true positive predictions. The Venn diagram in Table 1 shows a substantial overlap in true positives between FTMap or SPOTONE and PPI-Hotspot^ID: FTMap shared 23 of its 30 true positives with PPI-Hotspot^ID, and SPOTONE shared 34 of its 40 true positives with PPI-Hotspot^ID but only 3 with FTMap. Only 3 true positives were predicted by all 3 methods. PPI-Hotspot^ID identified many true positives that were not detected by FTMap or SPOTONE, probably because it employs features not considered by FTMap or SPOTONE, such as the gas-phase energy, ΔG^gas (see Discussion). Furthermore, SPOTONE defined true negatives as residues whose mutation to alanine changed the protein binding free energy (ΔΔG^bind) by ≤ 2.0 kcal/mol, whereas we defined true negatives as residues whose alanine/nonalanine mutations resulted in negligible binding free energy changes (ΔΔG^bind < 0.5 kcal/mol) or did not perturb PPIs in immunoprecipitation or GST pull-down assays (see Methods).

Interface vs. Noninterface PPI-Hot Spots.

We could estimate the fraction of PPI-hot spots at the protein interface for the 74 of the 158 nonredundant proteins in our dataset that have complex structures. These 74 proteins, harboring 243 true PPI-hot spots, form 78 PPI pairs. Using the UniProt codes of each protein and its binding partner, we identified all complex structures in the PDB. Based on the complex structures of each PPI pair, we classified a PPI-hot spot as an interface one if it formed hydrogen bonds or vdW contacts with the partner protein;⁶¹ otherwise, it was deemed a noninterface PPI-hot spot. Among the 243 true PPI-hot spots, 67 (27.6%) lacked such contacts across the protein interface. For these 74 proteins, PPI-hotspot^ID predicted 240 PPI-hot spots, of which 43 (18%) are noninterface PPI-hot spots; SPOTONE identified only 5 noninterface PPI-hot spots, whereas FTMap did not predict any. For example, the complex structure of interleukin-4 bound to interleukin-4 receptor subunit α (PDB: 1IAR)⁶² in Figure 1 reveals 3 interface PPI-hot spots (E13, R92, and N93) and 2 noninterface ones (K127 and Y128). Based on the free structure of interleukin-4 (PDB: 1BBN),⁶³ PPI-Hotspot^ID identified all 5 true positives and SPOTONE detected only one interface PPI-hot spot (N93), whereas FTMap failed to identify any true positives.
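The interface/noninterface split described above can be sketched as a set operation, assuming the hydrogen-bond/vdW contact analysis has already produced, for each complex structure of a PPI pair, the set of residues that contact the partner. The rule that a hot spot counts as "interface" if it contacts the partner in at least one structure is our assumption; the text does not state how multiple structures were combined. The interleukin-4 hot spots come from Figure 1, but the contact sets (including R88) are illustrative only.

```python
def split_interface_hot_spots(hot_spots: set[str],
                              contacts_per_structure: list[set[str]]) -> tuple[set[str], set[str]]:
    """Partition hot spots into (interface, noninterface) sets.

    contacts_per_structure holds, for each complex structure of one PPI pair, the
    residues of the protein that form hydrogen bonds or vdW contacts with the partner
    (assumed precomputed). Assumption: a hot spot is "interface" if it contacts the
    partner in at least one of the structures.
    """
    contacting = set().union(*contacts_per_structure)
    interface = hot_spots & contacting
    return interface, hot_spots - interface


# Interleukin-4 hot spots from Figure 1; the contact sets are illustrative only.
il4_hot = {"E13", "R92", "N93", "K127", "Y128"}
contacts = [{"E13", "R92", "R88"}, {"N93", "R88"}]
interface, noninterface = split_interface_hot_spots(il4_hot, contacts)
print(sorted(interface), sorted(noninterface))
```

Summing the noninterface sets over all 78 PPI pairs and dividing by the total number of hot spots gives the fraction reported above (67/243 ≈ 27.6%).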

(Top left) The X-ray structure (PDB: 1IAR) of interleukin-4 (grey) in complex with interleukin-4 receptor subunit alpha (wheat) with five PPI-hot spots; interface PPI-hot spots (E13, R92, and N93) are in blue and the non-interface ones (K127 and Y128) are in green. The PPI-hot spot numberings are based on the interleukin-4 free structure (PDB: 1BBN). The correct predictions of PPI-hotspot^ID (top right), FTMap (bottom left), and SPOTONE (bottom right) are mapped to the corresponding residues of the complex structure.

Performance of AlphaFold-Multimer, PPI-Hotspot^ID and Their Combination in Predicting PPI-Hot Spots.

To assess whether the interface residues predicted by AlphaFold-Multimer⁶⁰ can serve as PPI-hot spots when complex structures are unavailable, we focused on 48 “unsolved” AB complexes (i.e., with no experimentally determined complex structure) involving 47 proteins in PPI-Hotspot+PDB^BM(1.1); there are more complexes than proteins because one protein, human neurotrophin (UniProtID P20783, PDB 1nt3-A), interacts with two different proteins (UniProtIDs Q16288 and P17643). These 48 unsolved complexes contain 90 PPI-hot spots and 45 nonhot spots. We used the protein A structure sequence from PPI-Hotspot+PDB^BM(1.1) and the entire protein B sequence from UniProtKB⁵⁸ as inputs to the AlphaFold-Multimer module in ColabFold,⁶⁴ which generated model structures for each AB complex. Interface residues were defined from the AMBER-relaxed model with the highest pTM score, using a cutoff distance of 5 Å to capture residues in close contact. Interface residues were predicted as PPI-hot spots and noninterface residues as nonhot spots.
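The two steps above (choose the model with the highest pTM score, then call chain-A residues within 5 Å of chain B "interface") can be sketched with the standard library alone. This is a hypothetical sketch: real models would be parsed from the ColabFold output files with a structure library, whereas here each model is an in-memory list of `(chain, residue number, (x, y, z))` tuples, and we assume the 5-Å cutoff applies to any pair of atoms because the text does not specify the atom selection.

```python
import math

# Hypothetical in-memory representation of a model: (chain ID, residue number, (x, y, z)).
Atom = tuple[str, int, tuple[float, float, float]]


def best_model(models: list[dict]) -> dict:
    """Pick the model with the highest pTM score."""
    return max(models, key=lambda m: m["ptm"])


def interface_residues(atoms: list[Atom], chain_a: str = "A", chain_b: str = "B",
                       cutoff: float = 5.0) -> set[int]:
    """Residues of chain_a with any atom within `cutoff` angstroms of any chain_b atom."""
    b_coords = [xyz for chain, _, xyz in atoms if chain == chain_b]
    return {
        resnum
        for chain, resnum, xyz in atoms
        if chain == chain_a and any(math.dist(xyz, other) <= cutoff for other in b_coords)
    }


# Two toy models with hypothetical coordinates (angstroms).
models = [
    {"ptm": 0.62, "atoms": [("A", 2, (0.0, 0.0, 0.0)), ("B", 1, (1.0, 0.0, 0.0))]},
    {"ptm": 0.81, "atoms": [("A", 1, (0.0, 0.0, 0.0)), ("A", 2, (10.0, 0.0, 0.0)),
                            ("B", 1, (3.0, 0.0, 0.0))]},
]
print(interface_residues(best_model(models)["atoms"]))  # → {1}
```

The all-pairs distance check is quadratic in the number of atoms, which is fine for a sketch; a spatial index (e.g., a k-d tree) would be the usual choice for full-size complexes.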

To identify PPI-hot spots using PPI-Hotspot^ID, we first excluded from our dataset the 90 true PPI-hot spots and 45 nonhot spots belonging to the 47 proteins lacking complex PDB structures. We then used an automatic machine-learning framework to train an ensemble of machine-learning models with 4 features (k^C, aa residue type, SASA_i, and ΔG_i^gas) on the true PPI-hot spots and nonhot spots in the remaining 111 proteins in our dataset. The final ensemble model was used to identify PPI-hot spots in the 47 held-out proteins lacking complex structures. The resulting sensitivity (0.58) and F1 score (0.66) were lower than those in Table 1 obtained with the full dataset, but higher than those achieved using AlphaFold-Multimer-predicted interface residues as PPI-hot spots (0.41). When we combined the PPI-Hotspot^ID-predicted PPI-hot spots with the AlphaFold-Multimer-predicted interface residues, the resulting sensitivity (0.70) and F1 score (0.72) were higher than those obtained by either method alone. This suggests that PPI-Hotspot^ID can identify true PPI-hot spots that reside outside the protein-protein interface.
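The text does not spell out how the two predictions were combined; a natural reading, assumed here, is a union: a residue is predicted as a PPI-hot spot if either PPI-Hotspot^ID or the AlphaFold-Multimer interface flags it. The sketch below shows that rule and its effect on sensitivity (eq 1) with hypothetical residue IDs.

```python
def sensitivity(predicted_hot: set[str], true_hot: set[str]) -> float:
    """Fraction of true PPI-hot spots that are predicted (eq 1)."""
    return len(predicted_hot & true_hot) / len(true_hot) if true_hot else 0.0


def combine_predictions(ml_hot: set[str], af_interface: set[str]) -> set[str]:
    """Assumed combination rule: flag a residue if either method flags it (set union)."""
    return ml_hot | af_interface


# Hypothetical residue IDs.
true_hot = {"r1", "r2", "r3", "r4"}
ml_hot = {"r1", "r2", "x1"}   # e.g., PPI-Hotspot^ID predictions
af_hot = {"r3", "x2"}         # e.g., AlphaFold-Multimer interface residues
for name, pred in [("ML", ml_hot), ("AF", af_hot), ("union", combine_predictions(ml_hot, af_hot))]:
    print(name, sensitivity(pred, true_hot))
```

Under a union rule, sensitivity can only stay the same or increase, while false positives from both sources accumulate, so a higher F1 is not guaranteed in general; it depends on how complementary the two prediction sets are.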

Experimental Verification of PPI-Hotspot^ID’s Predictions in Human eEF2.

We experimentally verified predictions made by PPI-Hotspot^ID by using it to detect the PPI-hot spots of eEF2, an essential translation factor that hydrolyzes GTP to catalyze peptide elongation. Binding of cytoplasmic polyadenylation element-binding protein 2 (CPEB2) to eEF2 may interfere with conformational changes of eEF2 on ribosomes, thereby affecting the efficiency of eEF2-mediated GTP hydrolysis and slowing down translation of hypoxia-inducible factor 1α (HIF-1α) mRNA.⁶⁵ No eEF2-CPEB2 complex structure has been solved, but a 5-Å electron microscopy structure of eEF2 (PDB 4v6x-A)⁶⁶ is available. In a yeast two-hybrid screen using the CPEB2 N-terminus as bait, a positive clone containing eEF2 residues 717–803 had been identified, and a subsequent co-IP assay revealed a CPEB2-binding domain comprising eEF2 residues 743–817.⁶⁵ We therefore focused on this domain, which shares ≤ 20% sequence identity with the 158 nonredundant proteins in our dataset, in predicting PPI-hot spots. Based on the free eEF2 structure (PDB 4v6x-A),⁶⁶ PPI-hotspot^ID predicted F794 as the PPI-hot spot with the highest probability (0.67). We therefore chose to test F794 and 7 other predicted PPI-hot spots (L763, R767, G768, G778, T779, R801, A808) that were > 12 Å from F794, as well as 4 predicted PPI-nonhot spots (E773, P789, V790, Q807).
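Selecting additional predicted hot spots that lie more than 12 Å from the top-ranked residue is a simple distance filter. The text does not say which atoms the 12-Å distance refers to; the sketch below assumes Cα–Cα distances, and the residue labels and coordinates are hypothetical placeholders rather than values from PDB 4v6x.

```python
import math


def distant_candidates(ca_coords: dict[str, tuple[float, float, float]],
                       predicted: list[str], anchor: str, min_dist: float = 12.0) -> list[str]:
    """Predicted hot spots whose C-alpha lies more than min_dist angstroms from the anchor's."""
    ref = ca_coords[anchor]
    return [res for res in predicted
            if res != anchor and math.dist(ca_coords[res], ref) > min_dist]


# Hypothetical residue labels and C-alpha coordinates (angstroms).
ca = {"anchor": (0.0, 0.0, 0.0), "p1": (13.0, 0.0, 0.0),
      "p2": (5.0, 0.0, 0.0), "p3": (0.0, 20.0, 0.0)}
print(distant_candidates(ca, ["anchor", "p1", "p2", "p3"], anchor="anchor"))  # → ['p1', 'p3']
```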

To validate PPI-Hotspot^ID’s predictions, we mutated the aforementioned predicted PPI-hot spots and PPI-nonhot spots in mouse eEF2 (meEF2), which shares 99% sequence identity with human eEF2. The generated eEF2 mutants (L763A, ⁷⁶⁶AAA⁷⁶⁸, E773Q, ⁷⁷⁸AAA⁷⁸⁰, ⁷⁸⁹AA⁷⁹⁰, F794A, R801A, Q807E, A808S, and D815A), along with wild-type eEF2 and a negative control (enhanced green fluorescent protein, EGFP), were then screened for interaction with CPEB2 by co-immunoprecipitation (co-IP). This assay identified F794 as a critical eEF2 residue for binding to CPEB2. To confirm the initial screening result, we selected 3 mutants (⁷⁷⁸AAA⁷⁸⁰, F794A, and D815A, designated mut1, mut2, and mut3) for further analysis (Figure 2a). The interaction of wild-type and mutant eEF2 with CPEB2 was analyzed again by reciprocal co-IP. The results in Figure 2b show that the F794A mutation (mut2) abolished binding to CPEB2.

Evaluation of the predicted CPEB2-interacting amino acid residues in eEF2. (a) Salient features of mouse eEF2, showing the various domains and the mutated amino acids in domain V. mut 1, G778A, T779A, and P780A; mut 2, F794A; mut 3, D815A. (b) Reciprocal co-IP. 293T cells expressing myc-CPEB2 along with wt or mutant flag-eEF2 or control GFP were harvested and immunoprecipitated with flag or myc IgG. The precipitates were analyzed by western blotting with myc and flag antibodies. IP, immunoprecipitation; IB, immunoblotting; IgG H.C., IgG heavy chain. (c) HeLa cells transfected with the plasmid expressing shRNA against human eEF2 (siheEF2) were harvested after 4-day puromycin selection for western blotting. HeLa cells transfected with the eEF2 knockdown plasmid along with flag-tagged wt or mutant mouse eEF2, after 4-day selection with puromycin and G418, were used for (d) ³⁵S-Met/Cys labeling of newly synthesized proteins or (e) western blotting with the denoted antibodies. The normalized HIF-1α protein level (HIF-1α/β-actin signal) was calculated and expressed as mean ± SEM from three independent experiments. Two-tailed Student’s t-test; *p < 0.05.

Next, we investigated whether disrupting the association between CPEB2 and eEF2 affects HIF-1α expression in cells. Because eEF2 is an essential and abundant translation factor, its ectopic expression alone was insufficient to override the function of endogenous eEF2. We therefore tested the F794A mutant under knockdown of endogenous eEF2. HeLa cells were transfected with plasmids lacking (siCtrl) or containing a short hairpin sequence for human eEF2 (siheEF2) and subjected to puromycin selection. Both shRNA sequences, which knock down human but not mouse eEF2, decreased endogenous eEF2 after 4 days (Figure 2c). HeLa cells were then transfected with the siheEF2 and flag-meEF2 (wild-type, mut2, or mut3) plasmids and subjected to puromycin and G418 selection for 4 days. Surviving cells were incubated with ³⁵S-methionine/cysteine to metabolically label newly synthesized proteins. Expression of the F794A or D815A mutant did not affect general protein synthesis (Figure 2d). However, the level of HIF-1α, but not CPEB2 or β-actin, was selectively increased in HeLa cells reconstituted with the F794A mutant (Figure 2e). Supplementary Figure S1 shows the uncropped immunoblot images. Thus, the eEF2 F794A mutation influences the translation of CPEB2-targeted HIF-1α mRNA without affecting general translation.
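The quantification described in the Figure 2 legend (HIF-1α/β-actin ratio per replicate, mean ± SEM over three independent experiments, two-tailed Student's t-test) can be sketched in plain Python. The band intensities below are hypothetical; the sketch computes only the pooled-variance t statistic and its degrees of freedom. The two-tailed p-value then follows from the t distribution, for example via `scipy.stats.ttest_ind(a, b)`, whose default (`equal_var=True`) is Student's test.

```python
import math
from statistics import mean, stdev


def normalize(target: list[float], loading: list[float]) -> list[float]:
    """Per-replicate ratio, e.g., HIF-1a signal / beta-actin signal."""
    return [t / l for t, l in zip(target, loading)]


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic (pooled, equal-variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical band intensities from three independent experiments.
wt = normalize([1.0, 1.1, 0.9], [1.0, 1.0, 1.0])
mut = normalize([1.8, 2.2, 2.0], [1.0, 1.1, 1.0])
print(mean_sem(wt), mean_sem(mut), student_t(mut, wt))
```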

Discussion

Identifying PPI-hot spots is challenging, especially when the complex structure is lacking. A key hurdle is the scarcity of experimental data on PPI-hot spots, which hampers the training of accurate machine-learning models for their prediction. Here, we have introduced two novel elements that helped to identify PPI-hot spots using the unbound structure. First, we constructed a dataset comprising 414 experimentally known PPI-hot spots and 504 nonhot spots, and carefully checked that PPI-hot spots have no mutations resulting in ΔΔG^bind < 0.5 kcal/mol, whereas nonhot spots have no mutations resulting in ΔΔG^bind ≥ 0.5 kcal/mol or affecting binding in immunoprecipitation or GST pull-down assays (see Methods). In contrast, SPOTONE⁵³ employed nonhot spots defined as residues whose alanine mutation resulted in ΔΔG^bind < 2.0 kcal/mol. Notably, previous PPI-hot spot prediction methods did not employ PPI-hot spots whose mutations have been curated in UniProtKB as significantly impairing/disrupting PPIs (see Introduction). Second, we introduced novel features derived from unbound protein structures, such as the gas-phase energy of the target protein relative to its unfolded state. Feature-importance tests identified the gas-phase energy as an important feature. This finding can be rationalized by considering how PPI-hot spots contribute substantially to the overall binding free energy, ΔG^bind. On the one hand, PPI-hot spots can enhance favorable enthalpic contributions to ΔG^bind through hydrogen bonds or vdW contacts across the protein interface; in the absence of the binding partner and solvent, such residues would be energetically unstable, which is reflected in the gas-phase energy. On the other hand, PPI-hot spots can counteract the unfavorable entropy loss upon protein binding by maintaining an optimal binding scaffold, in which case they would be energetically stable.

Methods that rely on complex structures generally predict residues that make multiple contacts across the protein-protein interface as PPI-hot spots. Some of these methods assume that PPI-hot spots are located exclusively at the interface and aim to identify them among the interface residues.⁴⁰ In contrast, PPI-Hotspot^ID leverages evolutionary conservation, residue type, and stability principles based on the free protein structure to detect PPI-hot spots, including those lacking direct contact with the partner protein. Such noninterface PPI-hot spots may serve to maintain an optimal scaffold for protein binding and are not uncommon: of the 243 true PPI-hot spots in proteins with complex structures, 67 formed neither hydrogen bonds nor vdW contacts across the protein interface. PPI-Hotspot^ID identified 43 of these 67 noninterface PPI-hot spots. An illustrative example is the binding of the Golgi resident protein GCP60 to phosphatidylinositol 4-kinase β (PI4Kβ). PPI-Hotspot^ID correctly predicted all 4 experimentally known GCP60 PPI-hot spots, including F19 and Y46, which do not form hydrogen bonds across the interface with PI4Kβ (Figure 3). These results highlight the ability of PPI-Hotspot^ID to identify PPI-hot spots involved in indirect interactions with partner proteins.

(Top) The structure (PDB 2N73)⁸⁴ of GCP60 (green) in complex with PI4Kβ (cyan), with the GCP60–PI4Kβ interface encircled. (Bottom) The four experimentally known PPI-hot spots of GCP60 are shown in red. H45 and Y49 form hydrogen bonds across the interface with PI4Kβ. Although F19 and Y46 do not directly contact PI4Kβ, F19 is in vdW contact with Q42, which in turn forms vdW contacts with H45, whereas Y46 is in vdW contact with both H45 and Y49.

Proteins typically interact with multiple partners, but their PPI-hot spots may have been experimentally characterized for only a few partners. In some cases where PPI-Hotspot^ID predicted residues that are not listed as PPI-hot spots in PPI-Hotspot+PDB^BM(1.1), the protein’s complex structures with other binding partners show intermolecular hydrogen bonds between these predicted residues and residues of the respective partner proteins. This suggests that some PPI-Hotspot^ID-predicted residues might be PPI-hot spots for other binding partners. For example, the death domain of CRADD (caspase-recruitment domain and death domain-containing adaptor protein) contains 7 experimentally known PPI-hot spots (N121, Q125, Y146, R147, K149, V156, Q169) critical for its interaction with PIDD (p53-induced death domain-containing protein). Based on the free crystal structure of CRADD (PDB 2O71-A),⁶⁷ PPI-Hotspot^ID correctly predicted 3 true positives (Y146, R147, and Q169) and also predicted G128. In the oligomeric structure (PDB 2OF5)⁶⁸ of 7 CRADD proteins in complex with 5 PIDD proteins, G128 forms no hydrogen bonds, but its neighbor, L127, forms a backbone–side chain hydrogen bond with R147 in another CRADD chain (Figure 4). A positively charged Arg at position 128 (G128R) would repel the nearby positively charged R147 in another CRADD chain, thus disrupting the CRADD–CRADD interface and decreasing CRADD’s affinity for PIDD. Experimentally, the G128R CRADD mutant did not co-immunoprecipitate the PIDD death domain, and the G128R mutation has been found in patients with non-syndromic mental retardation.⁶⁹ Thus, PPI-Hotspot^ID could unveil a PPI-hot spot, G128, that is not apparent from the 2OF5 complex structure: although G128 does not directly interact with PIDD, its mutation, especially to Arg, might perturb the CRADD–CRADD interface and thus CRADD’s oligomeric structure and binding affinity for PIDD.

(Left) The structure (PDB 2OF5)⁶⁸ of 7 CRADD proteins in complex with 5 PIDD proteins. The circle shows the CRADD–CRADD interface between chains C (cyan) and G (orange); the other 5 CRADD chains are in gray, and the 5 PIDD proteins are in green. (Right) G128 (red) in CRADD (chain C) participates indirectly in CRADD–CRADD interactions via a backbone–side chain hydrogen bond between its neighbor, L127, and R147 in another CRADD (chain G).

The ability of PPI-Hotspot^ID to detect PPI-hot spots provides biologists with a useful tool, as alanine-scanning mutagenesis and protein-protein complex structure determination are laborious, time-consuming, and costly ways to identify PPI-hot spots. Conventional methods based on complex structures might miss nonobvious PPI-hot spots that have no direct interactions with the protein’s partner. AlphaFold-Multimer and future improved protein-protein complex prediction methods require knowledge of the interacting partners and a separate calculation for each known partner, which reduces overall efficiency. Moreover, solved or modeled protein-protein complex structures reveal only the interface residues. In contrast, PPI-Hotspot^ID can reveal nonobvious PPI-hot spots as well as potential PPI-hot spots for other protein partners, thus helping to elucidate different PPI mechanisms.

Methods

Dataset. True PPI-hot spots.

We updated the PPI-Hotspot+PDB^BM benchmark by removing 2 fused protein structures and adding new PPI-hot spots by (i) reviewing references in ASEdb⁵⁵ to include nonalanine mutations with ΔΔG^bind > 2 kcal/mol, and (ii) checking the experimental data of certain mutations in UniProtKB.⁵⁸ For example, the PPI-Hotspot+PDB^BM benchmark included R43A in aprataxin (UniProtID Q7Z2E3), annotated as “loss of interaction with MDC1”, but not K52A, annotated as “impairs interaction with MDC1”. However, when we checked the experimental data in the UniProtKB reference, the binding bands were absent for both the R43A and K52A mutants; we therefore added K52A as a PPI-hot spot. The updated benchmark, termed PPI-Hotspot+PDB^BM(1.1), contains 414 PPI-hot spots. Among these, 104 PPI-hot spots in 32 nonredundant proteins are based on mutations resulting in ΔΔG^bind ≥ 2 kcal/mol from ASEdb⁵⁵ and SKEMPI2.0⁵⁶ with no known mutations resulting in ΔΔG^bind < 0.5 kcal/mol. The remaining 310 PPI-hot spots in 128 nonredundant proteins are based on mutations that are manually curated in UniProtKB⁵⁸ as significantly impairing/disrupting PPIs. Two of the proteins have PPI-hot spots from both ASEdb/SKEMPI2.0 and UniProtKB, resulting in a total of 158 nonredundant proteins with free structures harboring 414 PPI-hot spots (Supplementary Tables S1 and S2).
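The totals follow directly from the counts above; the two proteins that have PPI-hot spots from both sources are counted only once:

$$\underbrace{104}_{\text{ASEdb/SKEMPI2.0}} + \underbrace{310}_{\text{UniProtKB}} = 414 \ \text{PPI-hot spots}, \qquad 32 + 128 - 2 = 158 \ \text{proteins}$$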

True PPI-nonhot spots.

To obtain PPI-nonhot spots for the 158 nonredundant proteins with true PPI-hot spots, we identified residues from ASEdb⁵⁵ and SKEMPI2.0⁵⁶ databases where mutations to alanine/nonalanine resulted in protein ΔΔG^bind < 0.5 kcal/mol. We also identified residues in the UniProtKB where mutations to alanine/nonalanine were curated not to perturb PPIs. We manually checked each reference to ensure that mutations of these residues did not lead to ΔΔG^bind changes ≥ 0.5 kcal/mol or impact binding in immunoprecipitation or GST pull-down assays. PPI-nonhot spots in non-native proteins or regions with missing structures were excluded. Supplementary Table S1 lists the 504 PPI-nonhot spots found in 75 proteins with free structures.
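The labeling criteria for true PPI-hot spots and PPI-nonhot spots can be summarized as a small decision function over all experimental results for a residue. This is a simplified, hypothetical sketch: the actual curation involved manually reading each reference, the `MutationRecord` schema is invented for illustration, and excluding residues with intermediate (0.5 to 2 kcal/mol) or conflicting evidence is our assumption about how such cases would be handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MutationRecord:
    """One experimental result for a residue (hypothetical schema)."""
    ddg_bind: Optional[float] = None      # kcal/mol (ASEdb/SKEMPI2.0); None if not measured
    disrupts_ppi: Optional[bool] = None   # UniProtKB curation or co-IP/GST pull-down outcome


def label_residue(records: list[MutationRecord]) -> Optional[str]:
    """Return "hot", "nonhot", or None (residue excluded) from all its mutations."""
    ddgs = [r.ddg_bind for r in records if r.ddg_bind is not None]
    assays = [r.disrupts_ppi for r in records if r.disrupts_ppi is not None]
    if not ddgs and not assays:
        return None                                   # no experimental evidence
    strong = any(d >= 2.0 for d in ddgs) or any(assays)
    if strong and not any(d < 0.5 for d in ddgs):
        return "hot"                                  # no contradicting negligible-effect mutation
    if not strong and all(d < 0.5 for d in ddgs):
        return "nonhot"                               # every measured effect is negligible
    return None                                       # intermediate or conflicting evidence


print(label_residue([MutationRecord(ddg_bind=2.4)]))                                  # → hot
print(label_residue([MutationRecord(ddg_bind=0.2), MutationRecord(disrupts_ppi=False)]))  # → nonhot
```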

Input features.

To distinguish PPI-hot spots from PPI-nonhot spots, we input sequence, structural, and stability features of each residue in the protein for training various machine-learning classifiers. The input features for each residue i of a protein included its aa type, conservation score, secondary structure, SASA, gas-phase energy and its components, polar solvation free energy, and nonpolar solvation free energy. The secondary structure, SASA, and energy components of each residue were computed using the DSSP program,⁷⁰ FreeSasa,⁷¹ and AmberTools version 20,⁷² respectively, with default parameters.

Per-Residue Free Energy Contributions.

For a given free protein structure, the Reduce program⁷³ was used to add hydrogens and assign the protonation states of ionizable residues. Additional missing heavy and hydrogen atoms were added using AmberTools version 20⁷² and the Amber FF19SB force field.⁷⁴ To eliminate any steric clashes, we performed 500 steps of conjugate-gradient minimization with constraints on the heavy atoms, using the Generalized Born model. The resulting structure was used to compute the per-residue energy/free energy contributions using the MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) module in AmberTools.⁷² For each residue i in the protein, we computed (i) the molecular mechanics (gas-phase) energy

$$\Delta G_i^{\mathrm{gas}} = \Delta E_i^{\mathrm{int}} + \Delta E_i^{\mathrm{vdW}} + \Delta E_i^{\mathrm{elec}}$$

where ΔE_i^int includes contributions from bonded terms, ΔE_i^vdW is the vdW interaction energy, and ΔE_i^elec is the electrostatic interaction energy, and (ii) the polar (ΔG_i^polar) and nonpolar (ΔG_i^nonpolar) solvation free energies, each relative to the corresponding value of residue i in an extended reference state in which the residues do not interact with one another.⁷⁵
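Once per-residue terms are available for the folded structure and for the extended reference state (e.g., from a per-residue decomposition run), the energy features reduce to differences plus one sum: the gas-phase energy is the bonded plus vdW plus electrostatic difference. The sketch below is a hypothetical container-and-arithmetic illustration, not the authors' pipeline; parsing the MMPBSA output is omitted, and treating "relative to the extended state" as a folded-minus-extended subtraction for every term is our assumption.

```python
from dataclasses import dataclass


@dataclass
class ResidueEnergy:
    """Per-residue terms (kcal/mol) for one state; hypothetical container."""
    e_int: float        # bonded terms
    e_vdw: float        # van der Waals interaction energy
    e_elec: float       # electrostatic interaction energy
    g_polar: float      # polar solvation free energy
    g_nonpolar: float   # nonpolar solvation free energy


def relative_energy_features(folded: ResidueEnergy, extended: ResidueEnergy) -> dict[str, float]:
    """Folded-minus-extended differences and the gas-phase sum dE_int + dE_vdW + dE_elec."""
    d_int = folded.e_int - extended.e_int
    d_vdw = folded.e_vdw - extended.e_vdw
    d_elec = folded.e_elec - extended.e_elec
    return {
        "dE_int": d_int,
        "dE_vdW": d_vdw,
        "dE_elec": d_elec,
        "dG_gas": d_int + d_vdw + d_elec,
        "dG_polar": folded.g_polar - extended.g_polar,
        "dG_nonpolar": folded.g_nonpolar - extended.g_nonpolar,
    }


# Hypothetical numbers for a single residue.
folded = ResidueEnergy(e_int=10.0, e_vdw=-5.0, e_elec=-20.0, g_polar=-8.0, g_nonpolar=1.0)
extended = ResidueEnergy(e_int=10.0, e_vdw=-1.0, e_elec=-5.0, g_polar=-15.0, g_nonpolar=2.0)
print(relative_energy_features(folded, extended))
```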

Per-Residue Conservation Score.

To calculate the conservation score, k_i^C, of residue i in a protein, we implemented a method similar to ConSurf^76,77 to run in parallel with the energy evaluation code. First, we searched the UniRef90 database⁷⁸ using HMMER⁷⁹ to find sequences similar to the target sequence. Near-duplicates were removed by clustering matched sequences with ≥ 95% pairwise sequence identity using CD-HIT⁸⁰ and keeping only one representative per cluster. Because HMMER⁷⁹ may find good matches for only a small portion of the target sequence, we compared the HMMER sequences with the target sequence: we kept only those with > 60% overlap with the target sequence and discarded sequences that were dissimilar (≤ 35% sequence identity) or nearly identical (≥ 95% sequence identity). Next, we pairwise aligned the remaining sequences, and if two sequences overlapped by > 10% of the sequence, we rejected the shorter one. After this filtering, all remaining HMMER hits were used unless there were more than 300, in which case only the top 300 hits were kept. These sequences were then aligned to the target sequence using MAFFT L-INS-i.⁸¹ We then used the Rate4Site program⁸² to compute position-specific evolutionary rates from the multiple sequence alignment. These rates were normalized and grouped into ConSurf grades ranging from 1 to 9, where k^C = 1 represents the most rapidly evolving residues and k^C = 9 the most conserved residues.
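Two steps of this pipeline are easy to state in code: the coverage/identity filter with the 300-hit cap, and the mapping of evolutionary rates onto grades 1 to 9. The sketch below makes several assumptions, labeled in the comments: hits are summarized in a hypothetical `Hit` record, "top" hits are ranked by bit score, the CD-HIT and pairwise-overlap steps are omitted, and the grade mapping is a simplified equal-width binning, not ConSurf's exact grading scheme.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Summary of one HMMER hit against the target (hypothetical schema)."""
    name: str
    identity: float   # % sequence identity to the target
    coverage: float   # % of the target sequence covered by the alignment
    score: float      # bit score, used here (assumption) to rank hits


def filter_hits(hits: list[Hit], max_hits: int = 300) -> list[Hit]:
    """Keep hits with > 60% coverage and 35% < identity < 95%, best-scoring first, capped."""
    kept = [h for h in hits if h.coverage > 60 and 35 < h.identity < 95]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:max_hits]


def rates_to_grades(rates: list[float]) -> list[int]:
    """Map evolutionary rates to grades 1-9 (9 = slowest-evolving, most conserved).

    Simplified equal-width binning; NOT ConSurf's exact grading scheme.
    """
    lo, hi = min(rates), max(rates)
    if hi == lo:
        return [5] * len(rates)
    grades = []
    for r in rates:
        frac = (r - lo) / (hi - lo)               # 0 = slowest, 1 = fastest
        grades.append(9 - min(int(frac * 9), 8))  # fastest -> 1, slowest -> 9
    return grades


hits = [
    Hit("h1", identity=50, coverage=80, score=200.0),
    Hit("h2", identity=30, coverage=90, score=300.0),  # too dissimilar
    Hit("h3", identity=70, coverage=50, score=250.0),  # covers too little of the target
    Hit("h4", identity=98, coverage=99, score=400.0),  # nearly identical
    Hit("h5", identity=40, coverage=70, score=150.0),
]
print([h.name for h in filter_hits(hits)])  # → ['h1', 'h5']
print(rates_to_grades([0.0, 0.5, 1.0]))     # → [9, 5, 1]
```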

Generating PPI-hot spot Predictive Model using AutoGluon.

We provided all the aforementioned residue features, namely the conservation score $k_i^{C}$, aa type, DSSP secondary structure, $\mathrm{SASA}_i$, $E_i^{\mathrm{MM}}$, $E_i^{\mathrm{bond}}$, $E_i^{\mathrm{vdW}}$, $E_i^{\mathrm{elec}}$, $G_i^{\mathrm{pol}}$, and $G_i^{\mathrm{nonpol}}$, to the Tabular module in AutoGluon v0.8.2 (https://auto.gluon.ai/stable/index.html). AutoGluon was chosen for model training and validation because of its robustness and user-friendly interface, which allow various machine-learning approaches and their combinations to be explored simultaneously and automatically. Instead of using a single training set to train the model and a separate test set to evaluate its performance, we employed cross-validation. Cross-validation uses the entire dataset for both training and testing and thus makes efficient use of the limited data on PPI-hot spots and PPI-nonhot spots. AutoGluon-Tabular automatically partitioned our dataset at random into multiple subsets (folds) for training and validation. Sequence homology between training and validation folds is expected to be low, as the average pairwise sequence identity in our dataset is 26%. Each fold was used once as a test set while the remaining folds served as the training set, and the model's performance on each test fold was measured using the F1 score.
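AutoGluon performs the fold splitting and scoring internally. The sketch below is only a minimal, dependency-free illustration of k-fold cross-validation scored by F1, the metric used above. Stratifying the folds by class is an assumption made here so that every fold contains both hot spots and nonhot spots. The toy model `midpoint_threshold`, which thresholds a single feature, is hypothetical and stands in for AutoGluon's models.

```python
import random
from typing import Callable, Sequence

Predictor = Callable[[float], int]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2PR / (P + R) for the positive class (1 = PPI-hot spot)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stratified_folds(y: Sequence[int], k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle the indices of each class and deal them round-robin into k folds."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def cross_validated_f1(X, y, k, train_fn, seed=0) -> list[float]:
    """Each fold serves once as the test set; the other folds train the model."""
    scores = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        predict = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in held_out]
        scores.append(f1_score([y[i] for i in held_out], y_pred))
    return scores


def midpoint_threshold(X_train, y_train) -> Predictor:
    """Hypothetical toy model: cut one feature halfway between class means."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x > cut)
```

Usage: `cross_validated_f1(X, y, 5, midpoint_threshold)` returns one F1 score per held-out fold, and averaging these scores gives the kind of mean F1 that AutoGluon uses to rank models.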

AutoGluon then trained individual “base” models, including LightGBM, CatBoost, XGBoost, random forests, extremely randomized trees, neural networks, and K-nearest neighbours. Using the aggregated predictions of the base models as features in addition to the original features, AutoGluon trained multiple “stacker” models, whose predictions were fed as inputs to higher-layer stacker models in an iterative process called multi-layer stacking. The output layer used ensemble selection to aggregate the predictions of the stacker models. To improve stacking performance, AutoGluon used all the data for both training and validation through repeated k-fold bagging of all models at all layers of the stack, where k is determined by best precision. We refer the reader to the original study by Erickson et al.,⁸³ which details the methodology, including the types of “base” models, multi-layer stack ensembling, and repeated k-fold bagging. Based on the highest mean F1 score, AutoGluon yielded a final PPI-hot spot predictive model that is a weighted regularized ensemble of more than a dozen different models (https://auto.gluon.ai/dev/api/autogluon.tabular.models.html).
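The ensemble-selection step at the output layer can be illustrated with greedy forward selection with replacement, in the style of Caruana et al. This is a generic sketch, not AutoGluon's implementation. It makes three assumptions: each model supplies hot-spot probabilities on held-out residues, the averaged probabilities are thresholded at 0.5, and selection is scored by F1. The function name `ensemble_selection` is hypothetical. The resulting weights are the fraction of rounds in which each model was picked.

```python
from collections import Counter


def f1_score(y_true, y_pred) -> float:
    """F1 for the positive class (1 = PPI-hot spot)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ensemble_selection(model_probs: dict[str, list[float]], y_true: list[int],
                       rounds: int = 10, threshold: float = 0.5) -> dict[str, float]:
    """Greedily add (with replacement) the model whose inclusion gives the best
    F1 of the averaged probabilities; return per-model selection frequencies."""
    chosen: list[str] = []
    summed = [0.0] * len(y_true)  # running sum of the chosen models' probabilities
    for _ in range(rounds):
        best_name, best_score = None, -1.0
        n = len(chosen) + 1
        for name, probs in model_probs.items():
            averaged = [(s + p) / n for s, p in zip(summed, probs)]
            score = f1_score(y_true, [int(q >= threshold) for q in averaged])
            if score > best_score:  # ties keep the earlier model
                best_name, best_score = name, score
        chosen.append(best_name)
        summed = [s + p for s, p in zip(summed, model_probs[best_name])]
    counts = Counter(chosen)
    return {name: counts[name] / rounds for name in model_probs}
```

Selecting with replacement lets a strong model accumulate a larger weight. This greedy scheme is one simple way of producing a weighted ensemble of the kind described above.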

Selecting Key Features.

Next, we evaluated the importance of each feature using a permutation-based test (part of the AutoGluon package), in which the values of one feature (column) were randomly shuffled across residues (rows) and the F1 score was re-evaluated; the larger the resulting drop in F1 score, the more important the feature. The importance test revealed the four most important residue features, which in order of importance are (i) $k^{C}$, (ii) aa residue type, (iii) $\mathrm{SASA}_i$, and (iv) . These four features were used to train an ensemble of machine-learning models on the entire data set, consisting of 414 true PPI-hot spots and 504 nonhot spots. The resulting PPI-hot spot prediction model, named PPI-Hotspot^ID, yielded an F1 score comparable to that obtained using the initial set of 10 features. PPI-Hotspot^ID was implemented as a freely accessible web server (https://ppihotspotid.limlab.dnsalias.org/) with access to 4 virtual CPUs and 8 GB of memory; its source code is available at https://github.com/wrigjz/ppihotspotid. Calculations for a 539-residue protein (PDB 1c2b, chain A) took 35 minutes.
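The permutation test can be sketched as follows. This is a generic re-implementation for illustration, not AutoGluon's code. It shuffles one feature column at a time, re-scores with F1, and reports the mean drop over several shuffles. The function name `permutation_importance`, the number of repeats, and the toy `predict` callable are assumptions.

```python
import random
from statistics import mean


def f1_score(y_true, y_pred) -> float:
    """F1 for the positive class (1 = PPI-hot spot)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def permutation_importance(predict, X, y, n_repeats=10, seed=0) -> list[float]:
    """Mean drop in F1 when each feature column is shuffled across residues.

    X is a list of rows (one per residue), each a list of feature values;
    predict maps one row to 0 (nonhot spot) or 1 (hot spot).
    """
    rng = random.Random(seed)
    baseline = f1_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - f1_score(y, [predict(row) for row in X_perm]))
        importances.append(mean(drops))
    return importances
```

A feature that the model ignores gets an importance of exactly zero, because shuffling it cannot change any prediction. Ranking the features by this score and keeping the top few mirrors the feature-reduction step described above.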

Detecting PPI-hot spots Using the AlphaFold-Multimer-predicted Interface.

In cases where experimental complex structures are unavailable, can the protein-protein complexes modeled by AlphaFold-Multimer⁶⁰ be used to identify PPI-hot spots from the predicted interface residues? To address this, we first identified PPI-hot spots within the PPI-Hotspot+PDB^BM(1.1) dataset that lack experimentally determined protein complex structures. The binding partner's sequence, and thus its UniProt ID, is not available for all 414 PPI-hot spots in PPI-Hotspot+PDB^BM(1.1) (see Supplementary Table S1). This left 360 PPI-hot spots in 135 proteins associated with 155 PPI pairs; there are more pairs than proteins because some proteins are involved in multiple PPIs. Ninety of the 155 PPI pairs have complex structures in the PDB. Of the 65 PPI pairs lacking complex structures, 17 contain > 1,100 residues, exceeding the current size limit of AlphaFold-Multimer.⁶⁰ We therefore generated structural models for the remaining 48 complexes using the AlphaFold-Multimer module in ColabFold version 1.3.0⁶⁴ with default settings. For each AB complex, the input sequence for protein A was taken from the free structure in PPI-Hotspot+PDB^BM(1.1), whereas the full-length sequence of protein B was retrieved from UniProtKB,⁵⁸ as the binding region in protein B was unknown. Using the AMBER-relaxed model with the highest pTM score, interface residues were defined as residues of protein A with ≥ 1 atom within 5 Å of any protein B atom.
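The interface definition above (any protein A atom within 5 Å of any protein B atom) can be sketched as a brute-force distance check. The sketch assumes that atom coordinates have already been parsed from the relaxed model into plain tuples. The residue labels such as "A:10" and the function name `interface_residues` are hypothetical. Treating a distance of exactly 5 Å as inside the cutoff is also an assumption.

```python
import math

Coord = tuple[float, float, float]


def interface_residues(chain_a: dict[str, list[Coord]],
                       chain_b_atoms: list[Coord],
                       cutoff: float = 5.0) -> list[str]:
    """Residues of protein A with >= 1 atom within `cutoff` Å of any protein B atom.

    Brute force over all atom pairs, O(N_A * N_B); a spatial index would be
    preferable for large complexes.
    """
    selected = []
    for res_id, atoms in chain_a.items():
        if any(math.dist(a, b) <= cutoff for a in atoms for b in chain_b_atoms):
            selected.append(res_id)
    return selected
```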

Experimental Verification of Predicted eEF2 PPI-hot spots. Plasmid construction.

The predicted PPI-hot spots and PPI-nonhot spots were mutated using the QuikChange Site-Directed Mutagenesis Kit (Stratagene). The pcDNA3.1-flag-meEF2 plasmid was used as the PCR template, and the sense and antisense primers used for mutagenesis are listed in Supplementary Figure S2. All constructs were sequenced to confirm the mutations. The shRNA clones #1: TRCN0000047908 (GCGATCATGAATTTCAAGAAA) and #2: TRCN0000047910 (GCAGTACCTCAACGAGATCAA) against human eEF2 mRNA were obtained from the RNAi Core Facility (Academia Sinica).

Testing eEF2-CPEB2 interactions using co-immunoprecipitation (co-IP) and reciprocal co-IP.

HEK-293T cells obtained from the American Type Culture Collection (ATCC, # CRL-3216) were cultured in DMEM with 10% fetal bovine serum (FBS). For reciprocal co-IP, an 8 μg DNA mixture containing 3 μg myc-CPEB2 and 5 μg flag-eEF2 (or a negative control, GFP) plasmids was transfected into a 10-cm dish of 293T cells using Lipofectamine 2000. More flag-eEF2 than myc-CPEB2 plasmid DNA was transfected because myc-CPEB2 is expressed more abundantly than flag-eEF2. After overnight transfection, cells were lysed in 500 μl IP buffer (20 mM HEPES, pH 7.4, 100 mM NaCl, 1 mM MgCl₂, 0.1% Triton X-100, 10% glycerol, 0.5 mM DTT, 1× protease inhibitor cocktail, and 100 μg/ml RNase A) and centrifuged at 10,000 × g for 3 min at 4°C. The supernatant was divided equally and incubated with Protein G beads bound with myc or flag antibody for 3 hours at 4°C to pull down myc-CPEB2 and flag-eEF2, respectively. The beads were washed five times with 300 μl IP buffer. If myc-CPEB2 and flag-eEF2 interact, myc-CPEB2 should co-precipitate with flag-eEF2 on flag antibody-beads, and flag-eEF2 should co-precipitate with myc-CPEB2 on myc antibody-beads. GFP was used as a negative control to confirm that the signals on the beads arose from binding between flag-eEF2 and myc-CPEB2. The precipitated proteins were separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) for western blot analysis. Similarly, for the initial co-IP screening, a 4 μg DNA mixture containing 1.5 μg myc-CPEB2 and 2.5 μg flag-eEF2 (or a negative control, EGFP) plasmids was transfected into a 6-cm dish of 293T cells; the cells were harvested in 200 μl IP buffer and immunoprecipitated using flag antibody-bound beads.

Functional impact of eEF2 mutants on HIF-1α and global protein synthesis.

HeLa cells obtained from ATCC (# CCL-2) were cultured in DMEM with 10% FBS. Each 6-cm plate of HeLa cells was transfected with 2 μg human eEF2 knockdown plasmid and 2 μg flag-meEF2 wild-type or mutant plasmid. After overnight transfection, cells were selected with 0.5 μg/ml puromycin and 600 μg/ml G418 for three days to knock down endogenous eEF2 and to maintain flag-meEF2 expression, respectively. The selected cells were then either incubated with 10 μM MG132 for 4 hours before harvesting for western blotting of HIF-1α, or switched to 2 ml Met/Cys-free DMEM containing 1% FBS and 60 μCi ³⁵S-Met/Cys (PerkinElmer, cat# NEG772002MC) for 2 hours before separation by SDS-PAGE.

Acknowledgements

This research was supported by Academia Sinica (AS-IA-107-L03) and the Ministry of Science and Technology (MOST-98-2113-M-001-011), Taiwan.

Competing interests

The authors declare no competing interests.

Supplementary Information

A web server to perform PPI-hotspot predictions is available at https://ppihotspotid.limlab.dnsalias.org/. The PPI-Hotspot^ID program is available at https://github.com/wrigjz/ppihotspotid/.

Supplementary Table S1. File listing the UniProt codes of the PPI-hot spot-containing protein and its binding partner, the PDB code(s) and chain of the free protein structure, the UniProt and PDB numbering of the PPI-hot spot, the wild-type → mutant residue, the corresponding protein binding free energy change and source given by the PubMed reference number, and the PPI-Hotspot^ID assignments where P indicates PPI-hot spot and N indicates nonhot spot.

Supplementary Table S2. File listing the UniProt codes of the PPI-hot spot-containing protein A and binding partner protein B, the PDB code-chain and length of the free and bound protein A structures, the PDB code-chain of the bound protein B structure, the sequence identity between free and bound protein A structures, and the PPI-hot spots of protein A.


Uncropped immunoblot images. The uncropped images for Figures 2b, 2c, and 2e are shown with the molecular weight markers indicated.

Uncropped immunoblot images of the entire membranes for Figure 2b.

Uncropped images of the entire membrane for Figure 2c and phosphoimager file for 2d.

Uncropped immunoblot images of the cut membranes for Figure 2e.

Primer sequences. The sequences of sense (forward) and antisense (reverse) primers used for site-directed mutagenesis.

References

1. David A., Razali R., Wass M.N., Sternberg M.J.E. (2012). Protein-protein interaction sites are hot spots for disease-associated nonsynonymous SNPs. Hum. Mutat. 33:359–363.
2. Nero T.L., Morton C.J., Holien J.K., Wielens J., Parker M.W. (2014). Oncogenic protein interfaces: small molecules, big challenges. Nat. Rev. Cancer 14:248–262.
3. Blazer L.L., Neubig R.R. (2009). Small molecule protein-protein interaction inhibitors as CNS therapeutic agents: current progress and future hurdles. Neuropsychopharmacology 34:126–141.
4. Cukuroglu E., Engin H.B., Gursoy A., Keskin O. (2014). Hot spots in protein-protein interfaces: Towards drug discovery. Prog. Biophys. Mol. Biol. 116:165–173.
5. Rosell M., Fernandez-Recio J. (2018). Hot-spot analysis for drug discovery targeting protein-protein interactions. Expert Opin. Drug Discov. 13:327–338.
6. Clackson T., Wells J.A. (1995). A hot spot of binding energy in a hormone-receptor interface. Science 267:383–386.
7. Bogan A.A., Thorn K.S. (1998). Anatomy of hot spots in protein interfaces. J. Mol. Biol. 280:1–9.
8. DeLano W.L. (2002). Unraveling hot-spots in binding interfaces: progress and challenges. Curr. Opin. Struct. Biol. 12:14–20.
9. Li X., Keskin O., Ma B., Nussinov R., Liang J. (2004). Protein-protein interactions: hot spots and structurally conserved residues often locate in complemented pockets that pre-organized in the unbound states: implications for docking. J. Mol. Biol. 344:781–795.
10. Keskin O., Ma B.Y., Nussinov R. (2005). Hot regions in protein-protein interactions: the organization and contribution of structurally conserved hot spot residues. J. Mol. Biol. 345:1281–1294.
11. Moreira I.S., Fernandes P.A., Ramos M.J. (2007). Computational alanine scanning mutagenesis: An improved methodological approach. J. Comput. Chem. 28:644–654.
12. Fischer T.B., et al. (2003). The binding interface database (BID): a compilation of amino acid hot spots in protein interfaces. Bioinformatics 19:1453–1454.
13. Chen Y.C., Chen Y.-H., Wright J.D., Lim C. (2022). PPI-HotspotDB: Database of Protein-Protein Interaction Hot Spots. J. Chem. Inf. Model. 62:1052–1060.
14. Rosário-Ferreira N., Bonvin A.M., Moreira I.S. (2022). Using machine-learning-driven approaches to boost hot-spot's knowledge. Wiley Interdiscip. Rev. Comput. Mol. Sci. e1602.
15. Massova I., Kollman P.A. (1999). Computational alanine scanning to probe protein-protein interactions: A novel approach to evaluate binding free energies. J. Am. Chem. Soc. 121:8133–8143.
16. Huo S., Massova I., Kollman P.A. (2002). Computational alanine scanning of the 1:1 human growth hormone-receptor complex. J. Comput. Chem. 23:15–27.
17. Guerois R., Nielsen J.E., Serrano L. (2002). Predicting changes in the stability of proteins and protein complexes: a study of more than 1,000 mutations. J. Mol. Biol. 320:369–387.
18. Kortemme T., Baker D. (2002). A simple physical model for binding energy hot spots in protein-protein complexes. Proc. Natl. Acad. Sci. USA 99:14116–14121.
19. González-Ruiz D., Gohlke H. (2006). Targeting protein-protein interactions with small molecules: challenges and perspectives for computational binding epitope detection and ligand finding. Curr. Med. Chem. 13:2607–2625.
20. Grosdidier S., Fernández-Recio J. (2008). Identification of hot-spot residues in protein-protein interactions by computational docking. BMC Bioinformatics 9:447–459.
21. Yogurtcu O.N., Erdemli S.B., Nussinov R., Turkay M., Keskin O. (2008). Restricted mobility of conserved residues in protein-protein interfaces in molecular simulations. Biophys. J. 94:3475–3485.
22. Barlow K.A., et al. (2018). Flex ddG: Rosetta ensemble-based estimation of changes in protein-protein binding affinity upon mutation. J. Phys. Chem. B 122:5389–5399.
23. Ibarra A.A., et al. (2019). Predicting and experimentally validating hot-spot residues at protein-protein interfaces. ACS Chem. Biol. 14:2252–2263.
24. Darnell S.J., Page D., Mitchell J.C. (2007). An automated decision-tree approach to predicting protein interaction hot spots. Proteins 68:813–823.
25. Cho K.-i., Kim D., Lee D. (2009). A feature-based approach to modeling protein-protein interaction hot spots. Nucleic Acids Res. 37:2672–2687.
26. Assi S.A., Tanaka T., Rabbitts T.H., Fernandez-Fuentes N. (2010). PCRPi: presaging critical residues in protein interfaces, a new computational tool to chart hot spots in protein interfaces. Nucleic Acids Res. 38:e86.
27. Xia J.F., Zhao X.M., Song J.N., Huang D.S. (2010). APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility. BMC Bioinformatics 11:174–187.
28. Lise S., Buchan D., Pontil M., Jones D.T. (2011). Predictions of hot spot residues at protein-protein interfaces using support vector machines. PLoS One 6:e16774.
29. Wang L., Liu Z.-P., Zhang X.-S., Chen L. (2012). Prediction of hot spots in protein interfaces using a random forest model with hybrid features. Protein Eng. Des. Sel. 25:119–126.
30. Ye L., et al. (2014). Prediction of hot spots residues in protein-protein interface using network feature and microenvironment feature. Chemom. Intell. Lab. Syst. 131:16–21.
31. Munteanu C.R., et al. (2015). Solvent accessible surface area-based hot-spot detection methods for protein-protein and protein-nucleic acid interfaces. J. Chem. Inf. Model. 55:1077–1086.
32. Melo R., et al. (2016). A machine learning approach for hot-spot detection at protein-protein interfaces. Int. J. Mol. Sci. 17:1215.
33. Moreira I.S., et al. (2017). SpotOn: high accuracy identification of protein-protein interface hot-spots. Sci. Rep. 7:8007.
34. Qiao Y., Xiong Y., Gao H., Zhu X., Chen P. (2018). Protein-protein interface hot spots prediction based on a hybrid feature selection strategy. BMC Bioinformatics 19:14–29.
35. Sitani D., Giorgetti A., Alfonso-Prieto M., Carloni P. (2021). Robust principal component analysis-based prediction of protein-protein interaction hot spots. Proteins 89:639–647.
36. Ovek D., et al. (2022). Artificial intelligence based methods for hot spot prediction. Curr. Opin. Struct. Biol. 72:209–218.
37. Tuncbag N., Keskin O., Gursoy A. (2010). HotPoint: hot spot prediction server for protein interfaces. Nucleic Acids Res. 38:W402–W406.
38. Zhu X., Mitchell J.C. (2011). KFC2: a knowledge-based hot spot prediction method based on interface solvation, atomic density, and plasticity features. Proteins 79:2671–2683.
39. Deng L., et al. (2014). PredHS: a web server for predicting protein-protein interaction hot spots by using structural neighborhood properties. Nucleic Acids Res. 42:W290–W295.
40. Wang H., Liu C., Deng L. (2018). Enhanced prediction of hot spots at protein-protein interfaces using extreme gradient boosting. Sci. Rep. 8:14285.
41. Higa R.H., Tozzi C.L. (2009). Prediction of binding hot spot residues by using structural and evolutionary parameters. Genet. Mol. Biol. 32:626–633.
42. Zerbe B.S., Hall D.R., Vajda S., Whitty A., Kozakov D. (2012). Relationship between hot spot residues and ligand binding hot spots in protein-protein interfaces. J. Chem. Inf. Model. 52:2236–2244.
43. Ozbek P., Soner S., Haliloglu T. (2013). Hot spots in a network of functional sites. PLoS One 8:e74320.
44. Agrawal N.J., Helk B., Trout B.L. (2014). A computational tool to predict the evolutionarily conserved protein-protein interaction hot-spot residues from the structure of the unbound protein. FEBS Lett. 588:326–333.
45. Kozakov D., et al. (2015). The FTMap family of web servers for determining and characterizing ligand-binding hot spots of proteins. Nat. Protoc. 10:733–755.
46. Ofran Y., Rost B. (2007). Protein-protein interaction hotspots carved into sequences. PLoS Comput. Biol. 3:1169–1176.
47. Chen P., et al. (2013). Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences. Proteins 81:1351–1362.
48. Nguyen Q.-T., Fablet R., Pastor D. (2013). Protein interaction hotspot identification using sequence-based frequency-derived features. IEEE Trans. Biomed. Eng. 60:2993–3002.
49. Huang Q., Zhang X. (2016). An improved ensemble learning method with SMOTE for protein interaction hot spots prediction. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, pp. 1584–1589.
50. Hu S.-S., Chen P., Wang B., Li J. (2017). Protein binding hot spots prediction from sequence only by a new ensemble learning method. Amino Acids 49:1773–1785.
51. Jiang J., Wang N., Chen P., Zheng C., Wang B. (2017). Prediction of protein hotspots from whole protein sequences by a random projection ensemble system. Int. J. Mol. Sci. 18:E1543.
52. Liu Q., Chen P., Wang B., Zhang J., Li J. (2018). Hot spot prediction in protein-protein interactions by an ensemble system. BMC Syst. Biol. 12:89–99.
53. Preto A., Moreira I.S. (2020). SPOTONE: Hot Spots on protein complexes with extremely randomized trees via sequence-only features. Int. J. Mol. Sci. 21:7281.
54. Yao S., Zheng C., Wang B., Chen P. (2022). A two-step ensemble learning for predicting protein hot spot residues from whole protein sequence. Amino Acids 54:765–776.
55. Thorn K.S., Bogan A.A. (2001). ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics 17:284–285.
56. Jankauskaite J., Jiménez-García B., Dapkunas J., Fernández-Recio J., Moal I.H. (2019). SKEMPI 2.0: an updated benchmark of changes in protein-protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics 35:462–469.
57. Wang M., Zhu D., Zhu J., Nussinov R., Ma B. (2018). Local and global anatomy of antibody-protein antigen recognition. J. Mol. Recognit. 31:e2693.
58. UniProt Consortium (2018). UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46:2699.
59. Kozakov D., et al. (2011). Structural conservation of druggable hot spots in protein-protein interfaces. Proc. Natl. Acad. Sci. USA 108:13528–13533.
60. Evans R., et al. (2022). Protein complex prediction with AlphaFold-Multimer. bioRxiv, https://doi.org/10.1101/2021.10.04.463034.
61. Laskowski R.A., Jablonska J., Pravda L., Vareková R.S., Thornton J.M. (2018). PDBsum: Structural summaries of PDB entries. Protein Sci. 27:129–134.
62. Hage T., Sebald W., Reinemer P. (1999). Crystal structure of the interleukin-4/receptor α chain complex reveals a mosaic binding interface. Cell 97:271–281.
63. Powers R., et al. (1992). Three-dimensional solution structure of human interleukin-4 by multidimensional heteronuclear magnetic resonance spectroscopy. Science 256:1673–1677.
64. Mirdita M., et al. (2022). ColabFold: making protein folding accessible to all. Nat. Methods 19:679–682.
65. Chen P.J., Huang Y.S. (2012). CPEB2-eEF2 interaction impedes HIF-1α RNA translation. EMBO J. 31:959–971.
66. Anger A.M., et al. (2013). Structures of the human and Drosophila 80S ribosome. Nature 497:80–85.
67. Park H.H., Wu H. (2006). Crystal structure of RAIDD death domain implicates potential mechanism of PIDDosome assembly. J. Mol. Biol. 357:358–364.
68. Park H.H., et al. (2007). Death domain assembly mechanism revealed by crystal structure of the oligomeric PIDDosome core complex. Cell 128:533–546.
69. Puffenberger E.G., et al. (2012). Genetic mapping and exome sequencing identify variants associated with five novel diseases. PLoS One 7:e28936.
70. Kabsch W., Sander C. (1983). Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577–2637.
71. Mitternacht S. (2016). FreeSASA: An open source C library for solvent accessible surface area calculations. F1000Research 5:189.
72. Case D.A., et al. (2020). AMBER 2020. San Francisco: University of California.
73. Word J.M., Lovell S.C., Richardson J.S., Richardson D.C. (1999). Asparagine and glutamine: Using hydrogen atom contacts in the choice of side-chain amide orientation. J. Mol. Biol. 285:1735–1747.
74. Tian C., et al. (2020). ff19SB: Amino-Acid-Specific Protein Backbone Parameters Trained against Quantum Mechanics Energy Surfaces in Solution. J. Chem. Theory Comput. 16:528–552.
75. Chen Y.C., Wu C.Y., Lim C. (2007). Predicting DNA-binding amino acid residues from electrostatic stabilization upon mutation to Asp/Glu and evolutionary conservation. Proteins 67:671–680.
76. Glaser F., et al. (2003). ConSurf: Identification of Functional Regions in Proteins by Surface-Mapping of Phylogenetic Information. Bioinformatics 19:163–164.
77. Landau M., et al. (2005). ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures. Nucleic Acids Res. 33:299–302.
78. Wu C.H., et al. (2006). The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 34:D187–D191.
79. Johnson L.S., Eddy S.R., Portugaly E. (2010). Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 11:431.
80. Li W., Godzik A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659.
81. Nakamura T., Yamada K.D., Tomii K., Katoh K. (2018). Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 34:2490–2492.
82. Pupko T., Bell R., Mayrose I., Glaser F., Ben-Tal N. (2002). Rate4Site: An algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18(Suppl 1):S71–S77.
83. Erickson N., et al. (2020). AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. arXiv, https://doi.org/10.48550/arXiv.2003.06505.
84. Klima M., et al. (2016). Structural insights and in vitro reconstitution of membrane targeting and activation of human PI4KB by the ACBD3 protein. Sci. Rep. 6:23641.

Article and author information

Author information

Yao Chi Chen
Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- For correspondence: backy2010.chen@gmail.com
Karen Sargsyan
Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- For correspondence: karen.sarkisyan@gmail.com
Jon D Wright
Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, Current address: Immunwork, Inc., C520 Building C, No.99, Lane 130, Section 1, Academia Road, Nankang District, Taipei, 11571, Taiwan
Yu-Hsien Chen
Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
Yi-Shuian Huang
Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
Carmay Lim
Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, Current address: Immunwork, Inc., C520 Building C, No.99, Lane 130, Section 1, Academia Road, Nankang District, Taipei, 11571, Taiwan
- For correspondence: carmay@gate.sinica.edu.tw

Version history

Sent for peer review: February 15, 2024
Preprint posted: February 19, 2024
Reviewed Preprint version 1: May 28, 2024
Reviewed Preprint version 2: August 7, 2024
Version of Record published: September 16, 2024

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.96643. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

