Computational and Systems Biology

Protein language model embedded geometric graphs power inter-protein contact prediction

Yunda Si
Chengfei Yan

School of Physics, Huazhong University of Science and Technology, China

https://doi.org/10.7554/eLife.92184.2

Figures and data

Overview of PLMGraph-Inter. (a) The network architecture of PLMGraph-Inter. (b) The graph representation module. (c) The graph encoder module; s denotes scalar features and v denotes vector features. (d) The dimensional hybrid residual block (“IN” denotes Instance Normalization).

The performances of DeepHomo, GLINTER, DRN-1D2D_Inter, CDPred, DeepHomo2 and PLMGraph-Inter on the HomoPDB and HeteroPDB test sets using experimental structures (AlphaFold2 predicted structures)

The performance of PLMGraph-Inter when using different sequence identity and fold similarity thresholds to further remove potential redundancies in HomoPDB and HeteroPDB.

The performances of PLMGraph-Inter and other methods on the HomoPDB and HeteroPDB test sets. (a)∼(b): The head-to-head comparison of the precisions (%) of the top 50 contacts predicted by PLMGraph-Inter and other methods for each target in (a) HomoPDB and (b) HeteroPDB using experimental structures. (c)∼(d): The head-to-head comparison of the precisions (%) of the top 50 contacts predicted by PLMGraph-Inter and other methods for each target in (c) HomoPDB and (d) HeteroPDB using AlphaFold2 predicted structures.
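The "precision of the top 50 contacts" reported throughout these figures can be illustrated with a minimal sketch. It assumes the predictor outputs a score for every inter-protein residue pair and that a binary map of native contacts is available. The function name `top_k_precision` and the toy matrices are hypothetical; how native contacts are defined is not stated in these captions and is left to the caller.

```python
def top_k_precision(scores, contacts, k=50):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    scores   -- 2D list, scores[i][j] is the predicted score for residue i of
                one protein and residue j of the other (assumed layout)
    contacts -- 2D list of the same shape, truthy where the pair is a native contact
    k        -- number of top-ranked pairs to evaluate (50 in the captions)
    """
    pairs = [
        (scores[i][j], bool(contacts[i][j]))
        for i in range(len(scores))
        for j in range(len(scores[i]))
    ]
    if not pairs:
        raise ValueError("score matrix is empty")
    # Rank all residue pairs from highest to lowest predicted score.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    top = pairs[:k]  # fewer than k pairs are used if the matrix is small
    return sum(is_contact for _, is_contact in top) / len(top)


# Hypothetical toy example: 2 x 3 residue pairs, two of which are native contacts.
toy_scores = [[0.9, 0.1, 0.8],
              [0.2, 0.7, 0.3]]
toy_contacts = [[1, 0, 0],
                [0, 1, 0]]
print(top_k_precision(toy_scores, toy_contacts, k=2))  # 0.5
```

The function returns a fraction; multiplying by 100 gives the percentage scale used on the figure axes.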

The performances of PLMGraph-Inter when using experimental and AlphaFold2 predicted structures as the input. (a)∼(b): The performance comparison of PLMGraph-Inter when using experimental structures and AlphaFold2 predicted structures as the input on (a) HomoPDB and (b) HeteroPDB. (c) The performance gaps (measured as the difference of the mean precision of the top 50 predicted contacts) of PLMGraph-Inter with the application of AlphaFold2 predicted structures and experimental structures as the input when the PPIs are within different intervals of DTM-score. The upper panel shows the percentage of the total number of PPIs in each interval. (d) The comparison of the precision of top 50 contacts predicted by PLMGraph-Inter for each target when using experimental structures and AlphaFold2 predicted structures as the input.

The ablation study of PLMGraph-Inter on the HomoPDB and HeteroPDB test sets. (a) The mean precisions of the top 50 contacts predicted by different ablation models on the HomoPDB and HeteroPDB test sets. (b) The head-to-head comparisons of mean precisions of the top 50 contacts predicted by model d and DRN-1D2D_Inter (single model) for each target in HomoPDB and HeteroPDB. (c) The head-to-head comparison of mean precisions of the top 50 contacts predicted by the model using our geometric graphs and the GVP geometric graphs.

The performances of PLMGraph-Inter and other methods on the DHTest and DB5.5 test sets. (a)∼(b): The mean precisions of top 50 contacts predicted by PLMGraph-Inter, GLINTER, DeepHomo2, CDPred and DeepHomo on (a) DHTest and (b) DB5.5 when using experimental structures and AlphaFold2 predicted structures as the input, where the green lines indicate the performance of DRN-1D2D_Inter. (c) The performance gaps (measured as the difference of the mean precision of the top 50 predicted contacts) of PLMGraph-Inter with the application of AlphaFold2 predicted structures and experimental structures as the input when the PPIs are within different intervals of DTM-score. The upper panel shows the percentage of the total number of PPIs in each interval. (d)∼(e): The distributions of precisions of the top 50 contacts predicted by PLMGraph-Inter and other methods for PPIs in (d) DHTest and (e) DB5.5. (f) The mean precisions of the top 50 contacts predicted by PLMGraph-Inter on PPIs within different intervals of contact densities in DB5.5. The upper panel shows the percentage of the total number of PPIs in each interval.

The comparison of PLMGraph-Inter with AlphaFold-Multimer.
(a) The head-to-head comparison between the qualities of the protein complex structures generated by AlphaFold-Multimer (evaluated with DockQ) and the precision of the top 50 inter-protein contacts extracted from the generated protein complex structures. The red horizontal lines represent the threshold (DockQ=0.23) to determine whether the complex structure prediction is successful or not. (b) The head-to-head comparisons of precisions of the top 50 inter-protein contacts predicted by PLMGraph-Inter and AlphaFold-Multimer for each target in the homomeric PPI and heteromeric PPI datasets. (c)∼(d): The mean precisions of top 50 inter-protein contacts predicted by PLMGraph-Inter and AlphaFold-Multimer on the PPI subsets from (c) “DHTest+HomoPDB” and (d) “DB5.5+HeteroPDB” in which the precision of the top 50 inter-protein contacts predicted by AlphaFold-Multimer is lower than 50% or the DockQ of the complex structure predicted by AlphaFold-Multimer is lower than 0.23 or the “iptm + ptm” of the complex structure predicted by AlphaFold-Multimer is lower than 0.5.

Protein-protein docking performances on the Homodimer and Heterodimer test sets. (a)∼(b) The protein-protein docking performance comparison between HADDOCK with and without (ab-initio) using PLMGraph-Inter predicted contacts as restraints on (a) Homodimer and (b) Heterodimer. The left side of each column shows the performance when the top 1 predicted model for each PPI is considered, and the right side shows the performance when the top 10 predicted models for each PPI are considered. (c) The head-to-head comparison of qualities of the top 1 model predicted by HADDOCK with and without using PLMGraph-Inter predicted contacts as restraints for each target PPI. The red lines represent the threshold (DockQ=0.23) to determine whether the complex structure prediction is successful or not. (d) The success rates (the top 1 model) for protein complex structure prediction when only including targets for which precisions of the predicted contacts are higher than certain thresholds.
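The top 1 and top 10 success rates in this figure can be sketched as follows, using the DockQ=0.23 threshold named in the caption. This is an illustrative sketch only: the function name `success_rate`, the input layout (DockQ scores of each target's models in ranked order), the toy values, and the choice to count a DockQ exactly equal to the threshold as a success are all assumptions, not details taken from the article.

```python
DOCKQ_SUCCESS_THRESHOLD = 0.23  # threshold given in the caption


def success_rate(dockq_by_target, top_n=1, threshold=DOCKQ_SUCCESS_THRESHOLD):
    """Fraction of targets with at least one successful model among the top_n.

    dockq_by_target -- dict mapping a target name to the DockQ scores of its
                       predicted models, ordered from best-ranked to worst-ranked
    top_n           -- how many top-ranked models to consider per target
    threshold       -- DockQ value at or above which a model counts as a success
                       (treating equality as success is an assumption)
    """
    if not dockq_by_target:
        raise ValueError("no targets supplied")
    successes = sum(
        1
        for ranked_scores in dockq_by_target.values()
        if any(score >= threshold for score in ranked_scores[:top_n])
    )
    return successes / len(dockq_by_target)


# Hypothetical DockQ values for three made-up targets (not data from the article).
toy_dockq = {
    "target_A": [0.10, 0.30],
    "target_B": [0.50, 0.00],
    "target_C": [0.05, 0.10],
}
print(success_rate(toy_dockq, top_n=1))  # only target_B succeeds
print(success_rate(toy_dockq, top_n=2))  # target_A and target_B succeed
```

Restricting the dictionary to targets whose predicted-contact precision exceeds a chosen cutoff before calling the function would give the kind of filtered success rate described in panel (d).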

The head-to-head comparison of the precisions (%) of the top 50 contacts predicted by PLMGraph-Inter and other methods for each target in HomoPDB and HeteroPDB using experimental structures.

The head-to-head comparison of the precisions (%) of the top 50 contacts predicted by PLMGraph-Inter and other methods for each target in HomoPDB and HeteroPDB using AlphaFold2 predicted structures.

The mean precision versus contact density for the top 50 contacts predicted by PLMGraph-Inter, GLINTER, DeepHomo, DeepHomo2, CDPred, DRN-1D2D_Inter on the HomoPDB test set (a, c) and HeteroPDB test set (b, d) using experimental structures (first row) and AlphaFold2 predicted structures (second row).

The mean precision versus log(N_eff^norm) for the top 50 contacts predicted by PLMGraph-Inter, GLINTER, DeepHomo, DeepHomo2, CDPred, DRN-1D2D_Inter on the HomoPDB test set (a, c) and HeteroPDB test set (b, d) using experimental structures (first row) and AlphaFold2 predicted structures (second row).

The comparison of PLMGraph-Inter with AlphaFold-Multimer.
(b) The head-to-head comparisons of precisions of the top 50 inter-protein contacts predicted by PLMGraph-Inter (using AlphaFold2 predicted structures) and AlphaFold-Multimer for each target in the homomeric PPI and heteromeric PPI datasets. (c)∼(d): The mean precisions of top 50 inter-protein contacts predicted by PLMGraph-Inter (using AlphaFold2 predicted structures as input) and AlphaFold-Multimer on the PPI subsets from (c) “DHTest+HomoPDB” and (d) “DB5.5+HeteroPDB” in which the precision of the top 50 inter-protein contacts predicted by AlphaFold-Multimer is lower than 50% or the DockQ of the complex structure predicted by AlphaFold-Multimer is lower than 0.23 or the “iptm + ptm” of the complex structure predicted by AlphaFold-Multimer is lower than 0.5.

3D structure of the homodimer (PDB: 3DFU).

The comparison of HADDOCK (with PLMGraph-Inter contact constraints) with AlphaFold-Multimer in protein complex structure prediction. (a) The binding configurations predicted by HADDOCK (colored orange, DockQ 0.375), predicted by AlphaFold-Multimer (colored pink, DockQ 0) and the native binding configuration (colored blue) for the chain B of the protein complex structure in PDB 5HPS. The chain A is shown in the protein surface mode (colored green). (b)∼(c): The head-to-head comparison of qualities of the (b) top 1 or (c) top 10 models predicted by HADDOCK using PLMGraph-Inter predicted contacts as restraints and by AlphaFold-Multimer for each target PPI.

The graph representation of protein structures. (a) Dihedral angles of the protein backbone. (b) The local coordinate system of each amino acid. (c) The scalar (distances) and vector (directions) features of the edge i->j.

The performances of DeepHomo, GLINTER, DRN-1D2D_Inter, DeepHomo2, CDPred and PLMGraph-Inter on HomoPDB and HeteroPDB after the removal of targets for which GLINTER failed to make a prediction, using experimental structures (AlphaFold2 predicted structures)

The performances of different ablation study models on the HomoPDB and HeteroPDB test sets

The performances of DeepHomo, GLINTER, DRN-1D2D_Inter, DeepHomo2, CDPred and PLMGraph-Inter on the DHTest and DB5.5 test sets using experimental structures (AlphaFold2 predicted structures)

The performances of DeepHomo, GLINTER, DRN-1D2D_Inter, DeepHomo2, CDPred and PLMGraph-Inter on DHTest and DB5.5 after the removal of targets for which GLINTER failed to make a prediction, using experimental structures (AlphaFold2 predicted structures)

The performances of AlphaFold-Multimer and PLMGraph-Inter on the homodimer and heterodimer test sets
