Building a challenging training/testing set for PPIscreenML.
(A) A collection of 1,481 non-redundant active complexes with experimentally derived structures was obtained from DockGround, and five AF2 models were built for each of these. To build decoys, the same collection was screened to identify the closest structural matches (by TM-score) for each component protein. The structural homologs of each template protein were aligned onto the original complex, yielding a new (decoy) complex between two presumably non-interacting proteins. Five AF2 models were built for each of these 1,481 decoy complexes. (B) An example of a decoy complex (blue/cyan) superposed with the active complex from which it was generated (brown/wheat). (C) A suite of AlphaFold confidence metrics, structural properties, and Rosetta energy terms was used as input features for training PPIscreenML, a machine learning classifier built to distinguish active from compelling inactive protein pairs.
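
The decoy-pairing step in panel (A) can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that pairwise TM-scores have already been computed and stored in a dictionary, and that a component's own partner is excluded when searching for its homolog. The function names, protein IDs, and scores are hypothetical, and the superposition onto the original complex is not shown.

```python
from typing import Dict, Set, Tuple

# Hypothetical precomputed TM-scores: (query, candidate) -> score.
TmTable = Dict[Tuple[str, str], float]


def closest_homolog(query: str, tm_scores: TmTable, exclude: Set[str]) -> str:
    """Return the candidate with the highest TM-score to `query`,
    skipping any protein ID listed in `exclude`."""
    candidates = {
        other: score
        for (q, other), score in tm_scores.items()
        if q == query and other not in exclude
    }
    if not candidates:
        raise ValueError(f"no eligible structural match for {query}")
    return max(candidates, key=candidates.get)


def build_decoy_pair(active_pair: Tuple[str, str], tm_scores: TmTable) -> Tuple[str, str]:
    """Swap each component of an active complex for its closest structural
    match, giving a pair of presumably non-interacting proteins."""
    a, b = active_pair
    exclude = {a, b}  # assumption: never reuse the original components
    return (
        closest_homolog(a, tm_scores, exclude),
        closest_homolog(b, tm_scores, exclude),
    )


# Toy data (made up for illustration only).
TOY_SCORES: TmTable = {
    ("A1", "C1"): 0.82,
    ("A1", "D2"): 0.61,
    ("A1", "B1"): 0.30,
    ("B1", "C2"): 0.55,
    ("B1", "D1"): 0.74,
    ("B1", "A1"): 0.30,
}

decoy = build_decoy_pair(("A1", "B1"), TOY_SCORES)
print(decoy)
```

In this toy example the active pair A1/B1 yields the decoy pair C1/D1, which would then be superposed onto the original complex and passed to AF2 like any active pair.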