We trained a self-supervised three-dimensional convolutional neural network (CNN) to learn internal representations of protein structures by predicting wild-type amino acid labels from protein …
(A) Learning curve for training of the RaSP downstream model, with Pearson correlation coefficients (ρ) and mean absolute error (MAE) of RaSP predictions. During training we transformed the target …
The model obtained at epoch 15 achieves a classification accuracy of 63% on the validation set.
Substitutions from glycine and cysteine, as well as substitutions to proline, generally have higher errors.
We speculate that the RaSP prediction task is harder for buried residues because Rosetta values generally have higher variance in those regions. Pearson correlation coefficients and …
Predictions of changes in stability obtained using (A) RaSP and (B) Rosetta are compared to experimental data on five test proteins: myoglobin (1BVC), lysozyme (1LZ1), chymotrypsin inhibitor (2CI2), …
Stability predictions obtained using (A–E) RaSP and (F–J) Rosetta are compared to experimental data for the five test proteins: myoglobin (1BVC), lysozyme (1LZ1), chymotrypsin inhibitor (2CI2), …
The experimental data has been filtered to include only well-defined experimental values from single substitution mutations in natural protein domains (Tsuboyama et al., 2022). This filtered data …
We compare stability predictions with VAMP-seq scores for three test proteins: (A) TPMT (PDB: 2H11) (Matreyek et al., 2018), (B) PTEN (PDB: 1D5R) (Matreyek et al., 2018) and (C) NUDT15 (PDB: 5BON) (Su…
Pearson correlation coefficients (ρ) between experimental stability measurements and predictions using protein homology models with decreasing sequence identity to the target sequence. Pearson …
The grey distribution shown in the background of all plots represents the distribution of values calculated using RaSP for all single amino acid changes in the 1,366 proteins that we analysed (15 …
The grey distribution shown in the background of all plots represents the distribution of ΔΔG values for all single amino acid changes in the 1,366 proteins that we analysed. Each plot is also labelled with …
When comparing RaSP to Rosetta (column: "Pearson |ρ| RaSP vs. Ros."), we only compute the Pearson correlation coefficients for variants with a Rosetta value in the range [–1; 7] kcal/mol. …
| Data set | Protein name | PDB, chain | Pearson \|ρ\| RaSP vs. Ros. | Pearson \|ρ\| RaSP vs. Exp. | Pearson \|ρ\| Ros. vs. Exp. |
|---|---|---|---|---|---|
| RaSP test set | MEN1 | 3U84, A | 0.85 | - | - |
| | F8 | 2R7E, A | 0.71 | - | - |
| | ELANE | 4WVP, A | 0.81 | - | - |
| | ADSL | 2J91, A | 0.84 | - | - |
| | GCK | 4DCH, A | 0.84 | - | - |
| | RPE65 | 4RSC, A | 0.84 | - | - |
| | TTR | 1F41, A | 0.88 | - | - |
| | ELOB | 4AJY, B | 0.87 | - | - |
| | SOD1 | 2CJS, A | 0.84 | - | - |
| | VANX | 1R44, A | 0.83 | - | - |
| ProTherm test set | Myoglobin | 1BVC, A | 0.91 | 0.71 | 0.76 |
| | Lysozyme | 1LZ1, A | 0.80 | 0.57 | 0.65 |
| | Chymotrypsin inhib. | 2CI2, I | 0.79 | 0.65 | 0.68 |
| | RNase H | 2RN2, A | 0.78 | 0.79 | 0.71 |
| | Protein G | 1PGA, A | 0.90 | 0.72 | 0.72 |
| MAVE test set | NUDT15 | 5BON, A | 0.83 | 0.50 | 0.54 |
| | TPMT | 2H11, A | 0.86 | 0.48 | 0.49 |
| | PTEN | 1D5R, A | 0.87 | 0.52 | 0.53 |
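A minimal sketch of the filtering rule stated in the caption, where the RaSP vs. Rosetta Pearson coefficient is computed only over variants whose Rosetta value lies in [–1; 7] kcal/mol. It assumes NumPy and two aligned arrays with one entry per variant; the function name `filtered_pearson` is hypothetical, and treating both interval ends as inclusive is an assumption suggested by the bracket notation.

```python
import numpy as np


def filtered_pearson(rasp, rosetta, lo=-1.0, hi=7.0):
    """Pearson correlation of RaSP vs. Rosetta values over variants whose
    Rosetta value lies in the closed interval [lo, hi] (kcal/mol).

    Returns (rho, number_of_variants_kept).
    """
    rasp = np.asarray(rasp, dtype=float)
    rosetta = np.asarray(rosetta, dtype=float)
    # The filter is applied to the Rosetta values only (assumed inclusive bounds)
    keep = (rosetta >= lo) & (rosetta <= hi)
    if keep.sum() < 2:
        raise ValueError("need at least two variants inside the interval")
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's rho
    rho = np.corrcoef(rasp[keep], rosetta[keep])[0, 1]
    return float(rho), int(keep.sum())


if __name__ == "__main__":
    # Toy values for illustration only (not data from the table)
    rasp = [0.5, 1.0, 2.0, 3.0, 10.0]
    rosetta = [0.4, 1.1, 2.1, 2.9, 15.0]  # last variant falls outside [-1, 7]
    rho, n = filtered_pearson(rasp, rosetta)
    print(f"rho = {rho:.3f} over {n} variants")
```

The table reports |ρ|; taking `abs(rho)` on the returned value would match that convention.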
Results for methods other than RaSP have been copied from Pancotti et al., 2022. We speculate that the higher RMSE and MAE values for Rosetta relative to RaSP are due to missing scaling of Rosetta …
| Method | S669 direct: Pearson | S669 direct: RMSE [kcal/mol] | S669 direct: MAE [kcal/mol] |
|---|---|---|---|
| Structure-based | | | |
| ACDC-NN | 0.46 | 1.49 | 1.05 |
| DDGun3D | 0.43 | 1.60 | 1.11 |
| PremPS | 0.41 | 1.50 | 1.08 |
| RaSP | 0.39 | 1.63 | 1.14 |
| ThermoNet | 0.39 | 1.62 | 1.17 |
| Rosetta | 0.39 | 2.70 | 2.08 |
| Dynamut | 0.41 | 1.60 | 1.19 |
| INPS3D | 0.43 | 1.50 | 1.07 |
| SDM | 0.41 | 1.67 | 1.26 |
| PoPMuSiC | 0.41 | 1.51 | 1.09 |
| MAESTRO | 0.50 | 1.44 | 1.06 |
| FoldX | 0.22 | 2.30 | 1.56 |
| DUET | 0.41 | 1.52 | 1.10 |
| I-Mutant3.0 | 0.36 | 1.52 | 1.12 |
| mCSM | 0.36 | 1.54 | 1.13 |
| Dynamut2 | 0.34 | 1.58 | 1.15 |
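The caption suggests that Rosetta's higher RMSE and MAE, despite a Pearson coefficient equal to RaSP's, may stem from missing scaling. The sketch below illustrates why this is plausible: Pearson correlation is unchanged by a positive linear rescaling of the predictions, whereas RMSE and MAE are not. It uses NumPy on made-up toy values (not data from the table), and the least-squares rescaling via `np.polyfit` is an assumed choice, not necessarily the scaling the authors had in mind.

```python
import numpy as np


def pearson_rmse_mae(pred, exp):
    """Return (Pearson rho, RMSE, MAE) of predictions against experiment."""
    pred = np.asarray(pred, dtype=float)
    exp = np.asarray(exp, dtype=float)
    rho = float(np.corrcoef(pred, exp)[0, 1])
    rmse = float(np.sqrt(np.mean((pred - exp) ** 2)))
    mae = float(np.mean(np.abs(pred - exp)))
    return rho, rmse, mae


def linear_rescale(pred, exp):
    """Least-squares fit exp ~ a * pred + b and return the rescaled predictions."""
    pred = np.asarray(pred, dtype=float)
    a, b = np.polyfit(pred, np.asarray(exp, dtype=float), deg=1)
    return a * pred + b


if __name__ == "__main__":
    # Toy values only: predictions on twice the experimental scale, plus an offset
    exp = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    pred = 2.0 * exp + 0.5
    print("raw:     ", pearson_rmse_mae(pred, exp))
    print("rescaled:", pearson_rmse_mae(linear_rescale(pred, exp), exp))
```

Fitting the rescaling on the same data it is evaluated on, as in this toy case, gives an optimistic error estimate; a fair comparison would fit the scale factor on separate data.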
Pearson correlation coefficients (ρ) between RaSP predictions using either two different crystal structures or a crystal structure and an AlphaFold 2 structure for six test proteins: PRMT5 (X1: …
| Protein | All: X1-X2 | All: X1-AF2 | All: X2-AF2 | High AF2 pLDDT: X1-X2 | High AF2 pLDDT: X1-AF2 | High AF2 pLDDT: X2-AF2 | Medium-low AF2 pLDDT: X1-X2 | Medium-low AF2 pLDDT: X1-AF2 | Medium-low AF2 pLDDT: X2-AF2 |
|---|---|---|---|---|---|---|---|---|---|
| PRMT5 | 0.93 | 0.89 | 0.95 | - | 0.90 | 0.95 | - | 0.66 | 0.89 |
| PKM | 0.99 | 0.95 | 0.95 | - | 0.95 | 0.95 | - | 0.88 | 0.89 |
| FTH1 | 0.99 | 0.97 | 0.97 | - | 0.97 | 0.97 | - | 0.92 | 0.95 |
| FTL | 0.97 | 0.96 | 0.97 | - | 0.96 | 0.97 | - | 0.96 | 0.94 |
| PSMA2 | 0.99 | 0.95 | 0.95 | - | 0.96 | 0.96 | - | 0.78 | 0.80 |
| GNB1 | 0.96 | 0.94 | 0.94 | - | 0.94 | 0.94 | - | 0.93 | 0.89 |
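A sketch of how correlations split by AlphaFold 2 confidence might be computed. pLDDT is a per-residue score, so here each variant inherits the pLDDT of the residue it mutates. The cutoff of 70 and the names `split_pearson` and `PLDDT_CUTOFF` are hypothetical, since the source does not state the threshold separating high from medium-low pLDDT; NumPy is assumed.

```python
import numpy as np

PLDDT_CUTOFF = 70.0  # hypothetical threshold; the source does not state the one used


def _pearson(x, y):
    # Pearson rho, or NaN if fewer than two points are available
    if len(x) < 2:
        return float("nan")
    return float(np.corrcoef(x, y)[0, 1])


def split_pearson(pred_a, pred_b, plddt, cutoff=PLDDT_CUTOFF):
    """Correlate two sets of per-variant predictions (e.g. from a crystal
    structure and an AlphaFold 2 model) overall and within pLDDT subsets.

    `plddt` holds, for each variant, the AF2 pLDDT of the residue it mutates.
    """
    a = np.asarray(pred_a, dtype=float)
    b = np.asarray(pred_b, dtype=float)
    high = np.asarray(plddt, dtype=float) >= cutoff  # boolean mask per variant
    return {
        "all": _pearson(a, b),
        "high": _pearson(a[high], b[high]),
        "medium_low": _pearson(a[~high], b[~high]),
    }
```

The X1-X2 comparisons have no pLDDT split because pLDDT is only defined for the AlphaFold 2 model, which is consistent with the dashes in those columns.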
Comparing per-residue ΔΔG computation times, RaSP is 480–1,036 times faster than Rosetta. RaSP, ACDC-NN and ThermoNet computations were performed using a single NVIDIA V100 16 GB GPU machine, while Rosetta and FoldX …
| Method | Protein | Wall-clock time: pre-processing [s] | Wall-clock time: ΔΔG [s] | Wall-clock time: ΔΔG per residue [s] |
|---|---|---|---|---|
| RaSP | ELOB | 7 | 41 | 0.4 |
| | GCK | 11 | 173 | 0.4 |
| | F8 | 20 | 270 | 0.4 |
| Rosetta | ELOB | 677 | 44,324 | 414.2 |
| | GCK | 7,996 | 118,361 | 272.7 |
| | F8 | 17,211 | 133,178 | 192.2 |
| FoldX | ELOB | 78 | 42,237 | 394.7 |
| | GCK | 728 | 309,762 | 713.7 |
| | F8 | 1,306 | 559,050 | 806.7 |
| ACDC-NN | ELOB | 81 | 158 | 1.5 |
| | GCK | 169 | 619 | 1.4 |
| | F8 | 325 | 1,080 | 1.6 |
| ThermoNet | ELOB | 80,442 | 884 | 8.3 |
| | GCK | 4,586,522 | 4,227 | 9.7 |
| | F8 | 11,627,433 | 8,732 | 12.6 |
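The speed-up factors can be recomputed from the timing table. This sketch assumes that the third time column is the ΔΔG computation time per residue, which is consistent with, for example, 44,324 s / 414.2 s ≈ 107 residues for ELOB under Rosetta. Ratios of per-residue times give about 480 (F8) to 1,036 (ELOB), while ratios of total wall-clock time (pre-processing plus ΔΔG) give roughly 519 to 938. The numbers are copied from the table; because the RaSP per-residue time is rounded to 0.4 s, the per-residue ratios inherit that rounding. The names `TIMES` and `speedups` are hypothetical.

```python
# Wall-clock times [s] copied from the table: (pre-processing, ddG total, ddG per residue)
TIMES = {
    "RaSP": {"ELOB": (7, 41, 0.4), "GCK": (11, 173, 0.4), "F8": (20, 270, 0.4)},
    "Rosetta": {
        "ELOB": (677, 44_324, 414.2),
        "GCK": (7_996, 118_361, 272.7),
        "F8": (17_211, 133_178, 192.2),
    },
}


def speedups(fast: str, slow: str) -> dict:
    """Return per-protein speed-up factors of `fast` over `slow`.

    'per_residue' compares the per-residue ddG times; 'total' compares
    pre-processing plus ddG time.
    """
    out = {}
    for protein, (pre_f, ddg_f, res_f) in TIMES[fast].items():
        pre_s, ddg_s, res_s = TIMES[slow][protein]
        out[protein] = {
            "per_residue": res_s / res_f,
            "total": (pre_s + ddg_s) / (pre_f + ddg_f),
        }
    return out


if __name__ == "__main__":
    for protein, s in speedups("RaSP", "Rosetta").items():
        print(f"{protein}: per residue {s['per_residue']:.0f}x, total {s['total']:.0f}x")
```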