Rapid protein stability prediction using deep learning representations

  1. Lasse M Blaabjerg
  2. Maher M Kassem
  3. Lydia L Good
  4. Nicolas Jonsson
  5. Matteo Cagiada
  6. Kristoffer E Johansson
  7. Wouter Boomsma (corresponding author)
  8. Amelie Stein (corresponding author)
  9. Kresten Lindorff-Larsen (corresponding author)
  1. Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Denmark
  2. Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Denmark
5 figures, 4 tables and 1 additional file

Figures

Figure 1
Overview of model training.

We trained a self-supervised three-dimensional convolutional neural network (CNN) to learn internal representations of protein structures by predicting wild-type amino acid labels from protein structures. The representation model is trained to predict amino acid type based on the local atomic environment parameterized using a 3D sphere around the wild-type residue. Using the representations from the convolutional neural network as input, a second downstream and supervised fully connected neural network (FCNN) was trained to predict Rosetta ΔΔG values.
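
As a rough illustration of what "local atomic environment parameterized using a 3D sphere" could mean in practice, the sketch below selects the atoms that fall within a fixed distance of a residue centre. This is not the authors' implementation: the function name `atoms_in_sphere`, the toy coordinates and the 9.0 Å radius are all assumptions made for the example, since this caption does not state the radius or the featurization details.

```python
import math

def atoms_in_sphere(center, atoms, radius=9.0):
    """Return the atoms located within `radius` of `center`.

    `atoms` is a list of (element, (x, y, z)) tuples.
    The default radius is a hypothetical value chosen for illustration.
    """
    selected = []
    for element, xyz in atoms:
        # math.dist computes the Euclidean distance between two points
        if math.dist(center, xyz) <= radius:
            selected.append((element, xyz))
    return selected

# Toy environment: two atoms close to the origin and one far away
toy_atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("N", (3.0, 4.0, 0.0)),   # 5 units from the origin
    ("O", (10.0, 0.0, 0.0)),  # 10 units from the origin, outside the sphere
]
local_env = atoms_in_sphere((0.0, 0.0, 0.0), toy_atoms, radius=9.0)
```

In a real pipeline the selected atoms would then be converted into a grid or another tensor representation before being passed to the CNN; that step is omitted here.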

Figure 2 with 3 supplements
Overview of RaSP downstream model training and testing.

(A) Learning curve for training of the RaSP downstream model, with Pearson correlation coefficients (ρ) and mean absolute error (MAE_F) of RaSP predictions. During training we transformed the target ΔΔG data using a switching (Fermi) function, and MAE_F refers to this transformed data (see Methods for further details). Error bars represent the standard deviation of 10 independently trained models, which were subsequently used in ensemble averaging. Val: validation set; Train: training set. (B) After training, we applied the RaSP model to an independent test set to predict ΔΔG values for a full saturation mutagenesis of 10 proteins. For this figure, Pearson correlation coefficients and mean absolute errors (MAE) were computed using only variants with Rosetta ΔΔG values in the range [–1;7] kcal/mol.
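
The switching function and the ensemble averaging mentioned in this caption can be sketched as below. This is a minimal illustration under stated assumptions, not the published code: the midpoint `alpha` and steepness `beta` are placeholder values (the actual parameters are given in the Methods, not in this caption), and computing the error after transforming both predictions and targets is only one possible reading of how the mean absolute error on the transformed scale is defined.

```python
import math

def fermi(ddg, alpha=3.0, beta=0.4):
    """Map a ΔΔG value onto (0, 1) with a Fermi (logistic) switching function.

    alpha (midpoint) and beta (steepness) are assumed values for illustration.
    """
    return 1.0 / (1.0 + math.exp(-beta * (ddg - alpha)))

def mae_fermi(predicted, target, alpha=3.0, beta=0.4):
    """Mean absolute error computed on Fermi-transformed values."""
    errors = [
        abs(fermi(p, alpha, beta) - fermi(t, alpha, beta))
        for p, t in zip(predicted, target)
    ]
    return sum(errors) / len(errors)

def ensemble_mean(predictions_per_model):
    """Average per-variant predictions over several independently trained models."""
    n_models = len(predictions_per_model)
    return [sum(column) / n_models for column in zip(*predictions_per_model)]

# Two toy "models", each predicting two variants
averaged = ensemble_mean([[1.0, 2.0], [3.0, 4.0]])
```

A transform of this kind compresses very large ΔΔG values, so that errors among strongly destabilizing variants contribute less to the loss than errors near the midpoint.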

Figure 2—figure supplement 1
Learning curve for the self-supervised 3D convolutional neural network.

The model obtained at epoch 15 achieves a classification accuracy of 63% on the validation set.

Figure 2—figure supplement 2
Mean absolute prediction error for RaSP on the validation set, split by amino acid type of the wild-type and variant residue.

Substitutions from glycine and cysteine as well as to proline generally have higher errors.

Figure 2—figure supplement 3
RaSP versus Rosetta ΔΔG values for a full saturation mutagenesis of 10 test proteins separated into either exposed (A) or buried (B) residues.

We speculate that the RaSP prediction task is harder in the case of buried residues because Rosetta ΔΔG values generally have higher variance in those regions. For this figure, Pearson correlation coefficients and mean absolute errors (MAE) were computed using only variants with Rosetta ΔΔG values in the range [–1;7] kcal/mol. Buried and exposed residues were classified based on a relative solvent accessible surface area (SASA) cut-off of 0.2.
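
The buried/exposed split can be illustrated with a one-line classifier. The caption only gives the cut-off of 0.2; that values below the cut-off count as buried, and that a value exactly at the cut-off counts as exposed, are assumptions made for this sketch, as is the function name.

```python
def classify_burial(relative_sasa, cutoff=0.2):
    """Label a residue as 'buried' or 'exposed' from its relative SASA.

    Assumption: residues with relative SASA below the cut-off are buried.
    """
    return "buried" if relative_sasa < cutoff else "exposed"

# Toy relative SASA values for three residues
labels = [classify_burial(value) for value in (0.05, 0.2, 0.6)]
```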

Figure 3 with 4 supplements
Comparing RaSP and Rosetta predictions to experimental stability measurements.

Predictions of changes in stability obtained using (A) RaSP and (B) Rosetta are compared to experimental data on five test proteins: myoglobin (1BVC), lysozyme (1LZ1), chymotrypsin inhibitor (2CI2), RNAse H (2RN2) and Protein G (1PGA) (Kumar, 2006; Ó Conchúir et al., 2015; Nisthal et al., 2019). Metrics used are Pearson correlation coefficient (ρ), mean absolute error (MAE) and mean error (ME). In the experimental study of Protein G, 105 variants were assigned a ΔΔG value of at least 4 kcal/mol due to low stability, presence of a folding intermediate, or lack of expression (Nisthal et al., 2019).

Figure 3—figure supplement 1
Comparing RaSP and Rosetta predictions to experimental stability measurements.

Stability predictions obtained using (A–E) RaSP and (F–J) Rosetta are compared to experimental data for the five test proteins: myoglobin (1BVC), lysozyme (1LZ1), chymotrypsin inhibitor (2CI2), RNAse H (2RN2) and Protein G (1PGA) (Kumar, 2006; Ó Conchúir et al., 2015; Nisthal et al., 2019). In the experimental study of Protein G, 105 variants were assigned a ΔΔG value of at least 4 kcal/mol due to low stability, presence of a folding intermediate, or lack of expression (Nisthal et al., 2019).

Figure 3—figure supplement 2
RaSP performance on three recently published data sets (Pancotti et al., 2022): (A) the S669 data set, (B) the Ssym+ direct data set, (C) the Ssym+ reverse data set.

Figure 3—figure supplement 3
RaSP performance on the recently published mega-scale experiments (Tsuboyama et al., 2022).

The experimental data has been filtered to include only well-defined experimental ΔΔG values from single substitution mutations in natural protein domains (Tsuboyama et al., 2022). This filtered data set contains a total of 164,524 variants across 164 protein domain structures.

Figure 3—figure supplement 4
Benchmarking RaSP and Rosetta using VAMP-seq data.

We compare stability predictions with VAMP-seq scores for three test proteins: (A) TPMT (PDB: 2H11) (Matreyek et al., 2018), (B) PTEN (PDB: 1D5R) (Matreyek et al., 2018) and (C) NUDT15 (PDB: 5BON) (Suiter et al., 2020).

Figure 4
Stability predictions from structures created by template-based modelling.

Pearson correlation coefficients (ρ) between experimental stability measurements and predictions using protein homology models with decreasing sequence identity to the target sequence. Pearson correlation coefficients were computed in the range of [–1;7] kcal/mol.
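
Restricting a correlation to a fixed ΔΔG range, as described above, can be sketched as follows. The caption does not state which quantity defines the range for this figure, so the example filters on a generic "reference" value; the function names and toy numbers are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mean_absolute_error(xs, ys):
    """Mean absolute difference between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

def filter_by_reference(reference, predicted, low=-1.0, high=7.0):
    """Keep only pairs whose reference value lies in [low, high] kcal/mol."""
    pairs = [(r, p) for r, p in zip(reference, predicted) if low <= r <= high]
    return [r for r, _ in pairs], [p for _, p in pairs]

# Toy data: the first and last variants fall outside [-1, 7] and are dropped
reference = [-2.0, 0.0, 1.0, 2.0, 9.0]
predicted = [5.0, 0.1, 1.1, 2.1, -3.0]
ref_in_range, pred_in_range = filter_by_reference(reference, predicted)
rho = pearson(ref_in_range, pred_in_range)
mae = mean_absolute_error(ref_in_range, pred_in_range)
```

Filtering before computing the metrics keeps a few extreme values from dominating the correlation.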

Figure 5 with 3 supplements
Large-scale analysis of disease-causing variants and variants observed in the population.

The grey distribution shown in the background of all plots represents the distribution of ΔΔG values calculated using RaSP for all single amino acid changes in the 1,366 proteins that we analysed (15 of the 1,381 proteins for which we calculated ΔΔG did not have variants in ClinVar or gnomAD and were therefore not included in this analysis). Each plot is also labelled with the median ΔΔG of the subset analysed as well as a range of ΔΔG values that covers 95% of the data in that subset (box plot shows median, quartiles and outliers). The plots only show values between –1 and 7 kcal/mol (for the full range see Figure 5—figure supplement 2). (A) Distribution of RaSP ΔΔG values for benign (blue) and pathogenic (tan) variants extracted from the ClinVar database (Landrum et al., 2018). Using bootstrapping, we observe that the median RaSP ΔΔG value is significantly higher for pathogenic variants than for benign variants. (B) Distribution of RaSP ΔΔG values for variants with different allele frequencies (AF) extracted from the gnomAD database (Karczewski et al., 2020) in the ranges (i) AF > 10⁻² (green), (ii) 10⁻² > AF > 10⁻⁴ (orange), and (iii) AF < 10⁻⁴ (purple). We observe a gradual shift in the median RaSP ΔΔG going from common variants (AF > 10⁻²) towards rarer ones (AF < 10⁻⁴).
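
One common way to compare two medians by bootstrapping is to resample each group with replacement and inspect the resulting distribution of median differences. The sketch below illustrates that general idea only; the caption does not describe the exact procedure used, and the function name, the number of resamples, the percentile interval and the toy ΔΔG values are all assumptions.

```python
import random
import statistics

def bootstrap_median_difference(group_a, group_b, n_resamples=1000, seed=0):
    """Return an approximate 95% percentile interval for median(a) - median(b)."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the original group sizes
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.median(resample_a) - statistics.median(resample_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper

# Hypothetical ΔΔG values (kcal/mol) for two small groups of variants
pathogenic = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
benign = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
lower, upper = bootstrap_median_difference(pathogenic, benign)
```

If the interval lies entirely above zero, the first group has a higher median under this resampling scheme.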

Figure 5—figure supplement 1
Histogram of ΔΔG values from saturation mutagenesis using RaSP on 1,366 PDB structures, corresponding to ∼8.8 million predicted ΔΔG values.

Figure 5—figure supplement 2
Large-scale analysis of disease-causing variants and variants observed in the population using the RaSP model.

The grey distribution shown in the background of all plots represents the distribution of ΔΔG for all single amino acid changes in the 1,366 proteins that we analysed. Each plot is also labelled with the median ΔΔG of the subset analysed as well as a range of ΔΔG values that covers 95% of the data in that subset (box plot shows median, quartiles and outliers). (A) Distribution of RaSP ΔΔG values for benign (blue) and pathogenic (tan) variants extracted from the ClinVar database (Landrum et al., 2018). We observe that the median RaSP ΔΔG value is higher for pathogenic variants than for benign variants. (B) Distribution of RaSP ΔΔG values for variants with different allele frequencies (AF) extracted from the gnomAD database (Karczewski et al., 2020) in the ranges (i) AF > 10⁻² (green), (ii) 10⁻² > AF > 10⁻⁴ (orange), and (iii) AF < 10⁻⁴ (purple). We observe a gradual shift in the median RaSP ΔΔG going from common variants (AF > 10⁻²) towards rarer ones (AF < 10⁻⁴).
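
The allele-frequency grouping in panel (B) amounts to a simple binning step followed by a per-bin median. The sketch below is illustrative only: the bin labels and toy data are hypothetical, and because the caption does not say how values exactly equal to a boundary are handled, they are assigned to the lower-frequency bin here.

```python
import statistics
from collections import defaultdict

def allele_frequency_bin(af):
    """Assign an allele frequency to one of the three ranges in the caption."""
    if af > 1e-2:
        return "common"        # AF > 10^-2
    if af > 1e-4:
        return "intermediate"  # 10^-2 > AF > 10^-4
    return "rare"              # AF < 10^-4

def median_ddg_per_bin(variants):
    """Compute the median ΔΔG per allele-frequency bin.

    `variants` is a list of (allele_frequency, ddg) tuples.
    """
    grouped = defaultdict(list)
    for af, ddg in variants:
        grouped[allele_frequency_bin(af)].append(ddg)
    return {label: statistics.median(values) for label, values in grouped.items()}

# Hypothetical (allele frequency, ΔΔG in kcal/mol) pairs
toy_variants = [(0.05, 0.1), (0.2, 0.3), (1e-3, 0.6), (5e-4, 1.0), (1e-5, 1.5), (2e-6, 2.5)]
medians = median_ddg_per_bin(toy_variants)
```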

Figure 5—figure supplement 3
Histogram of ΔΔG values from saturation mutagenesis using RaSP on predicted structures of the entire human proteome, corresponding to ∼300 million ΔΔG values predicted from 23,391 protein structures.

Tables

Table 1
Overview of RaSP model test set prediction results including benchmark comparison with the Rosetta protocol.

When comparing RaSP to Rosetta (column: "Pearson |ρ| RaSP vs. Ros."), we only compute the Pearson correlation coefficients for variants with a Rosetta ΔΔG value in the range [–1;7] kcal/mol. Experimental data is from Kumar, 2006; Ó Conchúir et al., 2015; Nisthal et al., 2019; Matreyek et al., 2018; Suiter et al., 2020.

Data set           Protein name         PDB, chain  Pearson |ρ|    Pearson |ρ|    Pearson |ρ|
                                                    RaSP vs. Ros.  RaSP vs. Exp.  Ros. vs. Exp.
RaSP test set      MEN1                 3U84, A     0.85           -              -
                   F8                   2R7E, A     0.71           -              -
                   ELANE                4WVP, A     0.81           -              -
                   ADSL                 2J91, A     0.84           -              -
                   GCK                  4DCH, A     0.84           -              -
                   RPE65                4RSC, A     0.84           -              -
                   TTR                  1F41, A     0.88           -              -
                   ELOB                 4AJY, B     0.87           -              -
                   SOD1                 2CJS, A     0.84           -              -
                   VANX                 1R44, A     0.83           -              -
ProTherm test set  Myoglobin            1BVC, A     0.91           0.71           0.76
                   Lysozyme             1LZ1, A     0.80           0.57           0.65
                   Chymotrypsin inhib.  2CI2, I     0.79           0.65           0.68
                   RNAse H              2RN2, A     0.78           0.79           0.71
Protein G          Protein G            1PGA, A     0.90           0.72           0.72
MAVE test set      NUDT15               5BON, A     0.83           0.50           0.54
                   TPMT                 2H11, A     0.86           0.48           0.49
                   PTEN                 1D5R, A     0.87           0.52           0.53

Table 2
Benchmark performance of RaSP versus other structure-based methods on the S669 direct experimental data set (Pancotti et al., 2022).

Results for methods other than RaSP have been copied from Pancotti et al., 2022. We speculate that the higher RMSE and MAE values for Rosetta relative to RaSP are due to missing scaling of Rosetta output onto a scale similar to kcal/mol.

S669, direct
Method        Pearson ρ  RMSE [kcal/mol]  MAE [kcal/mol]
Structure-based
ACDC-NN       0.46       1.49             1.05
DDGun3D       0.43       1.60             1.11
PremPS        0.41       1.50             1.08
RaSP          0.39       1.63             1.14
ThermoNet     0.39       1.62             1.17
Rosetta       0.39       2.70             2.08
Dynamut       0.41       1.60             1.19
INPS3D        0.43       1.50             1.07
SDM           0.41       1.67             1.26
PoPMuSiC      0.41       1.51             1.09
MAESTRO       0.50       1.44             1.06
FoldX         0.22       2.30             1.56
DUET          0.41       1.52             1.10
I-Mutant3.0   0.36       1.52             1.12
mCSM          0.36       1.54             1.13
Dynamut2      0.34       1.58             1.15

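
The speculation in the caption of Table 2, that a missing rescaling could inflate RMSE and MAE without affecting the Pearson correlation, follows from how the three metrics respond to a multiplicative factor. The sketch below illustrates this with made-up numbers; the `scale_factor` of 3.0 is an arbitrary placeholder and is not claimed to be the conversion used for Rosetta.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rmse(predicted, experimental):
    """Root mean squared error."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))

def mae(predicted, experimental):
    """Mean absolute error."""
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)

# Hypothetical experimental values and predictions on a matching scale
experimental = [0.5, 1.0, 2.0, 3.0]
scaled_prediction = [0.6, 1.2, 1.8, 3.1]

# The same predictions expressed in different (unscaled) units
scale_factor = 3.0  # arbitrary illustrative factor
unscaled_prediction = [scale_factor * value for value in scaled_prediction]

rho_scaled = pearson(experimental, scaled_prediction)
rho_unscaled = pearson(experimental, unscaled_prediction)
```

Because the Pearson coefficient is invariant under positive linear rescaling, the two correlations agree, while RMSE and MAE grow for the unscaled predictions.
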
Table 3
Comparing RaSP predictions from crystal and AlphaFold 2 (AF2) structures.

Pearson correlation coefficients (ρ) between RaSP ΔΔG predictions using either two different crystal structures or a crystal structure and an AlphaFold 2 structure for six test proteins: PRMT5 (X1: 6V0P_A, X2: 4GQB_A), PKM (X1: 6B6U_A, X2: 6NU5_A), FTH1 (X1: 4Y08_A, X2: 4OYN_A), FTL (X1: 5LG8_A, X2: 2FFX_J), PSMA2 (X1: 5LE5_A, X2: 5LE5_O) and GNB1 (X1: 6CRK_B, X2: 5UKL_B). We also divided the analysis into residues with high (pLDDT ≥ 0.9) and medium-low (pLDDT < 0.9) pLDDT scores from AlphaFold 2.

         All [ρ]                 High AF2 pLDDT [ρ]      Medium-Low AF2 pLDDT [ρ]
Protein  X1-X2   X1-AF2  X2-AF2  X1-X2   X1-AF2  X2-AF2  X1-X2   X1-AF2  X2-AF2
PRMT5    0.93    0.89    0.95    -       0.90    0.95    -       0.66    0.89
PKM      0.99    0.95    0.95    -       0.95    0.95    -       0.88    0.89
FTH1     0.99    0.97    0.97    -       0.97    0.97    -       0.92    0.95
FTL      0.97    0.96    0.97    -       0.96    0.97    -       0.96    0.94
PSMA2    0.99    0.95    0.95    -       0.96    0.96    -       0.78    0.80
GNB1     0.96    0.94    0.94    -       0.94    0.94    -       0.93    0.89

Table 4
Run-time comparison of RaSP and four other methods for three test proteins: ELOB (PDB: 4AJY_B, 107 residues), GCK (PDB: 4DCH_A, 434 residues) and F8 (PDB: 2R7E_A, 693 residues).

The RaSP model is in total 480–1,036 times faster than Rosetta. RaSP, ACDC-NN and ThermoNet computations were performed using a single NVIDIA V100 16 GB GPU machine, while Rosetta and FoldX computations were parallelized and run on a server using 64 2.6 GHz AMD Opteron 6380 CPU cores. The number of ΔΔG computations per mutation was set to 3 for both Rosetta and FoldX. For ThermoNet, we expect that the pre-processing speed can be made comparable to Rosetta via parallelization.

                    Wall-clock time [s]
Method     Protein  Pre-processing  ΔΔG       ΔΔG/residue
RaSP       ELOB     7               41        0.4
           GCK      11              173       0.4
           F8       20              270       0.4
Rosetta    ELOB     677             44,324    414.2
           GCK      7,996           118,361   272.7
           F8       17,211          133,178   192.2
FoldX      ELOB     78              42,237    394.7
           GCK      728             309,762   713.7
           F8       1,306           559,050   806.7
ACDC-NN    ELOB     81              158       1.5
           GCK      169             619       1.4
           F8       325             1,080     1.6
ThermoNet  ELOB     80,442          884       8.3
           GCK      4,586,522       4,227     9.7
           F8       11,627,433      8,732     12.6
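
As a quick check on the speed-up quoted in the caption, the ratio of the per-residue ΔΔG times for Rosetta and RaSP can be computed directly from the table. The 480–1,036 range appears to be consistent with this per-residue ratio; that this is how the range was derived is an inference from the arithmetic, not something stated in the caption.

```python
# Wall-clock time per residue [s], copied from the ΔΔG/residue column of Table 4
ddg_time_per_residue = {
    "RaSP": {"ELOB": 0.4, "GCK": 0.4, "F8": 0.4},
    "Rosetta": {"ELOB": 414.2, "GCK": 272.7, "F8": 192.2},
}

# Speed-up of RaSP over Rosetta for each test protein
speedup = {
    protein: ddg_time_per_residue["Rosetta"][protein] / ddg_time_per_residue["RaSP"][protein]
    for protein in ("ELOB", "GCK", "F8")
}

slowest_gain = min(speedup.values())  # roughly 480 (F8)
largest_gain = max(speedup.values())  # roughly 1,036 (ELOB)
```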

Lasse M Blaabjerg, Maher M Kassem, Lydia L Good, Nicolas Jonsson, Matteo Cagiada, Kristoffer E Johansson, Wouter Boomsma, Amelie Stein, Kresten Lindorff-Larsen (2023)
Rapid protein stability prediction using deep learning representations
eLife 12:e82593.
https://doi.org/10.7554/eLife.82593