An illustration of ProtSSN, which extracts the semantic and geometric characteristics of a protein from its sequentially ordered global construction and spatially gathered local contacts using protein language models and equivariant graph neural networks. The encoded hidden representation can be used for downstream tasks such as variant effect prediction, which assesses the impact of mutating a few sites of a protein on its functionality.

Statistical summary of DTm and DDG.

An example source record of the mutation assay. The record is derived from DDG for the A chain of protein 1A7V, measured at pH 6.5 and a temperature of 25°C.

Details of baseline models.

Spearman’s ρ correlation of variant effect prediction on DTm, DDG, and ProteinGym v1.

Number of trainable parameters and Spearman’s ρ correlation on DTm, DDG, and ProteinGym v1, with the median values marked by the dashed lines. Dot, cross, and diamond markers represent sequence-based, structure-based, and sequence-structure models, respectively.

Ablation study on ProteinGym v0 with different modular settings of ProtSSN. Each record reports Spearman’s ρ correlation averaged over all assays.
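For readers unfamiliar with the metric, the following is a minimal sketch of how a per-assay Spearman’s ρ can be computed and then averaged across assays. It is an illustration only, not the authors’ evaluation code: the function names (`ranks`, `spearman`, `average_spearman`) and the toy scores are hypothetical, and ties are handled by average ranks, which is one common convention.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def average_spearman(assays):
    """Average the per-assay rho over all assays, each weighted equally."""
    return mean(spearman(pred, meas) for pred, meas in assays.values())


# Hypothetical toy data: assay name -> (predicted scores, measured fitness).
toy_assays = {
    "assay_a": ([0.1, 0.4, 0.3, 0.9], [1.0, 2.0, 3.0, 4.0]),
    "assay_b": ([0.5, 0.2, 0.1], [3.0, 2.0, 1.0]),
}

print(round(average_spearman(toy_assays), 3))
```

Because the correlation is computed within each assay before averaging, every assay contributes equally regardless of how many variants it contains.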

Influence of folding strategies (AlphaFold2 and ESMFold) on prediction performance for structure-involved models.

Variant effect prediction on ProteinGym v0 with both MSA and non-MSA methods. Results are taken from Notin et al. (2022).

Spearman’s ρ correlation of variant effect prediction on DTm and DDG for non-MSA methods with model ensembling.