An illustration of ProtSSN, which extracts the semantic and geometric characteristics of a protein from its sequentially ordered global construction and spatially gathered local contacts using protein language models and equivariant graph neural networks. The encoded hidden representation can be used for downstream tasks such as variant effect prediction, which assesses the impact of mutating a few sites of a protein on its functionality.

Statistical summary of DTm and DDG.

An example source record of the mutation assay. The record is taken from DDG for the A chain of protein 1A7V, measured at pH 6.5 and 25°C.

Details of zero-shot baseline models.

Spearman’s ρ correlation of variant effect prediction by zero-shot methods on DTm, DDG, and ProteinGym v1.

Number of trainable parameters and Spearman’s ρ correlation on DTm, DDG, and ProteinGym v1, with the median values indicated by dashed lines. Dot, cross, and diamond markers represent sequence-based, structure-based, and sequence-structure models, respectively.

Ablation study on ProteinGym v0 with different modular settings of ProtSSN. Each record represents the average Spearman’s ρ correlation across all assays. (a) Performance using different structure encoders: EGNN (orange) versus GCN/GAT (purple); (b) Performance using different node attributes: ESM2-embedded hidden representation (orange) versus one-hot encoding (purple); (c) Performance with varying numbers of EGNN layers; (d) Performance with different versions of ESM2 for sequence encoding; (e) Performance using different amino acid perturbation strategies during pre-training.

Influence of folding strategies (AlphaFold2 and ESMFold) on prediction performance for structure-involved models.

Variant Effect Prediction on ProteinGym v0 with both zero-shot and few-shot methods. Results are retrieved from (Notin et al., 2022a).

Average Spearman’s ρ correlation of variant effect prediction on DTm and DDG for zero-shot methods with model ensemble. Values in parentheses indicate the bootstrap standard deviation.
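
As a rough illustration of how a Spearman’s ρ with a bootstrap standard deviation could be computed for one assay, the following is a minimal sketch using only the Python standard library. The function names, the number of resamples, and the choice to resample variants with replacement within a single assay are assumptions made for illustration; the captions do not specify the exact bootstrapping protocol.

```python
import random
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_spearman(pred, truth, n_resamples=1000, seed=0):
    """Mean and standard deviation of rho over bootstrap resamples.

    Assumption: variants are resampled with replacement within one assay.
    """
    rng = random.Random(seed)
    n = len(pred)
    rhos = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman_rho([pred[i] for i in idx], [truth[i] for i in idx])
        if rho == rho:  # skip NaN from degenerate (constant) resamples
            rhos.append(rho)
    return mean(rhos), stdev(rhos)
```

Here `pred` would hold model scores for the variants of one assay and `truth` the matching experimental measurements; averaging the per-assay results would then give a table-level summary, again under the assumptions stated above.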