Figures and data

Epistatic transformer for jointly modeling fixed-order specific epistasis and nonspecific epistasis.
a. Overall model architecture. The input amino acid token xl is first used to generate position-specific amino acid embeddings 

Epistatic transformer is able to capture the genetic architecture of simulated sequence-function relationships.
Data was simulated for a binary fitness landscape with 13 sites, with specific epistasis among up to 8 sites. åa. Test R2 can be used to estimate cumulative variance components. Bar plot shows the cumulative variance for each order of interaction (proportion of variance due to epistasis up to a given order). Error plot shows the test R2 for additive, pairwise, four-way, and eight-way interaction models using epistatic transformer to 90% of simulated data. Error bars correspond to one standard deviation with 5 random train-test splits. b. Epistatic transformer recapitulates true variance components. Bar plots correspond to variance components(proportion of variance explained by interactions of a given order) for the simulated landscape, the pairwise, 4th-order, and 8th-order models. Note that pairwise model only contains interactions up to second order, and 4th-order model contains interactions among up to 4 positions, while 8th-order model model contains interactions among up to 8 positions with variance components aligning with the ground truth. c. Epistatic transformer model becomes increasingly complex with longer training. The three curves represent the variance component decomposition of the full landscape inferred by the 8th-order model at each training epoch. Blue: variance components for additive and pairwise interactions. Orange: variance components for 3rd-order and 4th-order interactions. Green: variance components for interaction order > 4.

Combinatorial protein mutagenesis datasets used in this paper.

Formulas for calculating the percent epistatic variance for different orders of specific epistatic interactions.



Importance of pairwise and higher-order specific epistasis in 10 experimental protein-sequence function datasets.
For each dataset, pairwise, 4th-order and 8th-order models were fit using epistatic transformer with 1, 2, and 3 layers of epistatic MHA, along with an additive model. All models contain a final sigmoid activation function mapping a scalar value to the measurement scale for modeling non-specific epistasis. Models were fit to 80% of training data generated by randomly sampling all available data and evaluated on random test genotypes. In each panel, the number 




Improvement in prediction accuracy in higher-order models is due to their abilities to fit specific epistasis.
Scatter plots show the observed phenotypes (y) vs. latent model predictions (ϕ), or the final model predictions 

Importance of higher-order epistasis in predicting phenotypes for distant genotypes in the AAV2-Capsid (a) and the cgreGFP (b) datasets.
For each dataset, genotypes are binned to discrete distance classes by their mean Hamming distances to the training data comprising 20% of randomly sampled genotypes. For both datasets, we retain only distance classes where the additive model has a test R2 > 0.3. Top panel: Distribution of mean Hamming distance in the randomly sampled test data. Second panel: Distribution of the observed phenotypic values (y) for each distance class. Third panel: Test R2 under the additive model and the 8th-order epistatic transformer for genotypes at different distance classes. The gap between the two curves is equal to 


Higher-order epistasis in a multi-peak fitness landscape, consisted of four green fluorescent protein (GFP) orthologs (avGFP, amacGFP, ppluGFP2, cgreGFP).
a. PCA coordinates for the one-hot embeddings of the all protein genotypes. Genotypes are highly concentrated around the four wild types (WT), which exhibit varying degrees of sequence divergence. b. Scatter plots of shared mutational effects among the four GFPs, fit using separate additive model with a shared sigmoid activation function. c. Higher-order epistasis allows better generalization to distant regions in sequence space. Models were fit use single and double mutant data for all GFPs. Models were tested in different distance classes, each containing all genotypes at given Hamming distance to their corresponding WT sequence. Error bars represent 1 standard deviation calculated by resampling random 90% of the test genotypes with 10 replicates. d. Higher-order epistasis allows better generalization across local peaks. Rows: GFP orthologs used to train the models. Columns: GFP orthologs used to test performance of the trained model. 

Genetic architecture for epistasis of GRB2-SH3 abundance, inferred using a 3-layer epistatic transformer model fit to the ‘GRB-abundance’ dataset.
a. Position-specific additive and marginal epistatic effects. Additive effects were calculated as the fraction of total additive variance explained by each position. Marginal epistatic effects of order m (m = 2, 3, 4) for position i were calculated as the fraction of total epistatic variance of order m explained by all m-way interactions involving site i. Effects were normalized to sum to 1 for row to assist in visualization of the results. Results were derived by averaging over 5 replicates over random subsamples of all available data. b. Marginal three-way epistatic effects mapped to the 3D-structure of GRB2-SH3 (pdb: 1GCQA). Gray corresponds to unmutagenized positions in the experiment. c. Sparsity of epistasis. Each panel shows the fraction of total pairwise, three-way, or four-way variance explained by the top percentage of interacting sites. For example, the leftmost panel shows the top 50% all pairs of sites explain nearly all of the variance due to pairwise epistasis.