OpenSpliceAI provides an efficient, modular implementation of SpliceAI, enabling easy retraining across non-human species

  1. Kuan-Hao Chao (corresponding author)
  2. Alan Mao
  3. Anqi Liu
  4. Steven L Salzberg (corresponding author)
  5. Mihaela Pertea (corresponding author)
  1. Department of Computer Science, Johns Hopkins University, United States
  2. Center for Computational Biology, Johns Hopkins University, United States
  3. Department of Biomedical Engineering, Johns Hopkins University, United States
  4. Department of Biostatistics, Johns Hopkins University, United States
6 figures, 2 tables and 1 additional file

Figures

Figure 1 with 3 supplements
Overview of the OpenSpliceAI design.

This toolkit features six primary subcommands: (A) The ‘create-data’ subcommand processes genome annotations in GFF/GTF format and genome sequences in FASTA format to produce one-hot encoded gene sequences (X) and corresponding labels (Y), both stored in HDF5 format. (B) The ‘train’ subcommand utilizes the HDF5 files generated by ‘create-data’ to train the SpliceAI model using PyTorch, resulting in a serialized model in PT format. This process also generates logs for training, testing, and validation. (C) The ‘calibrate’ subcommand takes both training and test datasets along with a pre-trained model in PT format. It randomly allocates 10% of the training data as a validation (calibration) set, which is then used to adjust the model’s output probabilities so that they more accurately reflect the observed empirical probabilities during evaluation on the test set. (D) The ‘transfer’ subcommand allows for model customization using a dataset from a different species, requiring a pre-trained model in PT format and HDF5 files for transfer learning and testing. (E) The ‘predict’ subcommand enables users to predict splice site probabilities for sequences in given FASTA files. (F) The ‘variant’ subcommand assesses the impact of potential SNPs and indels on splice sites using VCF format files, providing predicted cryptic splice sites.
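As a rough illustration of the data layout that ‘create-data’ produces, the following Python sketch one-hot encodes a toy gene sequence, labels its splice sites, and writes the paired (X, Y) arrays to HDF5. The channel order, label scheme, and dataset names here are assumptions for illustration, not OpenSpliceAI's exact on-disk format.

```python
# Minimal sketch of the kind of encoding `create-data` performs (illustrative,
# not the tool's exact format).
import h5py
import numpy as np

BASE_TO_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}  # 'N' stays all-zero

def one_hot_encode(seq: str) -> np.ndarray:
    """Encode a DNA sequence as an (L, 4) float array; 'N' maps to [0,0,0,0]."""
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASE_TO_INDEX.get(base)
        if j is not None:
            x[i, j] = 1.0
    return x

def label_splice_sites(length: int, donors, acceptors) -> np.ndarray:
    """Per-base labels: 0 = neither, 1 = acceptor, 2 = donor."""
    y = np.zeros(length, dtype=np.int64)
    y[list(acceptors)] = 1
    y[list(donors)] = 2
    return y

seq = "ACGTGTAAGT" * 10                  # toy gene sequence
X = one_hot_encode(seq)
Y = label_splice_sites(len(seq), donors=[4], acceptors=[52])

with h5py.File("dataset.h5", "w") as f:  # paired (X, Y) in HDF5
    f.create_dataset("X", data=X)
    f.create_dataset("Y", data=Y)
```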

Figure 1—figure supplement 1
Overview of the OpenSpliceAI architectures trained with different flanking sequence lengths.

(A) OpenSpliceAI-10k: schematic of the model configured for 10 kb flanking regions. The input sequence is passed through an initial convolution layer (Conv1D), followed by 16 residual units, each incorporating skip connections. The final output is fed into a softmax function for splice site classification. (B) OpenSpliceAI-2k: model variant with 2 kb flanking sequences, using a similar structure with 12 repeated residual units. (C) OpenSpliceAI-400: model variant with 400 bp flanking sequences, using a similar structure with 8 repeated residual units. (D) OpenSpliceAI-80: the smallest variant, trained on 80 bp flanking sequences, using a similar structure with 4 repeated residual units. (E) Detailed view of the residual unit (RU) structure, highlighting the convolution, batch normalization, and skip connections. (F) Training setup: all models were trained using the AdamW optimizer, either a MultiStepLR or CosineAnnealingWarmRestarts learning rate scheduler, and cross-entropy or focal loss functions.
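The residual unit in panel E follows the familiar SpliceAI pattern of batch normalization, activation, and dilated 1-D convolution wrapped in a skip connection. A minimal PyTorch sketch (layer ordering approximated from the figure) is:

```python
# Sketch of a SpliceAI-style residual unit (RU); details are an approximation.
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, channels: int, kernel_size: int, dilation: int):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2  # keep sequence length fixed
        self.block = nn.Sequential(
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, dilation=dilation, padding=pad),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, dilation=dilation, padding=pad),
        )

    def forward(self, x):
        return x + self.block(x)  # skip connection

x = torch.randn(2, 32, 5000)             # (batch, channels, positions)
out = ResidualUnit(32, 11, dilation=4)(x)
print(out.shape)                          # torch.Size([2, 32, 5000])
```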

Figure 1—figure supplement 2
Decision-making and workflow of the predict subcommand.

Required inputs include the FASTA file and PyTorch model, while optional inputs include a GFF annotation file and custom parameters. The outputs are two BED files corresponding to predicted donor and acceptor splice sites. Several intermediate files may be generated that are useful to the user, including the HDF5-compressed data file (raw sequences) and the dataset of encoded inputs.
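A hypothetical sketch of the final step, converting per-base probabilities into the two output BED files (the 0.5 threshold and file names below are placeholders, not OpenSpliceAI defaults):

```python
# Illustrative conversion of per-base splice-site probabilities into BED files.
import numpy as np

def write_bed(path, chrom, positions, scores, name):
    with open(path, "w") as f:
        for pos, score in zip(positions, scores):
            # BED is 0-based, half-open: [pos, pos + 1)
            f.write(f"{chrom}\t{pos}\t{pos + 1}\t{name}\t{score:.4f}\t+\n")

probs = np.random.rand(1000, 3)          # toy columns: neither, acceptor, donor
donor_idx = np.where(probs[:, 2] > 0.5)[0]
acceptor_idx = np.where(probs[:, 1] > 0.5)[0]
write_bed("donor.bed", "chr1", donor_idx, probs[donor_idx, 2], "donor")
write_bed("acceptor.bed", "chr1", acceptor_idx, probs[acceptor_idx, 1], "acceptor")
```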

Figure 1—figure supplement 3
Decision-making and workflow of the variant subcommand.

Required inputs include the VCF file, FASTA file, and annotation file, while optional inputs include the output path, splicing model, and distance, precision, and masking parameters. The output is a VCF file with the OpenSpliceAI delta scores of the variants. Note that both Keras and PyTorch models are supported in this tool, but PyTorch is recommended.
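Conceptually, a delta score contrasts predictions on the reference and variant sequences. The sketch below, with an assumed ±50 bp window and illustrative function names, shows one way to compute acceptor/donor gain and loss scores; the tool's exact windowing and masking options may differ.

```python
# Sketch of delta-score computation from ref/alt probability arrays.
import numpy as np

def delta_scores(ref_probs, alt_probs, center, window=50):
    """ref_probs/alt_probs: (L, 3) arrays (neither, acceptor, donor)."""
    lo, hi = max(0, center - window), center + window + 1
    d = alt_probs[lo:hi] - ref_probs[lo:hi]
    return {
        "acceptor_gain": float(d[:, 1].max()),
        "acceptor_loss": float(-d[:, 1].min()),
        "donor_gain": float(d[:, 2].max()),
        "donor_loss": float(-d[:, 2].min()),
    }

ref = np.random.rand(200, 3)
alt = ref.copy()
alt[100, 2] += 0.8                      # simulate a donor-gain event
print(delta_scores(ref, alt, center=100))
```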

Figure 2 with 6 supplements
Overview of the OpenSpliceAI framework, performance benchmarking, and comparison with SpliceAI-Keras.

(A) Schematic overview of OpenSpliceAI’s approach. Gene sequences are first extracted from the genome FASTA file and one-hot encoded (X). Splice sites are identified and labeled using the annotation file (Y). The resulting paired data (X, Y) for each gene is then compiled for model training (80% of the sequences) and testing (20% of the sequences). (B) Workflow of the OSAIMANE 10,000 nt model. Input sequences are one-hot encoded and padded with 5000 Ns ([0,0,0,0]) on each side, totaling 10,000 Ns. The model processes the input and outputs, for each position, the probability of that position being a donor site, an acceptor site, or neither. (C–D) Performance comparison between OSAIMANE and SpliceAI-Keras on splice donor and acceptor sites, trained with 80 nt, 400 nt, 2000 nt, and 10,000 nt flanking sequences. The evaluation metric is top-1 accuracy for both donor and acceptor sites. Blue curves represent SpliceAI-Keras, while orange curves represent OSAIMANE. Each dot is the mean over five trained model variants, and error bars show ±1 standard error of the mean. Performance is compared on the human test dataset. (E) Benchmarking results for elapsed time, average memory usage, and GPU peak memory for the prediction submodule.
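A small sketch of the flank padding described in panel B, assuming the (L, 4) one-hot layout used above (shapes only):

```python
# 5000 all-zero rows ([0,0,0,0], i.e. 'N') on each side of the one-hot
# sequence, so the model can score every real position with full 10k context.
import numpy as np

def pad_flanks(x_onehot: np.ndarray, flank: int = 5000) -> np.ndarray:
    pad = np.zeros((flank, 4), dtype=x_onehot.dtype)
    return np.vstack([pad, x_onehot, pad])

x = np.eye(4, dtype=np.float32)[np.random.randint(0, 4, size=1200)]  # (1200, 4)
print(pad_flanks(x).shape)  # (11200, 4): 5000 + 1200 + 5000
```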

Figure 2—figure supplement 1
Comparison of splice site prediction performance between SpliceAI-Keras (blue) and OSAIMANE (orange) across human (Homo sapiens) datasets with varying flanking sequence lengths.

The plots display donor (5′) and acceptor (3′) splice site prediction metrics using 80, 400, 2000, and 10,000 nt of flanking context. OSAIMANE was trained using the RefSeq MANE v1.3 database and GRCh38.p14 genome. (A) Donor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true donor site label. (B) Donor AUPRC: area under the precision-recall curve for donor site predictions. (C) Donor accuracy: proportion of correct donor site calls among all predictions. (D) Donor precision: the fraction of predicted donor sites that are correct. (E) Donor recall: the fraction of true donor sites that are correctly predicted. (F) Donor F1: the harmonic mean of precision and recall for donor sites. (G) Acceptor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true acceptor site label. (H) Acceptor AUPRC: area under the precision-recall curve for acceptor site predictions. (I) Acceptor accuracy: proportion of correct acceptor site calls among all predictions. (J) Acceptor precision: the fraction of predicted acceptor sites that are correct. (K) Acceptor recall: the fraction of true acceptor sites that are correctly predicted. (L) Acceptor F1: the harmonic mean of precision and recall for acceptor sites. Each panel illustrates that increasing the flanking sequence length generally enhances model performance, with both SpliceAI-Keras and OpenSpliceAI achieving high accuracy and F1 scores at 10,000 nt. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.
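For reference, these metrics can be reproduced on toy data with scikit-learn; the top-1 computation here is a simplified stand-in for the per-transcript definition used in the paper.

```python
# Sketch of the per-site metrics reported in these panels.
import numpy as np
from sklearn.metrics import (average_precision_score, f1_score,
                             precision_score, recall_score)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)           # 1 = donor site, 0 = not
y_score = np.clip(y_true * 0.8 + rng.normal(0, 0.2, 1000), 0, 1)
y_pred = (y_score > 0.5).astype(int)

print("AUPRC:    ", average_precision_score(y_true, y_score))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("top-1 hit:", bool(y_true[np.argmax(y_score)]))  # most confident call correct?
```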

Figure 2—figure supplement 2
Comparison of splice site prediction performance between SpliceAI-Keras (blue) and OSAIMANE (orange) across house mouse (Mus musculus) datasets with varying flanking sequence lengths.

The plots display donor (5′) and acceptor (3′) splice site prediction metrics using 80, 400, 2000, and 10,000 nt of flanking context. OSAIMANE was trained using the RefSeq MANE v1.3 database and GRCh38.p14 genome. The house mouse datasets are curated from RefSeq GRCm39 annotation and GRCm39 genome. (A) Donor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true donor site label. (B) Donor AUPRC: area under the precision-recall curve for donor site predictions. (C) Donor accuracy: proportion of correct donor site calls among all predictions. (D) Donor precision: the fraction of predicted donor sites that are correct. (E) Donor recall: the fraction of true donor sites that are correctly predicted. (F) Donor F1: the harmonic mean of precision and recall for donor sites. (G) Acceptor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true acceptor site label. (H) Acceptor AUPRC: area under the precision-recall curve for acceptor site predictions. (I) Acceptor accuracy: proportion of correct acceptor site calls among all predictions. (J) Acceptor precision: the fraction of predicted acceptor sites that are correct. (K) Acceptor recall: the fraction of true acceptor sites that are correctly predicted. (L) Acceptor F1: the harmonic mean of precision and recall for acceptor sites. Each panel illustrates that increasing the flanking sequence length generally enhances model performance, with both SpliceAI-Keras and OpenSpliceAI achieving high accuracy and F1 scores at 10,000 nt. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.

Figure 2—figure supplement 3
Comparison of splice site prediction performance between SpliceAI-Keras (blue) and OSAIMANE (orange) across honeybee (Apis mellifera) datasets with varying flanking sequence lengths.

The plots display donor (5′) and acceptor (3′) splice site prediction metrics using 80, 400, 2000, and 10,000 nt of flanking context. OSAIMANE was trained using the RefSeq MANE v1.3 database and GRCh38.p14 genome. The honeybee datasets are curated from RefSeq Amel_HAv3.1 annotation and Amel_HAv3.1 genome. (A) Donor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true donor site label. (B) Donor AUPRC: area under the precision-recall curve for donor site predictions. (C) Donor accuracy: proportion of correct donor site calls among all predictions. (D) Donor precision: the fraction of predicted donor sites that are correct. (E) Donor recall: the fraction of true donor sites that are correctly predicted. (F) Donor F1: the harmonic mean of precision and recall for donor sites. (G) Acceptor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true acceptor site label. (H) Acceptor AUPRC: area under the precision-recall curve for acceptor site predictions. (I) Acceptor accuracy: proportion of correct acceptor site calls among all predictions. (J) Acceptor precision: the fraction of predicted acceptor sites that are correct. (K) Acceptor recall: the fraction of true acceptor sites that are correctly predicted. (L) Acceptor F1: the harmonic mean of precision and recall for acceptor sites. Each panel illustrates that increasing the flanking sequence length generally enhances model performance, with both SpliceAI-Keras and OpenSpliceAI achieving high accuracy and F1 scores at 10,000 nt. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.

Figure 2—figure supplement 4
Comparison of splice site prediction performance between SpliceAI-Keras (blue) and OSAIMANE (orange) across zebrafish (Danio rerio) datasets with varying flanking sequence lengths.

The plots display donor (5′) and acceptor (3′) splice site prediction metrics using 80, 400, 2000, and 10,000 nt of flanking context. OSAIMANE was trained using the RefSeq MANE v1.3 database and GRCh38.p14 genome. The zebrafish datasets are curated from RefSeq GRCz11 annotation and GRCz11 genome. (A) Donor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true donor site label. (B) Donor AUPRC: area under the precision-recall curve for donor site predictions. (C) Donor accuracy: proportion of correct donor site calls among all predictions. (D) Donor precision: the fraction of predicted donor sites that are correct. (E) Donor recall: the fraction of true donor sites that are correctly predicted. (F) Donor F1: the harmonic mean of precision and recall for donor sites. (G) Acceptor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true acceptor site label. (H) Acceptor AUPRC: area under the precision-recall curve for acceptor site predictions. (I) Acceptor accuracy: proportion of correct acceptor site calls among all predictions. (J) Acceptor precision: the fraction of predicted acceptor sites that are correct. (K) Acceptor recall: the fraction of true acceptor sites that are correctly predicted. (L) Acceptor F1: the harmonic mean of precision and recall for acceptor sites. Each panel illustrates that increasing the flanking sequence length generally enhances model performance, with both SpliceAI-Keras and OpenSpliceAI achieving high accuracy and F1 scores at 10,000 nt. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.

Figure 2—figure supplement 5
Comparison of splice site prediction performance between SpliceAI-Keras (blue) and OSAIMANE (orange) across Arabidopsis thaliana datasets with varying flanking sequence lengths.

The plots display donor (5′) and acceptor (3′) splice site prediction metrics using 80, 400, 2000, and 10,000 nt of flanking context. OSAIMANE was trained using the RefSeq MANE v1.3 database and GRCh38.p14 genome. The A. thaliana datasets are curated from RefSeq TAIR10.1 annotation and TAIR10.1 genome. (A) Donor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true donor site label. (B) Donor AUPRC: area under the precision-recall curve for donor site predictions. (C) Donor accuracy: proportion of correct donor site calls among all predictions. (D) Donor precision: the fraction of predicted donor sites that are correct. (E) Donor recall: the fraction of true donor sites that are correctly predicted. (F) Donor F1: the harmonic mean of precision and recall for donor sites. (G) Acceptor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true acceptor site label. (H) Acceptor AUPRC: area under the precision-recall curve for acceptor site predictions. (I) Acceptor accuracy: proportion of correct acceptor site calls among all predictions. (J) Acceptor precision: the fraction of predicted acceptor sites that are correct. (K) Acceptor recall: the fraction of true acceptor sites that are correctly predicted. (L) Acceptor F1: the harmonic mean of precision and recall for acceptor sites. Each panel illustrates that increasing the flanking sequence length generally enhances model performance, with both SpliceAI-Keras and OpenSpliceAI achieving high accuracy and F1 scores at 10,000 nt. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.

Figure 2—figure supplement 6
Comparison of runtime and memory metrics for ‘predict’ (panels A–F) and ‘variant’ (panels G–L) in OSAIMANE models with different flanking sequences.

Each corresponding pair of panels displays the same metric for the two subcommands as a function of increasing input size. (A, G) Overall elapsed time: total elapsed CPU time to complete processing. (B, H) Average memory usage: mean CPU memory consumption (in MB) during execution, reflecting the typical memory footprint. (C, I) Peak memory usage: maximum CPU memory recorded (in MB) at any point. (D, J) Peak GPU memory: maximum GPU memory recorded (in MB) at any point. (E, K) Memory growth rate: the average rate of memory increase during runtime, indicating how memory usage scales with larger inputs. (F, L) CPU utilization profile: percentage of time spent in native C execution (as opposed to interpreted Python code), reflecting the proportion of runtime spent in compiled, low-level routines.
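A sketch of how such measurements can be gathered with standard tooling (time.perf_counter, psutil, and torch.cuda peak-memory counters); the benchmarking harness actually used may differ.

```python
# Collect elapsed time, CPU RSS, and peak GPU memory for a toy workload.
import time
import psutil
import torch

proc = psutil.Process()
if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()

start = time.perf_counter()
x = torch.randn(8, 4, 15000)          # stand-in for a prediction workload
y = x.cuda() if torch.cuda.is_available() else x
_ = torch.softmax(y.mean(dim=1), dim=-1)
elapsed = time.perf_counter() - start

print(f"elapsed: {elapsed:.3f} s")
print(f"CPU RSS: {proc.memory_info().rss / 1e6:.1f} MB")
if torch.cuda.is_available():
    print(f"GPU peak: {torch.cuda.max_memory_allocated() / 1e6:.1f} MB")
```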

Figure 3 with 5 supplements
Cross-species dataset composition, sequence filtering, and performance comparison of OpenSpliceAI with SpliceAI-Keras.

(A) The number of protein-coding genes in the training and test sets, along with the count of paralogous genes removed for each species: human MANE, mouse, zebrafish, honeybee, and Arabidopsis. (B) Scatter plots of DNA sequence alignments between testing and training sets for human MANE, mouse, honeybee, zebrafish, and Arabidopsis. Each dot represents an alignment, with the x-axis showing alignment identity and the y-axis showing alignment coverage. Alignments exceeding 80% for both identity and coverage are highlighted in the red-shaded region and excluded from the test sets. (C–F) Performance comparisons of OSAIs trained on species-specific datasets (mouse, zebrafish, honeybee, and Arabidopsis) vs. SpliceAI-Keras, the originally published SpliceAI models trained on human data. The orange curves represent OSAI metrics, while the blue curves show SpliceAI-Keras metrics. Each subplot (C–F) includes the F1 score evaluated separately for donor and acceptor sites. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.
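The sequence-similarity filter in panel B amounts to a simple predicate over alignment records; the field names below are illustrative, not a specific aligner's output format.

```python
# Drop any test gene whose alignment to a training gene exceeds 80% identity
# AND 80% coverage (the red-shaded region in panel B).
alignments = [
    {"test_gene": "geneA", "identity": 0.92, "coverage": 0.88},  # removed
    {"test_gene": "geneB", "identity": 0.95, "coverage": 0.40},  # kept
    {"test_gene": "geneC", "identity": 0.60, "coverage": 0.90},  # kept
]

leaky = {a["test_gene"] for a in alignments
         if a["identity"] > 0.8 and a["coverage"] > 0.8}
test_set = [g for g in ("geneA", "geneB", "geneC") if g not in leaky]
print(test_set)  # ['geneB', 'geneC']
```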

Figure 3—figure supplement 1
Splice site motif count and intron length distributions across five species.

(A) Canonical vs. non-canonical donor and acceptor splice sites. Bar plots depict the total number of donor (blue, orange) and acceptor (green, red) sites across human (MANE), mouse, zebrafish, honeybee, and Arabidopsis genomes, subdivided into canonical (GT/AG) and non-canonical motifs. Percentages above each bar indicate the proportion of sites using canonical motifs in each species. (B) Splice junction (intron) length distributions. Kernel density curves illustrate the distribution of intron lengths in each genome (colored as in the legend). Differences in the breadth and peak of each distribution highlight notable cross-species variation in intron size. (C) Scatter plots of DNA sequence alignments between validation and training sets for human MANE, mouse, honeybee, zebrafish, and Arabidopsis. Each dot represents an alignment, with the x-axis showing alignment identity and the y-axis showing alignment coverage. Alignments exceeding 80% for both identity and coverage are highlighted in the red-shaded region and excluded from the validation sets.
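A toy sketch of the canonical-motif tally in panel A (an intron counts as canonical when it begins with GT on the donor side and ends with AG on the acceptor side):

```python
# Count canonical GT..AG introns in a toy list of intron sequences.
introns = ["GTAAGT...TTTCAG", "GTGAGT...CTGCAG", "GCAAGT...TTGCAG"]

canonical = sum(1 for s in introns if s.startswith("GT") and s.endswith("AG"))
print(f"{canonical}/{len(introns)} canonical "
      f"({100 * canonical / len(introns):.1f}%)")  # 2/3 canonical (66.7%)
```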

Figure 3—figure supplement 2
Splice site prediction metrics for the mouse (M. musculus) across varying flanking sequence lengths.

Plots compare performance of SpliceAI-Keras (blue) and OSAIMouse (orange) on donor (5′) and acceptor (3′) splice site predictions using 80, 400, 2000, and 10,000 nt of flanking context. OSAIMouse was trained with RefSeq GRCm39 annotation and GRCm39 genome. (A) Donor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true donor site label. (B) Donor AUPRC: area under the precision-recall curve for donor site predictions. (C) Donor accuracy: proportion of correct donor site calls among all predictions. (D) Donor precision: the fraction of predicted donor sites that are correct. (E) Donor recall: the fraction of true donor sites that are correctly predicted. (F) Donor F1: the harmonic mean of precision and recall for donor sites. (G) Acceptor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true acceptor site label. (H) Acceptor AUPRC: area under the precision-recall curve for acceptor site predictions. (I) Acceptor accuracy: proportion of correct acceptor site calls among all predictions. (J) Acceptor precision: the fraction of predicted acceptor sites that are correct. (K) Acceptor recall: the fraction of true acceptor sites that are correctly predicted. (L) Acceptor F1: the harmonic mean of precision and recall for acceptor sites. Each panel illustrates that increasing the flanking sequence length generally enhances model performance, with both SpliceAI-Keras and OpenSpliceAI achieving high accuracy and F1 scores at 10,000 nt. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.

Figure 3—figure supplement 3
Splice site prediction metrics for the honeybee (A. mellifera) across varying flanking sequence lengths.

Plots compare performance of SpliceAI-Keras (blue) and OSAIHoneybee (orange) on donor (5′) and acceptor (3′) splice site predictions using 80, 400, 2000, and 10,000 nt of flanking context. OSAIHoneybee was trained with RefSeq Amel_HAv3.1 annotation and Amel_HAv3.1 genome. (A) Donor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true donor site label. (B) Donor AUPRC: area under the precision-recall curve for donor site predictions. (C) Donor accuracy: proportion of correct donor site calls among all predictions. (D) Donor precision: the fraction of predicted donor sites that are correct. (E) Donor recall: the fraction of true donor sites that are correctly predicted. (F) Donor F1: the harmonic mean of precision and recall for donor sites. (G) Acceptor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true acceptor site label. (H) Acceptor AUPRC: area under the precision-recall curve for acceptor site predictions. (I) Acceptor accuracy: proportion of correct acceptor site calls among all predictions. (J) Acceptor precision: the fraction of predicted acceptor sites that are correct. (K) Acceptor recall: the fraction of true acceptor sites that are correctly predicted. (L) Acceptor F1: the harmonic mean of precision and recall for acceptor sites. Each panel illustrates that increasing the flanking sequence length generally enhances model performance, with both SpliceAI-Keras and OpenSpliceAI achieving high accuracy and F1 scores at 10,000 nt. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.

Figure 3—figure supplement 4
Splice site prediction metrics for the zebrafish (D. rerio) across varying flanking sequence lengths.

Plots compare performance of SpliceAI-Keras (blue) and OSAIZebrafish (orange) on donor (5′) and acceptor (3′) splice site predictions using 80, 400, 2000, and 10,000 nt of flanking context. OSAIZebrafish was trained with RefSeq GRCz11 annotation and GRCz11 genome. (A) Donor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true donor site label. (B) Donor AUPRC: area under the precision-recall curve for donor site predictions. (C) Donor accuracy: proportion of correct donor site calls among all predictions. (D) Donor precision: the fraction of predicted donor sites that are correct. (E) Donor recall: the fraction of true donor sites that are correctly predicted. (F) Donor F1: the harmonic mean of precision and recall for donor sites. (G) Acceptor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true acceptor site label. (H) Acceptor AUPRC: area under the precision-recall curve for acceptor site predictions. (I) Acceptor accuracy: proportion of correct acceptor site calls among all predictions. (J) Acceptor precision: the fraction of predicted acceptor sites that are correct. (K) Acceptor recall: the fraction of true acceptor sites that are correctly predicted. (L) Acceptor F1: the harmonic mean of precision and recall for acceptor sites. Each panel illustrates that increasing the flanking sequence length generally enhances model performance, with both SpliceAI-Keras and OpenSpliceAI achieving high accuracy and F1 scores at 10,000 nt. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.

Figure 3—figure supplement 5
Splice site prediction metrics for A. thaliana across varying flanking sequence lengths.

Plots compare performance of SpliceAI-Keras (blue) and OSAIArabidopsis (orange) on donor (5′) and acceptor (3′) splice site predictions using 80, 400, 2000, and 10,000 nt of flanking context. OSAIArabidopsis was trained with RefSeq TAIR10.1 annotation and TAIR10.1 genome. (A) Donor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true donor site label. (B) Donor AUPRC: area under the precision-recall curve for donor site predictions. (C) Donor accuracy: proportion of correct donor site calls among all predictions. (D) Donor precision: the fraction of predicted donor sites that are correct. (E) Donor recall: the fraction of true donor sites that are correctly predicted. (F) Donor F1: the harmonic mean of precision and recall for donor sites. (G) Acceptor top-1: measures the percentage of times the model’s most confident prediction exactly matches the true acceptor site label. (H) Acceptor AUPRC: area under the precision-recall curve for acceptor site predictions. (I) Acceptor accuracy: proportion of correct acceptor site calls among all predictions. (J) Acceptor precision: the fraction of predicted acceptor sites that are correct. (K) Acceptor recall: the fraction of true acceptor sites that are correctly predicted. (L) Acceptor F1: the harmonic mean of precision and recall for acceptor sites. Each panel illustrates that increasing the flanking sequence length generally enhances model performance, with both SpliceAI-Keras and OpenSpliceAI achieving high accuracy and F1 scores at 10,000 nt. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.

Figure 4 with 5 supplements
Performance comparison of scratch-trained and transfer-trained OSAIs across species and sequence lengths.

(A–D) Top-1 accuracy for donor and acceptor splice sites of 80 nt, 400 nt, 2000 nt, and 10,000 nt models, comparing OSAIMouse (scratch-trained) and OSAIMouse-transferred (transfer-trained) models over epochs 1–10 on the test dataset. (E–H) Top-1 accuracy after one epoch of training vs. after 10 epochs for both scratch-trained and transfer-trained models across the same sequence lengths. Each plot represents one species and its corresponding transfer-trained model: (E) OSAIMouse vs. OSAIMouse-transferred, (F) OSAIZebrafish vs. OSAIZebrafish-transferred, (G) OSAIArabidopsis vs. OSAIArabidopsis-transferred, and (H) OSAIHoneybee vs. OSAIHoneybee-transferred. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.
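A minimal sketch of the transfer setup: initialize one model from a pre-trained checkpoint and one from random weights, then train both identically on the target species. The architecture stub and file name below are placeholders standing in for the OpenSpliceAI model and the released OSAIMANE checkpoint.

```python
# Scratch vs. transfer initialization (toy stand-in for the real architecture).
import torch
import torch.nn as nn

class SpliceModel(nn.Module):
    """Placeholder for the OpenSpliceAI architecture (Figure 1 supplement 1)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(4, 32, kernel_size=1)
        self.head = nn.Conv1d(32, 3, kernel_size=1)  # neither/acceptor/donor

    def forward(self, x):
        return self.head(self.conv(x))

# Pretend "pre-trained" weights: in practice this would be the released
# OSAIMANE checkpoint rather than a freshly saved toy model.
pretrained = SpliceModel()
torch.save(pretrained.state_dict(), "osai_mane_demo.pt")

model_scratch = SpliceModel()                     # random initialization
model_transfer = SpliceModel()                    # start from human weights
model_transfer.load_state_dict(torch.load("osai_mane_demo.pt"))

optimizer = torch.optim.AdamW(model_transfer.parameters(), lr=1e-4)
# ...then train both models identically on the target-species dataset and
# compare top-1 accuracy per epoch, as in panels A-D.
```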

Figure 4—figure supplement 1
Transfer learning from OSAIMANE was leveraged to evaluate performance metrics for splice site prediction in mouse (M. musculus) across four flanking sequence lengths (80, 400, 2000, and 10,000 nt) and two splice site types (acceptor and donor).

Each row corresponds to a unique flanking length-splice site combination, and each column depicts a distinct evaluation metric (top-1 accuracy, F1, and AUPRC). Blue and red curves represent the mean performance (± standard deviation) of five models trained from scratch and five models fine-tuned from OSAIMANE, respectively. The x-axis indicates the training epoch, and the y-axis denotes the corresponding metric value. Subplot titles specify the flanking length, splice site type, and metric. Overall, fine-tuned models converge more rapidly, reaching stable performance as early as the first epoch, whereas models trained from scratch require additional epochs to stabilize.

Figure 4—figure supplement 2
Transfer learning from OSAIMANE was leveraged to evaluate performance metrics for splice site prediction in honeybee (A. mellifera) across four flanking sequence lengths (80, 400, 2000, and 10,000 nt) and two splice site types (acceptor and donor).

Each row corresponds to a unique flanking length-splice site combination, and each column depicts a distinct evaluation metric (top-1 accuracy, F1, and AUPRC). Blue and red curves represent the mean performance (± standard deviation) of five models trained from scratch and five models fine-tuned from OSAIMANE, respectively. The x-axis indicates the training epoch, and the y-axis denotes the corresponding metric value. Subplot titles specify the flanking length, splice site type, and metric. Overall, fine-tuned models converge more rapidly, reaching stable performance as early as the first epoch, whereas models trained from scratch require additional epochs to stabilize.

Figure 4—figure supplement 3
Transfer learning from OSAIMANE was leveraged to evaluate performance metrics for splice site prediction in zebrafish (D. rerio) across four flanking sequence lengths (80, 400, 2000, and 10,000 nt) and two splice site types (acceptor and donor).

Each row corresponds to a unique flanking length-splice site combination, and each column depicts a distinct evaluation metric (top-1 accuracy, F1, and AUPRC). Blue and red curves represent the mean performance (± standard deviation) of five models trained from scratch and five models fine-tuned from OSAIMANE, respectively. The x-axis indicates the training epoch, and the y-axis denotes the corresponding metric value. Subplot titles specify the flanking length, splice site type, and metric. Overall, fine-tuned models converge more rapidly, reaching stable performance as early as the first epoch, whereas models trained from scratch require additional epochs to stabilize.

Figure 4—figure supplement 4
Transfer learning from OSAIMANE was leveraged to evaluate performance metrics for splice site prediction in A. thaliana across four flanking sequence lengths (80, 400, 2000, and 10,000 nt) and two splice site types (acceptor and donor).

Each row corresponds to a unique flanking length-splice site combination, and each column depicts a distinct evaluation metric (top-1 accuracy, F1, and AUPRC). Blue and red curves represent the mean performance (± standard deviation) of five models trained from scratch and five models fine-tuned from OSAIMANE, respectively. The x-axis indicates the training epoch, and the y-axis denotes the corresponding metric value. Subplot titles specify the flanking length, splice site type, and metric. Overall, fine-tuned models converge more rapidly, reaching stable performance as early as the first epoch, whereas models trained from scratch require additional epochs to stabilize.

Figure 4—figure supplement 5
Cross-species transfer learning performance on human splice-site prediction.

Models pre-trained on (A) A. thaliana (OSAIArabidopsis), (B) A. mellifera (OSAIHoneybee), (C) M. musculus (OSAIMouse), and (D) D. rerio (OSAIZebrafish) were fine-tuned on the human MANE dataset (orange) and compared to a model trained from scratch on MANE (OSAIMANE, blue). Within each panel, the top row shows donor top-k (left) and donor F1 (right) scores, and the bottom row shows acceptor top-k (left) and acceptor F1 (right) scores. The x-axis indicates models trained with flanking sequence sizes of 80, 400, 2000, and 10,000 bp. Each data point represents the mean across five independently trained models, with error bars indicating the standard error.

Figure 5 with 7 supplements
Calibration of OSAIMANE predictions across splice-site classes and flanking sequence lengths.

(A) Calibration results for OSAIMANE across non-splice sites, acceptor sites, and donor sites. Models trained with different flanking sequence lengths are represented by color: 80 nt (blue), 400 nt (green), 2000 nt (orange), and 10,000 nt (red). Dotted curves in lighter colors denote pre-calibration results, while solid curves in darker shades show post-calibration results. (B) Expected calibration error (ECE) on the validation set (top) and test set (bottom), comparing OSAIMANE’s performance before (blue bars) and after (orange bars) calibration. For each flanking-sequence variant of OSAIMANE, five calibration experiments were performed; bars show the mean ±1 standard error. (C) Two-dimensional calibration map for OSAIMANE, illustrating how raw predicted probabilities for acceptor (x-axis) and donor (y-axis) sites are transformed after calibration. Arrows indicate the shift from pre- to post-calibration states in two-dimensional probability space, resulting in a smoother probability distribution.
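Expected calibration error bins predictions by confidence and averages the gap between each bin's mean confidence and its empirical accuracy, weighted by bin size. A sketch for a binary view of one class:

```python
# Sketch of expected calibration error (ECE) on toy, miscalibrated data.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """probs: (N,) predicted probabilities; labels: (N,) 0/1 outcomes."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap   # weight by fraction of samples in bin
    return ece

rng = np.random.default_rng(1)
p = rng.uniform(0, 1, 10000)
y = (rng.uniform(0, 1, 10000) < p ** 2).astype(float)  # miscalibrated outcomes
print(f"ECE: {expected_calibration_error(p, y):.4f}")
```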

Figure 5—figure supplement 1
Calibration results for human MANE splice site classification at four flanking sequence sizes.

(A, C, E, G) Reliability (calibration) curves for flanking sequence sizes of 80, 400, 2000, and 10,000 nucleotides, respectively. Each plot compares predicted probabilities (x-axis) to empirical probabilities (y-axis) for non-splice sites (left), acceptor sites (middle), and donor sites (right). The blue curves depict the reliability of the original OSAIMANE models, while the green curves show reliability after calibration. Shaded regions represent confidence intervals. The diagonal black line indicates perfect calibration, where predicted probabilities match observed frequencies exactly. (B, D, F, H) Temperature scaling maps for each corresponding flanking sequence size, illustrating how raw predicted probabilities for acceptor (x-axis) and donor (y-axis) sites are transformed after calibration. Arrows indicate the shift from pre- to post-calibration states in two-dimensional probability space.
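The maps in panels B, D, F, and H arise from temperature scaling, which rescales the model's logits by a single learned temperature T. A sketch of fitting T by minimizing negative log likelihood on a held-out calibration split, assuming three-class logits as in the model's output:

```python
# Sketch of temperature scaling fitted with LBFGS on toy data.
import torch
import torch.nn.functional as F

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """logits: (N, 3); labels: (N,) in {0: neither, 1: acceptor, 2: donor}."""
    log_t = torch.zeros(1, requires_grad=True)   # optimize log T so T stays > 0
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return float(log_t.exp())

# Toy logits that are overconfident relative to random labels, so T > 1.
logits = torch.randn(5000, 3) * 4.0
labels = torch.randint(0, 3, (5000,))
T = fit_temperature(logits, labels)
calibrated = F.softmax(logits / T, dim=-1)       # post-calibration probabilities
print(f"fitted temperature: {T:.2f}")
```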

Figure 5—figure supplement 2
Expected calibration error (ECE) on the validation (top) and test (bottom) sets.

Blue bars indicate model performance before calibration, and orange bars indicate performance after calibration. Results are shown for (A) human MANE, (B) mouse, (C) honeybee, (D) Arabidopsis, and (E) zebrafish.

Figure 5—figure supplement 3
Negative log likelihood (NLL) loss on the validation (top) and test (bottom) sets.

Blue bars indicate model performance before calibration, and orange bars indicate performance after calibration. Results are shown for (A) human MANE, (B) mouse, (C) honeybee, (D) Arabidopsis, and (E) zebrafish.

Figure 5—figure supplement 4
Calibration results for house mouse (M. musculus) splice site classification at four flanking sequence sizes.

(A, C, E, G) Reliability (calibration) curves for flanking sequence sizes of 80, 400, 2000, and 10,000 nt, respectively. Each plot compares predicted probabilities (x-axis) to empirical probabilities (y-axis) for non-splice sites (left), acceptor sites (middle), and donor sites (right). The blue curves depict the reliability of the original OSAIMouse models, while the green curves show reliability after calibration. Shaded regions represent confidence intervals. The diagonal black line indicates perfect calibration, where predicted probabilities match observed frequencies exactly. (B, D, F, H) Temperature scaling maps for each corresponding flanking sequence size, illustrating how raw predicted probabilities for acceptor (x-axis) and donor (y-axis) sites are transformed after calibration. Arrows indicate the shift from pre- to post-calibration states in two-dimensional probability space.

Figure 5—figure supplement 5
Calibration results for zebrafish (D. rerio) splice site classification at four flanking sequence sizes.

(A, C, E, G) Reliability (calibration) curves for flanking sequence sizes of 80, 400, 2000, and 10,000 nt, respectively. Each plot compares predicted probabilities (x-axis) to empirical probabilities (y-axis) for non-splice sites (left), acceptor sites (middle), and donor sites (right). The blue curves depict the reliability of the original OSAIZebrafish models, while the green curves show reliability after calibration. Shaded regions represent confidence intervals. The diagonal black line indicates perfect calibration, where predicted probabilities match observed frequencies exactly. (B, D, F, H) Temperature scaling maps for each corresponding flanking sequence size, illustrating how raw predicted probabilities for acceptor (x-axis) and donor (y-axis) sites are transformed after calibration. Arrows indicate the shift from pre- to post-calibration states in two-dimensional probability space.

Figure 5—figure supplement 6
Calibration results for honeybee (A. mellifera) splice site classification at four flanking sequence sizes.

(A, C, E, G) Reliability (calibration) curves for flanking sequence sizes of 80, 400, 2000, and 10,000 nt, respectively. Each plot compares predicted probabilities (x-axis) to empirical probabilities (y-axis) for non-splice sites (left), acceptor sites (middle), and donor sites (right). The blue curves depict the reliability of the original OSAIHoneybee models, while the green curves show reliability after calibration. Shaded regions represent confidence intervals. The diagonal black line indicates perfect calibration, where predicted probabilities match observed frequencies exactly. (B, D, F, H) Temperature scaling maps for each corresponding flanking sequence size, illustrating how raw predicted probabilities for acceptor (x-axis) and donor (y-axis) sites are transformed after calibration. Arrows indicate the shift from pre- to post-calibration states in two-dimensional probability space.

Figure 5—figure supplement 7
Calibration results for A. thaliana splice site classification at four flanking sequence sizes.

(A, C, E, G) Reliability (calibration) curves for flanking sequence sizes of 80, 400, 2000, and 10,000 nt, respectively. Each plot compares predicted probabilities (x-axis) to empirical probabilities (y-axis) for non-splice sites (left), acceptor sites (middle), and donor sites (right). The blue curves depict the reliability of the original OSAIArabidopsis models, while the green curves show reliability after calibration. Shaded regions represent confidence intervals. The diagonal black line indicates perfect calibration, where predicted probabilities match observed frequencies exactly. (B, D, F, H) Temperature scaling maps for each corresponding flanking sequence size, illustrating how raw predicted probabilities for acceptor (x-axis) and donor (y-axis) sites are transformed after calibration. Arrows indicate the shift from pre- to post-calibration states in two-dimensional probability space.

Figure 6 with 1 supplement
Comparison of SpliceAI and OSAIMANE in predicting mutation impacts and cryptic splicing events.

(A) Plot of importance scores for nucleotides near the acceptor site of exon 9 of U2SURP (top) and DST (bottom), for both SpliceAI and OSAIMANE. The importance score is calculated by taking the average decrease in acceptor site score across the three possible point mutations at a given base position. (B) Plot of the impact of each possible point mutation within 80 bp of a donor (top) or acceptor (bottom) site, for both SpliceAI and OSAIMANE. The impact is the raw decrease in predicted splice site score after mutating a given base to a different one. (C) Visualization of cryptic splicing variants predicted for the MYBPC3 gene (top), with an acceptor site gain and loss event, from SpliceAI’s original analysis, and the OPA1 gene (bottom), where a cryptic exon inclusion event was recently reported (Qian et al., 2021). (D) Predicted splice sites for the entire CFTR gene, with the corresponding predicted probability distribution by base position plotted below, for both SpliceAI and OSAIMANE.
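The importance score in panel A can be written as a short in silico mutagenesis (ISM) loop: mutate each base to its three alternatives, re-score the splice site, and average the drop. The scorer below is a toy placeholder for a model call.

```python
# Sketch of the ISM importance score described in panel A.
import numpy as np

def ism_importance(seq: str, site_index: int, score_acceptor) -> np.ndarray:
    base_score = score_acceptor(seq, site_index)
    importance = np.zeros(len(seq))
    for i, ref in enumerate(seq):
        drops = [base_score - score_acceptor(seq[:i] + alt + seq[i + 1:], site_index)
                 for alt in "ACGT" if alt != ref]
        importance[i] = np.mean(drops)  # average over the 3 point mutations
    return importance

# Toy scorer: pretends the acceptor depends only on the AG dinucleotide.
def toy_scorer(seq, idx):
    return 0.9 if seq[idx - 1:idx + 1] == "AG" else 0.1

seq = "TTTTTTAGGT"
print(ism_importance(seq, site_index=7, score_acceptor=toy_scorer))
```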

Figure 6—figure supplement 1
Zoomed-in 160 bp DNA sequence logos derived from in silico mutagenesis (ISM) importance score profiles for representative donor (A–E) and acceptor (F–J) splice sites.

Logos were generated by mapping the ISM importance score at each position centered on the splice site to letter heights, with SpliceAI scores shown in the upper logo and OSAIMANE scores shown in the lower logo of each panel.

Tables

Table 1
Genome assembly and annotation details for species used for OpenSpliceAI training and transfer learning in this study.

Note: For each species, the table includes the GenBank accession number, assembly name, FTP links for assembly and annotation downloads, and annotation release date.

Species | Assembly name | GenBank accession | Download link | Annotation release date
H. sapiens | GRCh38.p14 | GCA_000001405.29 | https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/GCF_000001405.40-RS_2023_03/ | March 21, 2023
M. musculus | GRCm39 | GCA_000001635.9 | https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_000001635.27_GRCm39/ | February 8, 2024
A. mellifera | Amel_HAv3.1 | GCA_003254395.2 | https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/254/395/GCF_003254395.2_Amel_HAv3.1/ | September 30, 2022
A. thaliana | TAIR10.1 | GCA_000001735.2 | https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1/ | June 16, 2023
D. rerio | GRCz11 | GCA_000002035.4 | https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/035/GCF_000002035.6_GRCz11/ | August 15, 2024
Table 2
Summary of the four OpenSpliceAI model architectures, each trained with a distinct flanking sequence length (80, 400, 2000, and 10,000 nucleotides).

The table lists the kernel sizes (W), dilation rates (AR), the number of residual blocks and skip connections, and the total cropping length (CL).

Parameter | Flanking = 80 | Flanking = 400 | Flanking = 2000 | Flanking = 10,000
Kernel sizes (W) | [11, 11, 11, 11] | [11, 11, 11, 11, 11, 11, 11, 11] | [11, 11, 11, 11, 11, 11, 11, 11, 21, 21, 21, 21] | [11, 11, 11, 11, 11, 11, 11, 11, 21, 21, 21, 21, 41, 41, 41, 41]
Dilation rates (AR) | [1, 1, 1, 1] | [1, 1, 1, 1, 4, 4, 4, 4] | [1, 1, 1, 1, 4, 4, 4, 4, 10, 10, 10, 10] | [1, 1, 1, 1, 4, 4, 4, 4, 10, 10, 10, 10, 25, 25, 25, 25]
Residual blocks | 4 | 8 | 12 | 16
Skip connections | 1 (inserted after residual block 4) | 2 (inserted after residual blocks 4 and 8) | 3 (inserted after residual blocks 4, 8, and 12) | 4 (inserted after residual blocks 4, 8, 12, and 16)
Cropping length, CL = 2 × Σ [AR × (W - 1)] | 80 | 400 | 2000 | 10,000
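The cropping-length formula can be checked directly against the four configurations:

```python
# Verify CL = 2 * sum(AR_i * (W_i - 1)) for each architecture in Table 2.
configs = {
    80:    ([11] * 4,                        [1] * 4),
    400:   ([11] * 8,                        [1] * 4 + [4] * 4),
    2000:  ([11] * 8 + [21] * 4,             [1] * 4 + [4] * 4 + [10] * 4),
    10000: ([11] * 8 + [21] * 4 + [41] * 4,  [1] * 4 + [4] * 4 + [10] * 4 + [25] * 4),
}
for flank, (W, AR) in configs.items():
    cl = 2 * sum(a * (w - 1) for w, a in zip(W, AR))
    print(flank, cl)  # CL equals the flanking length in every case
```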


Citation: Chao K-H, Mao A, Liu A, Salzberg SL, Pertea M (2025) OpenSpliceAI provides an efficient, modular implementation of SpliceAI, enabling easy retraining across non-human species. eLife 14:RP107454. https://doi.org/10.7554/eLife.107454.3