Evaluation of in silico predictors on short nucleotide variants in HBA1, HBA2, and HBB associated with haemoglobinopathies

  1. Stella Tamana
  2. Maria Xenophontos
  3. Anna Minaidou
  4. Coralea Stephanou
  5. Cornelis L Harteveld
  6. Celeste Bento
  7. Joanne Traeger-Synodinos
  8. Irene Fylaktou
  9. Norafiza Mohd Yasin
  10. Faidatul Syazlin Abdul Hamid
  11. Ezalia Esa
  12. Hashim Halim-Fikri
  13. Bin Alwi Zilfalil
  14. Andrea C Kakouri
  15. ClinGen Hemoglobinopathy Variant Curation Expert Panel
  16. Marina Kleanthous
  17. Petros Kountouris (corresponding author)
  1. Molecular Genetics Thalassaemia Department, The Cyprus Institute of Neurology and Genetics, Cyprus
  2. Leiden University Medical Center, Netherlands
  3. Centro Hospitalar e Universitário de Coimbra, Portugal
  4. Laboratory of Medical Genetics, National and Kapodistrian University of Athens, Greece
  5. Division of Endocrinology, Metabolism and Diabetes, First Department of Pediatrics, National and Kapodistrian University of Athens, Greece
  6. Haematology Unit, Cancer Research Centre, Institute for Medical Research, National Institutes of Health (NIH), Ministry of Health Malaysia, Malaysia
  7. Malaysian Node of the Human Variome Project, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Malaysia
  8. Human Genome Centre, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Malaysia
5 figures, 2 tables and 4 additional files

Figures

Figure 1 with 1 supplement
A schematic flowchart of the methodology followed for this comparative analysis.
Figure 1—figure supplement 1
Comparison of initial and final pathogenicity classification of variants in the dataset.
Figure 2
Distribution of variants on each globin gene based on their actual pathogenicity.

Bin sizes of 3 bp (in-frame) in exonic regions and 5 bp in intronic regions are used for the illustration.

Figure 3
Descriptive plots of the short nucleotide variant (SNV) dataset.

(A) Variant effect on gene/protein function with respect to the annotated pathogenicity status. (B) Haemoglobinopathy group, (C) thalassaemia phenotype, (D) O2 affinity, (E) Hb stability, and (F) molecular mechanisms.

Figure 4
Heatmap illustrating the concordance and clustering of in silico tools with respect to the variant type and globin gene using the threshold that optimises the Matthews correlation coefficient (MCC), as shown in Table 1.
Figure 5 with 1 supplement
Comparison of the top performing in silico tools.

(A) Likelihood ratios of the top performing in silico tools across variable thresholds. Vertical dashed lines indicate the optimal threshold based on the highest Matthews correlation coefficient (MCC). (B) Concordant pathogenic/likely pathogenic (P/LP) calls by any given combination of in silico tools (among top performing tools) for pathogenic variants. (C) Concordant benign/likely benign (B/LB) calls by any given combination of in silico tools (among top performing tools) for benign variants. For panels B and C, the concordance rate (i.e., the variant assertion of every tool in the combination matches the expert annotation) is provided as a text annotation on the bar chart. Only the top 10 tool combinations by concordance rate are shown, with the top three highlighted in blue.
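The concordance rate described for panels B and C can be illustrated with a minimal sketch. It assumes that each tool's calls are stored as a list aligned to the same set of expert-annotated variants; the tool names (`tool_a`, `tool_b`, `tool_c`), the toy calls, and the function name `concordance_rates` are hypothetical and are not taken from the study.

```python
from itertools import combinations


def concordance_rates(calls, expected):
    """Return {tool combination: fraction of variants where ALL tools call `expected`}.

    `calls` maps a tool name to its list of calls, one per variant, in the same
    variant order for every tool. `expected` is the expert annotation shared by
    all variants in the list (e.g. "P" for a pathogenic-only subset).
    """
    tools = sorted(calls)
    n_variants = len(calls[tools[0]])
    rates = {}
    for size in range(1, len(tools) + 1):
        for combo in combinations(tools, size):
            # A variant counts as concordant only if every tool in the combination agrees.
            hits = sum(
                all(calls[tool][i] == expected for tool in combo)
                for i in range(n_variants)
            )
            rates[combo] = hits / n_variants
    return rates


# Hypothetical calls for four pathogenic variants ("P" = P/LP call, "B" = B/LB call).
toy_calls = {
    "tool_a": ["P", "P", "B", "P"],
    "tool_b": ["P", "B", "B", "P"],
    "tool_c": ["P", "P", "P", "B"],
}
rates = concordance_rates(toy_calls, expected="P")

# Rank combinations by concordance rate and keep the first ten, as in the figure.
top10 = sorted(rates.items(), key=lambda item: item[1], reverse=True)[:10]
for combo, rate in top10:
    print(" + ".join(combo), f"{rate:.2f}")
```

Because every tool in a combination must agree, adding a tool to a combination can only keep the rate the same or lower it, which is why larger combinations tend to rank lower.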

Figure 5—figure supplement 1
Concordance and VUS prediction of the top performing in silico tools.

(A) Heatmap illustrating the concordance and clustering of the top performing in silico tools with respect to the variant type and globin gene, using the separate non-overlapping thresholds for pathogenic and benign prediction, as shown in Table 2. (B) Prediction of variants of uncertain significance (VUS) using thresholds for the full dataset (at supporting strength).

Tables

Table 1
Results and performance comparison of in silico predictors with the optimal threshold based on MCC.

#PV: number of predicted variants; Ac: accuracy; Se: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient; LR+: positive likelihood ratio; LR-: negative likelihood ratio; 95% CI: 95% confidence interval.

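| Tool | Decision threshold | #PV | TP | FN | FP | TN | Ac | Se | Sp | MCC | LR+ | LR+ 95% CI | LR- | LR- 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BayesDel_addAF | ≥0.39 | 531 | 250 | 164 | 22 | 95 | 0.65 | 0.6 | 0.81 | 0.34 | 3.21 | [2.19, 4.72] | 0.49 | [0.42, 0.57] |
| CADD | >10.44 | 886 | 655 | 39 | 101 | 91 | 0.84 | 0.94 | 0.47 | 0.49 | 1.79 | [1.57, 2.05] | 0.12 | [0.08, 0.17] |
| ClinPred | >0.95 | 481 | 265 | 99 | 43 | 74 | 0.7 | 0.73 | 0.63 | 0.32 | 1.98 | [1.55, 2.53] | 0.43 | [0.35, 0.53] |
| Condel | >0.3 | 481 | 331 | 33 | 76 | 41 | 0.77 | 0.91 | 0.35 | 0.31 | 1.4 | [1.22, 1.61] | 0.26 | [0.17, 0.39] |
| DANN | >0.96 | 531 | 372 | 42 | 71 | 46 | 0.79 | 0.9 | 0.39 | 0.33 | 1.48 | [1.28, 1.72] | 0.26 | [0.18, 0.37] |
| Eigen-PC | >1.87 | 531 | 329 | 85 | 35 | 82 | 0.77 | 0.79 | 0.7 | 0.44 | 2.66 | [2, 3.52] | 0.29 | [0.23, 0.37] |
| FATHMM | ≤–3.39 | 481 | 150 | 214 | 23 | 94 | 0.51 | 0.41 | 0.8 | 0.19 | 2.1 | [1.42, 3.08] | 0.73 | [0.65, 0.83] |
| fathmm-MKL | >0.7 | 531 | 328 | 86 | 39 | 78 | 0.76 | 0.79 | 0.67 | 0.41 | 2.38 | [1.83, 3.09] | 0.31 | [0.25, 0.39] |
| GERP++ | >3.49 | 531 | 248 | 166 | 26 | 91 | 0.64 | 0.6 | 0.78 | 0.31 | 2.7 | [1.9, 3.82] | 0.52 | [0.44, 0.6] |
| integrated_fitCons | >0.05 | 531 | 414 | 1 | 117 | 1 | 0.78 | 1 | 0.01 | 0.04 | 1.01 | [0.99, 1.02] | 0.28 | [0.02, 4.51] |
| LIST-S2 | ≥0.75 | 344 | 246 | 28 | 39 | 31 | 0.81 | 0.9 | 0.44 | 0.36 | 1.61 | [1.3, 1.99] | 0.23 | [0.15, 0.36] |
| LRT | <0.3 | 270 | 169 | 7 | 84 | 10 | 0.66 | 0.96 | 0.11 | 0.13 | 1.07 | [1, 1.16] | 0.37 | [0.15, 0.95] |
| MetaLR_score | >0.8 | 481 | 251 | 113 | 42 | 75 | 0.68 | 0.69 | 0.64 | 0.29 | 1.92 | [1.49, 2.47] | 0.48 | [0.39, 0.59] |
| MetaSVM_score | >0.6 | 481 | 260 | 104 | 39 | 78 | 0.7 | 0.71 | 0.67 | 0.34 | 2.14 | [1.65, 2.79] | 0.43 | [0.35, 0.53] |
| MutationAssessor | >2.53 | 359 | 249 | 36 | 41 | 33 | 0.79 | 0.87 | 0.45 | 0.33 | 1.58 | [1.28, 1.94] | 0.28 | [0.19, 0.42] |
| MutationTaster | >0.95 | 531 | 386 | 28 | 102 | 15 | 0.76 | 0.93 | 0.13 | 0.09 | 1.07 | [0.99, 1.15] | 0.53 | [0.29, 0.95] |
| MutPred | >0.5 | 467 | 343 | 12 | 96 | 16 | 0.77 | 0.97 | 0.14 | 0.2 | 1.13 | [1.04, 1.22] | 0.24 | [0.12, 0.49] |
| phastCons17way | >0.17 | 531 | 357 | 57 | 57 | 60 | 0.79 | 0.86 | 0.51 | 0.38 | 1.77 | [1.46, 2.14] | 0.27 | [0.2, 0.36] |
| phastCons30way | >0.28 | 531 | 329 | 85 | 51 | 66 | 0.74 | 0.79 | 0.56 | 0.33 | 1.82 | [1.48, 2.25] | 0.36 | [0.28, 0.47] |
| phyloP100way | >0.42 | 531 | 349 | 65 | 56 | 61 | 0.77 | 0.84 | 0.52 | 0.35 | 1.76 | [1.45, 2.14] | 0.3 | [0.23, 0.4] |
| phyloP30way | >0.51 | 531 | 307 | 107 | 63 | 54 | 0.68 | 0.74 | 0.46 | 0.18 | 1.38 | [1.15, 1.64] | 0.56 | [0.43, 0.72] |
| PolyPhen-2 | >0.65 | 481 | 243 | 121 | 37 | 80 | 0.67 | 0.67 | 0.68 | 0.31 | 2.11 | [1.6, 2.78] | 0.49 | [0.4, 0.59] |
| PROVEAN | ≤–1.03 | 481 | 358 | 6 | 106 | 11 | 0.77 | 0.98 | 0.09 | 0.18 | 1.09 | [1.02, 1.15] | 0.18 | [0.07, 0.46] |
| REVEL | >0.65 | 481 | 294 | 70 | 46 | 71 | 0.76 | 0.81 | 0.61 | 0.39 | 2.05 | [1.63, 2.59] | 0.32 | [0.25, 0.41] |
| SIFT | <0.1 | 481 | 325 | 39 | 74 | 43 | 0.77 | 0.89 | 0.37 | 0.3 | 1.41 | [1.22, 1.63] | 0.29 | [0.2, 0.43] |
| SiPhy_29way | >10.62 | 531 | 233 | 181 | 33 | 84 | 0.6 | 0.56 | 0.72 | 0.23 | 2 | [1.48, 2.7] | 0.61 | [0.52, 0.71] |
| VEST4 | >0.7 | 531 | 273 | 141 | 33 | 84 | 0.67 | 0.66 | 0.72 | 0.32 | 2.34 | [1.74, 3.15] | 0.47 | [0.4, 0.57] |

Splicing prediction

| Tool | Decision threshold | #PV | TP | FN | FP | TN | Ac | Se | Sp | MCC | LR+ | LR+ 95% CI | LR- | LR- 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ada | >0.5 | 56 | 47 | 3 | 1 | 5 | 0.93 | 0.94 | 0.83 | 0.68 | 5.64 | [0.94, 33.8] | 0.07 | [0.02, 0.23] |
| MaxEntScan | Diff >2 and Per >5 | 54 | 50 | 2 | 1 | 2 | 0.95 | 0.96 | 0.67 | 0.55 | 2.88 | [0.58, 14.31] | 0.06 | [0.01, 0.28] |
| rf | >0.6 | 56 | 47 | 3 | 1 | 5 | 0.93 | 0.94 | 0.83 | 0.68 | 5.64 | [0.94, 33.8] | 0.07 | [0.02, 0.23] |
| SpliceAI | >0.65 | 663 | 35 | 23 | 1 | 604 | 0.96 | 0.6 | 1 | 0.75 | 365.09 | [50.94, 2616.41] | 0.4 | [0.29, 0.55] |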
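
The measures reported in Table 1 follow from the four confusion-matrix counts (TP, FN, FP, TN). The sketch below uses the standard textbook definitions and a log-scale normal approximation for the likelihood-ratio confidence intervals. That interval method is an assumption rather than a statement of the authors' procedure, although it reproduces the BayesDel_addAF row when rounded to two decimals. The names `Confusion` and `metrics` are hypothetical, and the function assumes that no count is zero.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # pathogenic variants predicted pathogenic
    fn: int  # pathogenic variants predicted benign
    fp: int  # benign variants predicted pathogenic
    tn: int  # benign variants predicted benign


def _log_ci(lr, a, n_a, b, n_b, z):
    """CI for a ratio of two proportions (a/n_a) / (b/n_b), computed on the log scale."""
    se = math.sqrt(1 / a - 1 / n_a + 1 / b - 1 / n_b)
    return (lr * math.exp(-z * se), lr * math.exp(z * se))


def metrics(c, z=1.96):
    pos = c.tp + c.fn  # all pathogenic variants
    neg = c.fp + c.tn  # all benign variants
    se = c.tp / pos
    sp = c.tn / neg
    mcc_den = math.sqrt((c.tp + c.fp) * pos * neg * (c.tn + c.fn))
    lr_pos = se / (1 - sp)
    lr_neg = (1 - se) / sp
    return {
        "Ac": (c.tp + c.tn) / (pos + neg),
        "Se": se,
        "Sp": sp,
        "MCC": (c.tp * c.tn - c.fp * c.fn) / mcc_den,
        "LR+": lr_pos,
        "LR+ CI": _log_ci(lr_pos, c.tp, pos, c.fp, neg, z),
        "LR-": lr_neg,
        "LR- CI": _log_ci(lr_neg, c.fn, pos, c.tn, neg, z),
    }


# Counts taken from the BayesDel_addAF row of Table 1.
bayesdel = metrics(Confusion(tp=250, fn=164, fp=22, tn=95))
for name, value in bayesdel.items():
    print(name, value)
```

Working on the log scale keeps both interval bounds positive, which suits a ratio such as LR+ or LR-.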
Table 2
In silico tools with pairs of non-overlapping thresholds that reach at least supporting evidence strength for both pathogenic and benign prediction.

LR: likelihood ratio; CI: confidence interval; PV: predicted variants.

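Columns 3 to 7 refer to pathogenic prediction; columns 8 to 12 refer to benign prediction.

| Subset | Tool | Pathogenic threshold | Sensitivity | LR+ | LR+ 95% CI | Strength (pathogenic) | Benign threshold | Specificity | LR- | LR- 95% CI | Strength (benign) | Correctly PV | % of correctly PV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All SNVs | BayesDel_addAF | ≥0.39 | 0.6 | 3.21 | [2.19, 4.72] | Supporting | <0.23 | 0.44 | 0.35 | [0.26, 0.47] | Supporting | 302 | 56.87 |
| All SNVs | CADD | >25 | 0.39 | 8.27 | [4.34, 15.75] | Moderate | ≤21.75 | 0.78 | 0.42 | [0.37, 0.48] | Supporting | 418 | 47.18 |
| All SNVs | CADD | >16.3 | 0.82 | 2.59 | [2.1, 3.2] | Supporting | ≤16.3 | 0.68 | 0.26 | [0.21, 0.31] | Supporting | 703 | 79.35 |
| All SNVs | Eigen-PC | >1.9 | 0.79 | 3 | [2.21, 4.07] | Supporting | ≤1.9 | 0.74 | 0.28 | [0.22, 0.35] | Supporting | 415 | 78.15 |
| All SNVs | GERP++ | >4.22 | 0.44 | 4.33 | [2.51, 7.49] | Supporting | ≤0.15 | 0.35 | 0.32 | [0.22, 0.46] | Supporting | 225 | 42.37 |
| All SNVs | MetaSVM | >0.81 | 0.55 | 3.25 | [2.16, 4.89] | Supporting | ≤0.46 | 0.6 | 0.38 | [0.3, 0.48] | Supporting | 272 | 56.55 |
| All SNVs | phyloP100way | >7.32 | 0.15 | 17.8 | [2.5, 127] | Supporting | ≤0.8 | 0.57 | 0.36 | [0.28, 0.46] | Supporting | 130 | 24.48 |
| All SNVs | REVEL | >0.77 | 0.63 | 3.05 | [2.12, 4.4] | Supporting | ≤0.7 | 0.69 | 0.38 | [0.31, 0.47] | Supporting | 309 | 64.24 |
| All SNVs | SpliceAI | >0.3 | 0.67 | 58.12 | [27.23, 124.03] | Strong | ≤0.3 | 0.99 | 0.33 | [0.23, 0.48] | Supporting | 637 | 96.08 |
| Missense only | BayesDel_addAF | ≥0.41 | 0.54 | 3.35 | [2.2, 5.12] | Supporting | <0.22 | 0.44 | 0.32 | [0.23, 0.45] | Supporting | 241 | 51.72 |
| Missense only | CADD | >23.25 | 0.6 | 3.19 | [2.17, 4.69] | Supporting | ≤20.9 | 0.62 | 0.36 | [0.28, 0.46] | Supporting | 283 | 60.6 |
| Missense only | Eigen-PC | >1.9 | 0.78 | 2.93 | [2.16, 3.98] | Supporting | ≤1.9 | 0.74 | 0.3 | [0.24, 0.38] | Supporting | 357 | 76.61 |
| Missense only | GERP++ | >4.22 | 0.44 | 4.27 | [2.47, 7.4] | Supporting | ≤–0.87 | 0.29 | 0.31 | [0.2, 0.47] | Supporting | 187 | 40.13 |
| Missense only | MetaSVM | >0.8 | 0.58 | 3.08 | [2.09, 4.53] | Supporting | ≤0.39 | 0.56 | 0.37 | [0.29, 0.48] | Supporting | 267 | 57.3 |
| Missense only | phastCons30way | >0.94 | 0.52 | 3.19 | [2.09, 4.88] | Supporting | ≤0.41 | 0.61 | 0.36 | [0.28, 0.46] | Supporting | 252 | 54.08 |
| Missense only | phyloP100way | >7.32 | 0.16 | 19.11 | [2.68, 136.47] | Supporting | ≤0.56 | 0.53 | 0.35 | [0.27, 0.46] | Supporting | 119 | 25.54 |
| Missense only | REVEL | >0.77 | 0.62 | 3.02 | [2.09, 4.35] | Supporting | ≤0.7 | 0.69 | 0.39 | [0.32, 0.48] | Supporting | 297 | 63.73 |
| Non-missense only | CADD | >11.5 | 0.93 | 8.62 | [3.42, 21.77] | Supporting | ≤11.5 | 0.89 | 0.08 | [0.05, 0.11] | Supporting | 350 | 92.84 |
| SNVs in HBB | BayesDel_addAF | ≥0.31 | 0.8 | 6.43 | [2.23, 18.58] | Supporting | <0.31 | 0.88 | 0.22 | [0.17, 0.3] | Supporting | 210 | 81.08 |
| SNVs in HBB | CADD | >25.25 | 0.42 | 31.64 | [4.5, 222.38] | Moderate | ≤22.65 | 0.92 | 0.42 | [0.37, 0.48] | Supporting | 264 | 48.71 |
| SNVs in HBB | CADD | >10.8 | 0.94 | 3.26 | [2.29, 4.64] | Supporting | ≤10.8 | 0.71 | 0.08 | [0.05, 0.12] | Supporting | 494 | 91.14 |
| SNVs in HBA1 | CADD | >22.95 | 0.59 | 4.94 | [2.29, 10.68] | Supporting | ≤17 | 0.66 | 0.3 | [0.19, 0.48] | Supporting | 84 | 61.76 |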
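
A pair of non-overlapping thresholds as listed in Table 2 splits a tool's score range into three zones: pathogenic evidence, benign evidence, and an indeterminate zone between them. The sketch below illustrates this using the CADD thresholds for all SNVs (>25 and ≤21.75). The function names are hypothetical, the operators follow the CADD rows (BayesDel_addAF uses ≥ and < instead), and reading "% of correctly PV" as the correctly predicted count divided by the tool's #PV from Table 1 is an assumption that happens to match the reported value.

```python
def classify(score, pathogenic_above, benign_at_or_below):
    """Assign a score to one of three zones defined by a threshold pair."""
    if benign_at_or_below > pathogenic_above:
        raise ValueError("benign threshold must not exceed pathogenic threshold")
    if score > pathogenic_above:
        return "pathogenic"
    if score <= benign_at_or_below:
        return "benign"
    # Scores between the two thresholds provide no evidence in either direction.
    return "indeterminate"


def percent_correct(correct, total):
    """Percentage of correctly predicted variants, rounded to two decimals."""
    return round(100 * correct / total, 2)


# CADD thresholds for all SNVs, taken from Table 2.
CADD_ALL_SNVS = {"pathogenic_above": 25.0, "benign_at_or_below": 21.75}

for score in (26.1, 23.0, 21.75):
    print(score, classify(score, **CADD_ALL_SNVS))

# 418 correctly predicted variants (Table 2) out of 886 scored by CADD (Table 1).
print(percent_correct(418, 886))
```

When the two thresholds coincide, as for CADD at 16.3, the indeterminate zone is empty and every scored variant receives a call.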

Additional files

Supplementary file 1

The list of ClinGen Hemoglobinopathy variant curation expert panel (VCEP) members.

https://cdn.elifesciences.org/articles/79713/elife-79713-supp1-v2.xlsx
Supplementary file 2

Table with the dataset used in this study and the resulting scores obtained by the in silico predictors, divided into different sheets and subsets: all short nucleotide variants (SNVs), missense only, non-missense only, HBB, HBA1, and HBA2.

https://cdn.elifesciences.org/articles/79713/elife-79713-supp2-v2.xlsx
Supplementary file 3

Refined thresholds for the nine selected in silico predictors, divided into different subsets: all short nucleotide variants (SNVs), missense only, non-missense only, HBB, HBA1, and HBA2.

Only decision thresholds passing the likelihood ratio (LR) criteria for supporting evidence are shown.

https://cdn.elifesciences.org/articles/79713/elife-79713-supp3-v2.xlsx
MDAR checklist
https://cdn.elifesciences.org/articles/79713/elife-79713-mdarchecklist1-v2.docx

Cite this article

Stella Tamana, Maria Xenophontos, Anna Minaidou, Coralea Stephanou, Cornelis L Harteveld, Celeste Bento, Joanne Traeger-Synodinos, Irene Fylaktou, Norafiza Mohd Yasin, Faidatul Syazlin Abdul Hamid, Ezalia Esa, Hashim Halim-Fikri, Bin Alwi Zilfalil, Andrea C Kakouri, ClinGen Hemoglobinopathy Variant Curation Expert Panel, Marina Kleanthous, Petros Kountouris (2022)
Evaluation of in silico predictors on short nucleotide variants in HBA1, HBA2, and HBB associated with haemoglobinopathies
eLife 11:e79713.
https://doi.org/10.7554/eLife.79713