DEEP LEARNING WITH ENRICHED PATHOGENICITY SCORES FOR HBB VARIANT CLASSIFICATION IN β-THALASSEMIA
DOI:
https://doi.org/10.63075/9em1f154
Keywords:
HBB gene, β-thalassemia, pathogenicity, REVEL, AlphaMissense, CADD, SIFT, PolyPhen, ClinVar, gene therapy
Abstract
β-thalassemia is a common monogenic disorder caused by pathogenic variants in the HBB gene. Accurate computational classification of these variants is difficult because of class imbalance, incomplete annotations, and the limitations of individual predictors. This work proposes an interpretable deep learning framework for HBB variant pathogenicity prediction that combines REVEL, AlphaMissense, and established pathogenicity scores, and uses explainable AI to evaluate feature contributions. A dataset of 1585 HBB single nucleotide variants was collected from ClinVar, and functional scores were obtained through the myvariant.info API. Five models were implemented using stratified splitting; class imbalance was addressed with SMOTE and with class weighting in the loss function. The class-weighted deep learning model performed best, with a ROC-AUC of 0.9483 and a PR-AUC of 0.7912, marginally higher than the RF and XGB models. Feature importance analysis identified REVEL as the dominant predictor, with AlphaMissense ranking second. SHAP analysis indicated that REVEL contributes globally predictive value, while AlphaMissense contributes context-dependent structural value, especially with respect to protein stability. Traditional scores such as SIFT contributed minimally, indicating that they are largely redundant in the model. The work concludes that predictive capacity depends on the quality of the features rather than on the architecture, with the combination of sequence-based (REVEL) and structure-based (AlphaMissense) predictors providing complementary and biologically relevant signals.
Furthermore, the high recall for pathogenic variants and the interpretability provided by SHAP analysis point to the model's potential usefulness in prioritizing variants of uncertain significance, improving diagnostic accuracy, and supporting informed decisions in β-thalassemia screening, genetic counselling, and therapeutic interventions.
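The abstract names class weighting in the loss function as the imbalance strategy of the best-performing model but does not state the weighting scheme. The following is a minimal sketch, assuming inverse-frequency ("balanced") weights and a binary cross-entropy loss; the function names, the toy labels, and the weighting formula are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    Assumption: the paper does not specify its scheme; this is one common heuristic.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_bce(y_true, y_prob, weights, eps=1e-12):
    """Class-weighted binary cross-entropy, normalised by the sum of weights."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


# Hypothetical toy data: 1 = pathogenic (minority class), 0 = benign.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)

# A model that predicts "benign" (p = 0.1) for every variant misses both
# pathogenic cases; the weighted loss penalises this more than the plain loss.
probs = [0.1] * len(labels)
loss_weighted = weighted_bce(labels, probs, weights)
loss_unweighted = weighted_bce(labels, probs, {0: 1.0, 1: 1.0})
```

With these toy labels the minority class receives four times the weight of the majority class, so errors on pathogenic variants dominate the loss, which is the mechanism by which class weighting can raise recall for the rarer class.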
Published
2026-04-01
Issue
Section
Articles
How to Cite
DEEP LEARNING WITH ENRICHED PATHOGENICITY SCORES FOR HBB VARIANT CLASSIFICATION IN β-THALASSEMIA. (2026). Review Journal of Neurological & Medical Sciences Review, 4(3), 389-402. https://doi.org/10.63075/9em1f154