DEEP LEARNING WITH ENRICHED PATHOGENICITY SCORES FOR HBB VARIANT CLASSIFICATION IN β-THALASSEMIA

Authors

  • Raazia Sosan Waseem
  • Muhammad Hussain Habib

DOI:

https://doi.org/10.63075/9em1f154

Keywords:

HBB gene, β-thalassemia, pathogenicity, REVEL, AlphaMissense, CADD, SIFT, PolyPhen, ClinVar, gene therapy

Abstract

β-Thalassemia is a common monogenic disorder caused by pathogenic variants in the HBB gene. Accurate computational classification of these variants is difficult because of class imbalance, incomplete annotations, and the limitations of individual predictors. This work proposes an interpretable deep learning framework for HBB variant pathogenicity prediction that combines REVEL and AlphaMissense with established scores and uses explainable AI to evaluate feature contributions. A dataset of 1585 HBB single nucleotide variants was collected from ClinVar, and functional scores were obtained through the myvariant.info API. Five models were trained with stratified splitting; class imbalance was addressed with SMOTE and with class weighting in the loss function. The class-weighted deep learning model performed best, with a ROC-AUC of 0.9483 and a PR-AUC of 0.7912, marginally higher than the random forest (RF) and XGBoost (XGB) models. Feature importance analysis identified REVEL as the dominant predictor, with AlphaMissense second. SHAP analysis indicated that REVEL contributes global predictive value, while AlphaMissense contributes context-dependent structural value, especially with respect to protein stability. Traditional scores such as SIFT contributed minimally, indicating that they are largely redundant in the model. The work concludes that predictive capacity depends on feature quality rather than model architecture, with sequence-based (REVEL) and structure-based (AlphaMissense) predictors providing complementary and biologically relevant signals.
Furthermore, the high recall for pathogenic variants and the interpretability provided by SHAP analysis point to the model's potential usefulness in prioritizing variants of uncertain significance, improving diagnostic accuracy, and supporting informed decisions in β-thalassemia screening, genetic counselling, and therapeutic intervention.
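
The class weighting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which weighting scheme was used, so the inverse-frequency ("balanced") weights below are an assumption, and the function names, labels, and probabilities are hypothetical toy values rather than data from the study. The sketch shows how a weighted binary cross-entropy makes errors on a rare pathogenic class count more heavily than the same errors under uniform weights.

```python
import math


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_bce(labels, probs, weights, eps=1e-12):
    """Weighted mean of the per-sample binary cross-entropy."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
        weight_sum += weights[y]
    return total / weight_sum


# Hypothetical toy data: 1 = pathogenic (minority), 0 = benign (majority).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.1] * 8 + [0.4, 0.6]  # assumed predicted P(pathogenic)

weights = balanced_class_weights(labels)
uniform = {0: 1.0, 1: 1.0}

loss_weighted = weighted_bce(labels, probs, weights)
loss_uniform = weighted_bce(labels, probs, uniform)
print(weights, loss_weighted, loss_uniform)
```

In this toy case the two poorly separated pathogenic samples dominate the weighted loss, so an optimizer minimizing it is pushed toward higher recall on the minority class, which is the usual motivation for class weighting on imbalanced data.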

Published

2026-04-01

How to Cite

DEEP LEARNING WITH ENRICHED PATHOGENICITY SCORES FOR HBB VARIANT CLASSIFICATION IN β-THALASSEMIA. (2026). Review Journal of Neurological & Medical Sciences Review, 4(3), 389-402. https://doi.org/10.63075/9em1f154