Enhanced NLP for Medical Text Classification: A Deep Active Learning Approach

Authors

  • Palaparthi Seethalakshmi, Centurion University of Technology and Management, Odisha, India
  • Dhawaleshwara Rao CH, Department of Computer Science and Engineering, Centurion University of Technology and Management, Odisha, India
  • K. Swaroopa, Department of Computer Science and Engineering (Data Science), Aditya Institute of Technology and Management, Tekkali, Srikakulam, Andhra Pradesh, India
Volume: 15 | Issue: 5 | Pages: 27710-27714 | October 2025 | https://doi.org/10.48084/etasr.12114

Abstract

This paper presents an enhanced approach to medical text classification that combines deep Active Incremental Learning (AIL) with state-of-the-art techniques to optimize healthcare authorization decisions. The model uses a Bi-LSTM architecture enhanced with contextual embeddings and attention mechanisms, so it can learn from a small amount of labeled data and update its predictions in real time through entropy-based uncertainty sampling. The proposed framework applies SMOTE and undersampling strategies to address class imbalance. A total of 117,000 real medical authorization submissions were semantically processed using BioBERT embeddings and Named Entity Recognition (NER). The experimental results show that after 100 active learning iterations, the model achieved a 4% gain in balanced accuracy, indicating that it can iteratively improve its predictions with minimal supervision. By optimizing performance in a resource-constrained environment, the approach also enables faster and more efficient processing of medical claims, which can help build scalable and adaptive decision-making capacity.
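
The entropy-based uncertainty sampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy probability matrix, and the batch size `k` are assumptions made for illustration. The idea is to compute the Shannon entropy of each unlabeled sample's predicted class distribution and send the highest-entropy samples to an annotator.

```python
import numpy as np


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy of each row of a (n_samples, n_classes) probability matrix."""
    probs = np.asarray(probs, dtype=float)
    # eps guards against log(0) for fully confident predictions.
    return -np.sum(probs * np.log(probs + eps), axis=1)


def select_most_uncertain(probs, k):
    """Indices of the k samples with the highest predictive entropy."""
    entropy = predictive_entropy(probs)
    # Sort by descending entropy and keep the first k indices.
    return np.argsort(-entropy, kind="stable")[:k]


# Hypothetical class probabilities for four unlabeled submissions
# (in practice these would come from the classifier's softmax output).
pool_probs = np.array([
    [0.98, 0.02],  # confident
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],  # fairly confident
    [0.50, 0.50],  # maximally uncertain
])

query_indices = select_most_uncertain(pool_probs, k=2)
print(query_indices)
```

In an active learning loop, the selected samples would be labeled, added to the training set, and the model updated before the next round of selection.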

Keywords:

NLP, text classification, active learning, machine learning, deep learning

How to Cite

P. Seethalakshmi, D. R. CH, and K. Swaroopa, “Enhanced NLP for Medical Text Classification: A Deep Active Learning Approach,” Eng. Technol. Appl. Sci. Res., vol. 15, no. 5, pp. 27710–27714, Oct. 2025.