Enhanced NLP for Medical Text Classification: A Deep Active Learning Approach
Received: 13 May 2025 | Revised: 17 June 2025, 8 July 2025, and 22 July 2025 | Accepted: 27 July 2025 | Online: 24 September 2025
Corresponding author: Palaparthi Seethalakshmi
Abstract
This paper presents an enhanced approach for classifying medical texts that combines Deep Active Incremental Learning (AIL) with state-of-the-art techniques to optimize healthcare authorization decisions. Using a Bi-LSTM architecture enhanced with contextual embeddings and attention mechanisms, the model learns dynamically from a small amount of labeled data and updates its predictions in real time via entropy-based uncertainty sampling. The proposed framework uses SMOTE and undersampling strategies to address class imbalance. A total of 117,000 real medical authorization submissions were semantically processed using BioBERT embeddings and Named Entity Recognition (NER). The experimental results show that after 100 active learning phases, the model achieved a 4% gain in balanced accuracy, indicating its ability to iteratively optimize predictions with minimal guidance. By optimizing performance in resource-constrained environments, this approach also enables faster and more efficient processing of medical claims, which can help build scalable and adaptive decision-making capacities.
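The entropy-based uncertainty sampling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `predictive_entropy` and `select_most_uncertain`, the two-class setup, and the probability values are all hypothetical and chosen only for illustration. The idea is that, in each active learning phase, the unlabeled submissions whose predicted class distribution has the highest Shannon entropy are sent for labeling first.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool items with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical softmax outputs for four unlabeled submissions (two classes).
pool = [
    [0.98, 0.02],  # confident prediction, low entropy
    [0.55, 0.45],  # near the decision boundary
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain for two classes
]

# Indices of the two submissions an annotator would be asked to label next.
query_indices = select_most_uncertain(pool, k=2)
print(query_indices)
```

In a full active learning loop, the selected items would be labeled, added to the training set, and the model retrained or incrementally updated before the next phase. Those steps depend on the model and are omitted here.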
Keywords:
NLP, text classification, active learning, machine learning, deep learning
License
Copyright (c) 2025 Palaparthi Seethalakshmi, Dhawaleshwara C. Rao, K. Swaroopa

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.