Improving Multilabel Classification Model Performance in Imbalanced Datasets with Group-Level Undersampling

Authors

  • Danny Sebastian, Faculty of Information Technology, Universitas Kristen Satya Wacana, Salatiga, Indonesia; Department of Informatics, Universitas Kristen Duta Wacana, Yogyakarta, Indonesia. ORCID: https://orcid.org/0000-0002-7213-0571
  • Hindriyanto Dwi Purnomo, Faculty of Information Technology, Universitas Kristen Satya Wacana, Salatiga, Indonesia. ORCID: https://orcid.org/0000-0001-6728-7868
  • Irwan Sembiring, Faculty of Information Technology, Universitas Kristen Satya Wacana, Salatiga, Indonesia
Volume: 15 | Issue: 4 | Pages: 24764–24774 | August 2025 | https://doi.org/10.48084/etasr.10680

Abstract

Multilabel classification presents unique challenges, particularly when dealing with imbalanced datasets. This study introduces a Group-Level Undersampling (GLU) technique designed to enhance the performance and efficiency of multilabel classification models. By converting multilabel annotations into combined labels and categorizing those combinations into group levels, the proposed method preserves the diversity of label combinations, which is crucial for accurate classification. Undersampling is particularly challenging in the multilabel setting because removing a single instance can eliminate an entire combination of classes. This study used Google Maps reviews of tourist sites in Yogyakarta, Indonesia, manually labeled into 11 classes. The IndoBERTweet model was fine-tuned on the GLU-generated dataset, and its performance was compared with that of models trained on randomly selected datasets. The experimental results demonstrate that the GLU method maintains the variation of multilabel datasets through combined labels and group levels, and that the GLU dataset significantly improves model accuracy and F1 scores while reducing computational time and resources. These findings suggest that the proposed undersampling technique offers a robust solution to imbalanced datasets in multilabel classification, paving the way for more efficient and accurate machine learning models. However, a weakness remains in the ratio-calculation formula, which can leave some datasets suboptimal at certain group levels. Future research should apply the GLU technique to a variety of datasets and multilabel tasks to assess its generalizability, and should investigate hybrid data-balancing methods to further improve model performance and efficiency. A sensitivity analysis of ratioBaseLine values and a refinement of the ratio-calculation formula would improve the robustness of the GLU method.
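Since the abstract describes GLU only at a high level, the sketch below illustrates one plausible reading of the technique in Python, under stated assumptions: multilabel vectors are joined into combined labels, combinations are grouped by relative frequency, and over-represented groups are undersampled toward a baseline ratio while at least one instance of every combination is retained. The function name, the `ratio_baseline` parameter, and the frequency-based keep rule are illustrative stand-ins, not the authors' exact formula (which the abstract itself notes still has weaknesses).

```python
import random
from collections import defaultdict

def group_level_undersample(texts, label_vectors, ratio_baseline=0.1, seed=42):
    """Undersample a multilabel dataset while keeping every label combination."""
    rng = random.Random(seed)

    # 1) Convert each multilabel vector (e.g., [1, 0, 1, ...]) into a combined label.
    buckets = defaultdict(list)
    for text, labels in zip(texts, label_vectors):
        combined = "".join(str(int(v)) for v in labels)
        buckets[combined].append((text, labels))

    # 2) Rank each combination against the largest bucket to get its group level.
    max_count = max(len(items) for items in buckets.values())

    sampled = []
    for items in buckets.values():
        freq = len(items) / max_count  # relative frequency of this combination
        # 3) Thin only over-represented combinations, but always keep at least
        #    one instance so the diversity of label combinations survives.
        keep_fraction = min(1.0, ratio_baseline / freq)
        keep = max(1, round(len(items) * keep_fraction))
        sampled.extend(rng.sample(items, min(keep, len(items))))

    rng.shuffle(sampled)
    return sampled
```

With `ratio_baseline = 0.1`, any combination holding up to 10% as many instances as the largest bucket is kept in full, while more frequent combinations are thinned proportionally, so rare label combinations are never discarded entirely.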
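For the modeling step, the following is a minimal sketch of multilabel fine-tuning on the balanced dataset, assuming the publicly released Hugging Face checkpoint indolem/indobertweet-base-uncased and the standard transformers multi-label setup; the toy reviews, class indices, learning rate, and single optimization step are placeholders, as the abstract does not report the authors' hyperparameters.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_CLASSES = 11  # the study labels the reviews into 11 classes

tokenizer = AutoTokenizer.from_pretrained("indolem/indobertweet-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "indolem/indobertweet-base-uncased",
    num_labels=NUM_CLASSES,
    problem_type="multi_label_classification",  # sigmoid outputs with BCE loss
)

# Toy batch standing in for the GLU-balanced Google Maps review dataset.
texts = ["pemandangannya indah sekali", "antrian panjang dan parkir sulit"]
targets = torch.zeros(len(texts), NUM_CLASSES)
targets[0, 0] = 1.0                  # hypothetical "scenery" class
targets[1, 3] = targets[1, 5] = 1.0  # hypothetical "queue" and "parking" classes

batch = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
out = model(**batch, labels=targets)  # BCEWithLogitsLoss is applied internally
out.loss.backward()
optimizer.step()
```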

Keywords:

multilabel classification, undersampling, group-level undersampling, dataset balancing

How to Cite

D. Sebastian, H. D. Purnomo, and I. Sembiring, “Improving Multilabel Classification Model Performance in Imbalanced Datasets with Group-Level Undersampling,” Eng. Technol. Appl. Sci. Res., vol. 15, no. 4, pp. 24764–24774, Aug. 2025.
