Evaluating the Out-of-Domain Generalization in Source Code Vulnerability Detection

Nin Ho Le Viet; Tuan Nguyen Kim; Chieu Ta Quang

doi:10.48084/etasr.18302

Authors

Nin Ho Le Viet, School of Computer Science, Duy Tan University, Danang, Vietnam | Faculty of Computer Science and Engineering, Thuyloi University, Tay Son, Hanoi, Vietnam
Tuan Nguyen Kim, Phenikaa School of Computing, Phenikaa University, Duong Noi, Hanoi, Vietnam
Chieu Ta Quang, Faculty of Computer Science and Engineering, Thuyloi University, Tay Son, Hanoi, Vietnam

Volume: 16 | Issue: 3 | Pages: 35548-35554 | June 2026 | https://doi.org/10.48084/etasr.18302

Received: 21 February 2026 | Revised: 22 March 2026 | Accepted: 1 April 2026 | Online: 6 June 2026

Corresponding author: Tuan Nguyen Kim

Abstract

Machine learning-based vulnerability detection is increasingly used in practice, yet most studies evaluate models only under in-domain settings. In real deployments, detectors often face code from different projects or data sources, which leads to Out-of-Domain (OOD) performance degradation. This study proposes a systematic framework to evaluate OOD generalization in source code vulnerability detection. Experiments were conducted on widely used benchmarks, including Devign, Juliet Test Suite, Big-Vul, and the National Vulnerability Database (NVD), with OOD test data stratified into Low, Medium, and High levels. The results indicate that accuracy and F1-score declined by 10-25% under OOD conditions, with the largest decrease observed at the High OOD level. Deep learning models, such as CodeBERT and GraphCodeBERT, outperform traditional methods, including Support Vector Machine (SVM) and XGBoost; however, none of them remains stable across all domains. The proposed framework provides a practical basis for analyzing and improving OOD robustness.
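The evaluation idea summarized above can be illustrated with a small, hypothetical sketch. The abstract does not state how the Low, Medium, and High OOD levels are defined or whether the reported decline is relative or absolute, so everything below is an assumption made for illustration: the Jensen-Shannon divergence over whitespace token frequencies, the thresholds `low=0.2` and `high=0.5`, and all function names are invented here and are not taken from the paper.

```python
from collections import Counter
from math import log2


def token_distribution(snippets):
    """Relative frequency of whitespace-separated tokens (assumed tokenization)."""
    counts = Counter(tok for s in snippets for tok in s.split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2, bounded in [0, 1]."""
    vocab = set(p) | set(q)
    # Mixture distribution of p and q over the joint vocabulary.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_mixture(a):
        return sum(a[t] * log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def ood_level(divergence, low=0.2, high=0.5):
    """Bucket a divergence score; both thresholds are hypothetical."""
    if divergence < low:
        return "Low"
    if divergence < high:
        return "Medium"
    return "High"


def f1_score(y_true, y_pred):
    """Binary F1 for the positive (vulnerable = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_drop(in_domain, out_domain):
    """Fraction of the in-domain score lost on the OOD test set."""
    return (in_domain - out_domain) / in_domain
```

In this sketch, a detector would be scored once on an in-domain test split and once on each OOD bucket, and `relative_drop` would then be reported per bucket. A learned representation or a different divergence measure could replace the token-frequency comparison without changing the overall shape of the procedure.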

Keywords:

machine learning, out-of-domain, cross-domain evaluation, CodeBERT, GraphCodeBERT
