Evaluating the Out-of-Domain Generalization in Source Code Vulnerability Detection
Received: 21 February 2026 | Revised: 22 March 2026 | Accepted: 1 April 2026 | Online: 6 June 2026
Corresponding author: Tuan Nguyen Kim
Abstract
Machine learning-based vulnerability detection is increasingly used in practice, yet most studies evaluate models only under in-domain settings. In real deployments, detectors often face code from different projects or data sources, which leads to Out-of-Domain (OOD) performance degradation. This study proposes a systematic framework to evaluate OOD generalization in source code vulnerability detection. Experiments were conducted on widely used benchmarks, including Devign, the Juliet Test Suite, Big-Vul, and the National Vulnerability Database (NVD), with OOD test data stratified into Low, Medium, and High levels. The results indicate that accuracy and F1-score declined by 10–25% under OOD conditions, with the largest decrease observed at the High OOD level. Deep learning models, such as CodeBERT and GraphCodeBERT, were more effective than traditional methods, including Support Vector Machine (SVM) and XGBoost; however, none of them remained stable across all domains. The proposed framework provides a practical basis for analyzing and improving OOD robustness.
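As a minimal illustration of the kind of comparison the abstract describes, the sketch below computes accuracy and F1-score on an in-domain test set and on an OOD test set, then reports the decline. It is not the authors' framework: the function names, the toy labels, and the choice to report both an absolute drop (score difference) and a relative drop (fraction of the in-domain score) are assumptions made here, since the abstract does not state which of the two the 10–25% figure refers to.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive class (1 = vulnerable, 0 = not vulnerable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ood_drop(in_domain: float, ood: float) -> Dict[str, float]:
    """Absolute drop (score difference) and relative drop (share of in-domain score)."""
    absolute = in_domain - ood
    relative = absolute / in_domain if in_domain > 0 else 0.0
    return {"absolute": absolute, "relative": relative}


# Hypothetical toy data: the same true labels, with predictions from an
# in-domain test split and from an OOD test split (values are illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_in_domain = [1, 1, 1, 1, 0, 0, 0, 1]
pred_ood = [1, 1, 0, 0, 0, 0, 1, 1]

f1_drop = ood_drop(f1_score(y_true, pred_in_domain), f1_score(y_true, pred_ood))
acc_drop = ood_drop(accuracy(y_true, pred_in_domain), accuracy(y_true, pred_ood))
print("F1 drop:", f1_drop)
print("Accuracy drop:", acc_drop)
```

Repeating the same comparison for each OOD level (Low, Medium, High) would give one drop per level, which is the shape of result the abstract reports; how the levels themselves are defined is not specified in this section and is left out of the sketch.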
Keywords:
machine learning, out-of-domain, cross-domain evaluation, CodeBERT, GraphCodeBERT
License
Copyright (c) 2026 Nin Ho Le Viet, Tuan Nguyen Kim, Chieu Ta Quang

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
