SQL Injection Detection Using Fine-Tuned CodeBERT

Boulbaba Ben Ammar; Ameni M. Alharbi

doi:10.48084/etasr.13340

Authors

Boulbaba Ben Ammar Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
Ameni M. Alharbi Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia

Volume: 15 | Issue: 5 | Pages: 27852-27857 | October 2025 | https://doi.org/10.48084/etasr.13340

Received: 12 July 2025 | Revised: 4 August 2025, 20 August 2025, and 22 August 2025 | Accepted: 26 August 2025 | Online: 23 September 2025

Corresponding author: Boulbaba Ben Ammar

Abstract

SQL injection (SQLi) attacks continue to pose a serious threat to web application security, with millions of systems vulnerable worldwide despite ongoing research and mitigation efforts. This study addresses the limitations of current SQLi detection techniques, particularly their inability to adapt to evolving attack patterns and their tendency toward high false positive rates. A systematic review of machine learning- and deep learning-based SQLi detection studies published between 2017 and 2023 examined their performance metrics, methods, and common shortcomings. Building on these insights, this study proposes a novel transformer-based approach that fine-tunes a CodeBERT model on a dataset of 30,919 SQL queries. The method includes extensive preprocessing and rigorous evaluation to ensure robustness and applicability. The results show that, whereas traditional machine learning approaches reach between 73.5% and 99% accuracy and deep learning models achieve between 85% and 98%, the proposed CodeBERT-based system outperforms both, attaining 99.90% accuracy, 99.96% precision, 97.75% recall, and 99.86% F1-score. These findings underscore the effectiveness of transformer models pre-trained on code for SQLi detection, set new benchmarks, and offer a deployable solution for strengthening web application security.
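The abstract reports four evaluation figures: accuracy, precision, recall, and F1-score. As a reminder of how they relate, the following is a minimal Python sketch that computes them from a binary confusion matrix, assuming label 1 means "SQL injection" and label 0 means "benign query". The helper names (`Confusion`, `confusion_from_labels`, `classification_metrics`) and the ten labels in the usage block are hypothetical illustrations, not the authors' code or data. Because F1 is the harmonic mean of precision and recall, it always lies between those two values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix; label 1 = SQL injection, 0 = benign query."""
    tp: int  # injections correctly flagged
    fp: int  # benign queries wrongly flagged (false alarms)
    tn: int  # benign queries correctly passed
    fn: int  # injections missed


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Count outcomes from parallel 0/1 label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return Confusion(tp, fp, tn, fn)


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def classification_metrics(c: Confusion) -> dict:
    """Accuracy, precision, recall, and F1 for the positive (SQLi) class."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": _safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical labels for ten queries (1 = SQLi); not from the paper's dataset.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(classification_metrics(confusion_from_labels(y_true, y_pred)))
```

In this toy run, three injections are caught, one is missed, and one benign query raises a false alarm, so accuracy is 0.8 and precision, recall, and F1 are all 0.75. Precision penalizes false alarms on benign traffic, while recall penalizes missed injections; returning 0.0 on a zero denominator is one common convention (scikit-learn exposes a `zero_division` parameter for the same case).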

Keywords:

SQL injection, web security, transformer models, CodeBERT, machine learning, deep learning
