SQL Injection Detection Using Fine-Tuned CodeBERT
Received: 12 July 2025 | Revised: 4 August 2025, 20 August 2025, and 22 August 2025 | Accepted: 26 August 2025 | Online: 23 September 2025
Corresponding author: Boulbaba Ben Ammar
Abstract
SQL injection (SQLi) attacks continue to pose a serious threat to web application security, with millions of systems vulnerable worldwide despite ongoing research and mitigation efforts. This study addresses the limitations of current SQLi detection techniques, particularly their inability to adapt to evolving attack patterns and their tendency to produce high false positive rates. A systematic review of machine learning- and deep learning-based SQLi detection studies published between 2017 and 2023 examined their performance metrics, methods, and common shortcomings. Building on these insights, this study proposes a transformer-based approach that uses a CodeBERT model fine-tuned on 30,919 SQL queries. The method includes extensive preprocessing and rigorous evaluation to ensure robustness and applicability. The results show that, while the reviewed traditional machine learning approaches reach between 73.5% and 99% accuracy and the reviewed deep learning models achieve 85% to 98%, the proposed CodeBERT-based system outperforms them, attaining 99.90% accuracy, 99.96% precision, 97.75% recall, and 99.86% F1-score. These findings underscore the effectiveness of transformer models pre-trained on code for SQL injection detection, setting new benchmarks and offering a deployable solution to enhance web application security.
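For readers less familiar with the four metrics reported above, the following is a minimal sketch of how accuracy, precision, recall, and F1-score are conventionally derived from the confusion counts of a binary detector. It is not taken from the authors' code: the names ConfusionCounts and classification_metrics are hypothetical, and the counts in the usage example are made up for illustration rather than drawn from the study's evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion counts for a binary detector; 'positive' means 'SQL injection'.

    Hypothetical helper type for illustration only.
    """
    tp: int  # malicious queries correctly flagged as malicious
    fp: int  # benign queries wrongly flagged as malicious
    tn: int  # benign queries correctly passed as benign
    fn: int  # malicious queries wrongly passed as benign


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, precision, recall, and F1 for the positive class.

    Any ratio whose denominator is zero is reported as 0.0.
    """
    total = c.tp + c.fp + c.tn + c.fn
    predicted_positive = c.tp + c.fp
    actual_positive = c.tp + c.fn

    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / predicted_positive if predicted_positive else 0.0
    recall = c.tp / actual_positive if actual_positive else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; these are NOT results from the study.
    counts = ConfusionCounts(tp=90, fp=10, tn=880, fn=20)
    for name, value in classification_metrics(counts).items():
        print(f"{name}: {value:.4f}")
```

In this formulation, a high false positive rate shows up as low precision, while missed attacks lower recall. The sketch computes all four values for the positive (injection) class only; macro- or weighted-averaged variants across both classes are computed differently and can yield different figures.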
Keywords:
SQL injection, web security, transformer models, CodeBERT, machine learning, deep learning
License
Copyright (c) 2025 Boulbaba Ben Ammar, Ameni M. Alharbi

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.