Improving Web Security through Machine Learning: A Feature-Based Methodology for Detecting Phishing URLs

Reem Alzubi; Tariq Bishtawi; Hassan Kassem

doi:10.48084/etasr.12015

Authors

Reem Alzubi Engineering and Artificial Intelligence Department, Al-Salt Technical College, Al-Balqa Applied University, Al-Salt, Jordan
Tariq Bishtawi Department of Computer Science, Amman Arab University, Amman 11953, Jordan
Hassan Kassem Department of Communications and Computer Networks, Arab University College of Technology, Amman, Jordan

Volume: 15 | Issue: 5 | Pages: 26845-26851 | October 2025 | https://doi.org/10.48084/etasr.12015

Received: 8 May 2025 | Revised: 28 June 2025 and 6 July 2025 | Accepted: 8 July 2025 | Online: 4 August 2025

Corresponding author: Tariq Bishtawi

Abstract

Phishing attacks remain a significant and evolving threat to web security, often using malicious URLs to deceive users into sharing personal information. This study takes a detailed, feature-based approach to developing a machine learning method for detecting phishing URLs. The analysis covers four machine learning classifiers that draw on a comprehensive set of lexical, host-based, and content-based URL features: Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel, and Extreme Gradient Boosting (XGBoost). The results show that ensemble methods outperform individual models in phishing detection, with XGBoost and RF achieving the highest accuracy, precision, and recall. These findings contribute to the development of real-time phishing detection tools and indicate that effective feature engineering and model selection remain crucial for enhancing internet security.
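
As an illustration of the feature-based idea described in the abstract, the following minimal sketch extracts a few lexical features from a URL and computes accuracy, precision, and recall from label lists. It is not the authors' feature set or pipeline: the function names `lexical_features` and `classification_metrics`, the specific features chosen, and the example URLs are all assumptions made for this sketch, and host-based and content-based features are omitted because they require network access.

```python
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Return a small, hypothetical set of lexical features for one URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        # Crude dotted-quad check; assumes a plain IPv4 literal as the host.
        "host_is_ipv4": host.count(".") == 3 and host.replace(".", "").isdigit(),
    }


def classification_metrics(y_true, y_pred) -> dict:
    """Accuracy, precision, and recall with label 1 = phishing, 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Made-up example URL using a private IP address as the host.
    print(lexical_features("http://192.168.0.1/login"))
    # Made-up labels and predictions, for illustration only.
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Feature dictionaries of this kind would typically be converted to a numeric matrix and passed to classifiers such as RF, DT, SVM, or XGBoost. The metric function shows how the three reported measures are derived from the same confusion counts.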

Keywords:

phishing URLs, RF, DT, SVM, extreme gradient boosting, phishing detection
