Improving Web Security through Machine Learning: A Feature-Based Methodology for Detecting Phishing URLs
Received: 8 May 2025 | Revised: 28 June 2025 and 6 July 2025 | Accepted: 8 July 2025 | Online: 4 August 2025
Corresponding author: Tariq Bishtawi
Abstract
Phishing attacks remain a significant and evolving threat to web security, often using malicious URLs to deceive users into sharing personal information. This study develops a feature-based machine learning method for detecting phishing URLs. Four classifiers are evaluated on features drawn from the lexical, host-based, and content-based characteristics of URLs: Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel, and Extreme Gradient Boosting (XGBoost). The results show that the ensemble methods outperform the single models, with XGBoost and RF achieving higher accuracy, precision, and recall than DT and SVM. These findings contribute to the development of real-time phishing detection tools and underline that effective feature engineering and model selection remain crucial for enhancing internet security.
Keywords:
phishing URLs, RF, DT, SVM, extreme gradient boosting, phishing detection
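To illustrate the kind of lexical URL features the abstract refers to, the following is a minimal Python sketch. The abstract does not list the features used in the study, so the feature names, the `SUSPICIOUS_TOKENS` list, and the `lexical_features` helper are all assumptions made for illustration. The resulting dictionary could be turned into a numeric vector and passed to any of the four classifiers.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list for illustration only; not taken from the paper.
SUSPICIOUS_TOKENS = ("login", "verify", "secure", "account", "update")

# Matches a dotted-quad host such as 192.168.0.1 (no range check on octets).
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict:
    """Return a small, assumed set of lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "host_is_ip": int(bool(IPV4_RE.match(host))),
        "uses_https": int(parsed.scheme == "https"),
        # Counts how many distinct keywords appear anywhere in the URL.
        "num_suspicious_tokens": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
    }


if __name__ == "__main__":
    for name, value in lexical_features("http://192.168.0.1/secure-login").items():
        print(f"{name}: {value}")
```

Host-based and content-based features (for example, domain age or page content signals) would require network lookups and are left out of this sketch.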
License
Copyright (c) 2025 Reem Alzubi, Tariq Bishtawi, Hassan Kassem

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.