A Hybrid Heuristic-Machine Learning Framework for Phishing Detection Using Multi-Domain Feature Analysis

Ashvini Jadhav; Pankaj Chandre

doi:10.48084/etasr.11548

Authors

Ashvini Jadhav, Department of Computer Science and Engineering, MIT School of Computing, MIT Art Design and Technology University, Pune, India
Pankaj Chandre, Department of Computer Science and Engineering, MIT School of Computing, MIT Art Design and Technology University, Pune, India

Volume: 15 | Issue: 5 | Pages: 27219-27226 | October 2025 | https://doi.org/10.48084/etasr.11548

Received: 16 April 2025 | Revised: 29 May 2025 and 4 June 2025 | Accepted: 7 June 2025 | Online: 6 October 2025

Corresponding author: Ashvini Jadhav

Abstract

This study introduces a hybrid phishing detection framework that combines machine learning (ML) with heuristic rule-based techniques to provide accurate, scalable, and policy-compliant detection across a variety of phishing types. The proposed method draws on diverse data sources, including URL patterns, email headers, and HTML content, organized in layers so that analysis remains possible even when some features are missing. Feature selection techniques, such as variance thresholding and Recursive Feature Elimination (RFE), are applied to improve learning efficiency and reduce noise. Several classifiers, including Random Forest (RF), XGBoost, Gradient Boosting (GB), and CatBoost, are trained on the selected features, and their outputs are combined by voting to improve overall reliability. The system also includes a rule-based engine aligned with India's national Email Policy, with heuristic checks for non-government domains, missing authentication (SPF/DKIM/DMARC), insecure protocols, foreign IPs, phishing URLs, and other threat indicators. Each rule is weighted and contributes to a composite suspicion score that is explainable and policy-mapped. These heuristic signals are used both directly and as features for the ML models, allowing for layered, interpretable AI. The final phishing score balances the heuristic and ML predictions and is compared against an optimized threshold to classify an input as phishing or safe. Experimental results on benchmark datasets indicate that heuristic-guided feature selection, combined with hybrid data integration, significantly improves performance, with average accuracy exceeding 95% on real-world datasets. Among individual models, CatBoost and XGBoost reached training accuracies of up to 100% and testing accuracies of 96.7% and 96.4%, respectively, on URL datasets. For email header analysis, RF achieved the highest accuracy at 99.85%. The findings underscore the importance of feature engineering in developing scalable and reliable phishing detection systems.
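
The scoring pipeline described in the abstract (weighted heuristic rules, voting over classifier outputs, and a thresholded blend of the two) can be illustrated with a minimal sketch. The abstract does not publish the rule weights, the voting scheme, the blending formula, or the threshold, so everything concrete here is an assumption made for illustration: the four rules and their weights, the message field names, soft voting as a simple mean of per-model probabilities, the linear blend with factor alpha, and the 0.5 threshold. The model probabilities are supplied by hand in place of trained RF, XGBoost, GB, and CatBoost classifiers.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]  # hypothetical parsed-email representation


@dataclass(frozen=True)
class Rule:
    name: str
    weight: float  # illustrative weight, not taken from the paper
    check: Callable[[Message], bool]


# Hypothetical rule set loosely mirroring the checks named in the abstract.
RULES: List[Rule] = [
    Rule("non_government_domain", 0.30,
         lambda m: not str(m.get("sender_domain", "")).endswith(".gov.in")),
    Rule("missing_authentication", 0.30,
         lambda m: not (m.get("spf_pass") and m.get("dkim_pass")
                        and m.get("dmarc_pass"))),
    Rule("insecure_protocol", 0.20,
         lambda m: any(str(u).startswith("http://") for u in m.get("urls", []))),
    Rule("foreign_ip", 0.20,
         lambda m: m.get("origin_country", "IN") != "IN"),
]


def heuristic_score(msg: Message) -> Tuple[float, List[str]]:
    """Return a suspicion score in [0, 1] and the names of the rules that fired."""
    fired = [r for r in RULES if r.check(msg)]
    total = sum(r.weight for r in RULES)
    score = sum(r.weight for r in fired) / total
    return score, [r.name for r in fired]


def soft_vote(probabilities: List[float]) -> float:
    """Average the per-model phishing probabilities (assumed voting scheme)."""
    return mean(probabilities)


def classify(msg: Message, model_probs: List[float],
             alpha: float = 0.5, threshold: float = 0.5) -> Tuple[str, float, List[str]]:
    """Blend heuristic and ML scores linearly, then apply the threshold."""
    h, fired = heuristic_score(msg)
    final = alpha * h + (1.0 - alpha) * soft_vote(model_probs)
    label = "phishing" if final >= threshold else "safe"
    return label, final, fired


if __name__ == "__main__":
    example = {
        "sender_domain": "example.com",
        "spf_pass": False, "dkim_pass": True, "dmarc_pass": False,
        "urls": ["http://login.example.com"],
        "origin_country": "US",
    }
    # Hand-written stand-ins for the four classifiers' probabilities.
    print(classify(example, [0.90, 0.80, 0.95, 0.85]))
```

Returning the names of the fired rules alongside the score is one simple way to keep the result explainable, since each contribution can be traced back to a specific policy check.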

Keywords:

machine learning, feature analysis, random forest, phishing detection
