Detection of Spam Email by Combining Harmony Search Algorithm and Decision Tree
Spam email is probably the main problem faced by most e-mail users. Spam email detection involves many features, and some of them have little effect on detection and skew the detection and classification of spam email. Thus, Feature Selection (FS) is one of the key topics in spam email detection systems. By choosing the important and effective features, classification performance can be optimized. A feature selector has the task of finding a subset of features that improves the accuracy of the classifier's predictions. In this paper, a hybrid of the Harmony Search Algorithm (HSA) and a decision tree is used for selecting the best features and for classification. The results obtained on the Spam-base dataset show that the recognition accuracy of the proposed model is 95.25%, which is higher than that of models such as SVM, NB, J48 and MLP. The accuracy of the proposed model on the Ling-spam and PU1 datasets is also higher than that of models such as NB, SVM and LR.
Keywords: Spam Email, Harmony Search Algorithm, Decision Tree
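The abstract does not give the details of the hybrid, so the following is only a minimal sketch of how a binary harmony search could drive feature selection. Each harmony is a 0/1 mask over the features, and a fitness function scores each mask. In the paper's setting the fitness would presumably be the decision tree's accuracy on the selected features; here `toy_fitness` is a hypothetical stand-in so the example runs with the standard library alone. The function names and the parameter values (`hms`, `hmcr`, `par`, `iterations`) are assumptions for illustration, not the authors' settings.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def harmony_search_fs(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    hms: int = 10,          # harmony memory size (assumed value)
    hmcr: float = 0.9,      # harmony memory considering rate (assumed value)
    par: float = 0.3,       # pitch adjusting rate (assumed value)
    iterations: int = 200,  # number of improvisations (assumed value)
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Return the best feature mask found and its fitness (higher is better)."""
    rng = random.Random(seed)

    # Initialise the harmony memory with random feature masks.
    memory = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(hms)]
    scores = [fitness(mask) for mask in memory]

    for _ in range(iterations):
        new_mask: Mask = []
        for j in range(n_features):
            if rng.random() < hmcr:
                # Memory consideration: copy bit j from a random stored harmony.
                bit = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    # Pitch adjustment, taken here to be a bit flip.
                    bit = 1 - bit
            else:
                # Random consideration: draw the bit uniformly.
                bit = rng.randint(0, 1)
            new_mask.append(bit)

        new_score = fitness(new_mask)

        # Replace the worst stored harmony if the new one is better.
        worst = min(range(hms), key=scores.__getitem__)
        if new_score > scores[worst]:
            memory[worst], scores[worst] = new_mask, new_score

    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical stand-in for "decision tree accuracy on the selected features":
# reward three made-up informative features and lightly penalise mask size.
INFORMATIVE = (0, 3, 7)


def toy_fitness(mask: Sequence[int]) -> float:
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    best_mask, best_score = harmony_search_fs(n_features=10, fitness=toy_fitness)
    print(best_mask, best_score)
```

To move this closer to the setup the abstract describes, `toy_fitness` would be replaced by a function that trains a decision tree on the columns selected by the mask and returns its validation accuracy. That replacement is left out here because the source does not specify the tree variant or the evaluation protocol.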
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.