Analysis of Text Feature Extractors using Deep Learning on Fake News

B. Ahmed; G. Ali; A. Hussain; A. Baseer; J. Ahmed

doi:10.48084/etasr.4069

Authors

B. Ahmed, Department of Electrical Engineering, Sukkur IBA University, Pakistan
G. Ali, Department of Electrical Engineering, Sukkur IBA University, Pakistan
A. Hussain, Department of Electrical Engineering, Sukkur IBA University, Pakistan
A. Baseer, Department of Electrical Engineering, Sukkur IBA University, Pakistan
J. Ahmed, Department of Electrical Engineering, Sukkur IBA University, Pakistan

Volume: 11 | Issue: 2 | Pages: 7001-7005 | April 2021 | https://doi.org/10.48084/etasr.4069

Received: 2 February 2021 | Revised: 16 February 2021 | Accepted: 21 February 2021 | Online: 11 April 2021

Corresponding author: B. Ahmed

Abstract

Social media and easy internet access have allowed the instant sharing of news, ideas, and information on a global scale. However, the same rapid spread and instant access also allow rumors and fake news to propagate easily and quickly. To monitor and limit the spread of fake news in the digital community, fake news detection using Natural Language Processing (NLP) has attracted significant attention. In NLP, different text feature extractors and word embeddings are used to transform text data into numerical form. The aim of this paper is to analyze the performance of a neural-network-based fake news detection model using three feature extractors: the TF-IDF vectorizer, GloVe embeddings, and BERT embeddings. For the evaluation, multiple metrics, namely accuracy, precision, recall, F1, AUC ROC, and AUC PR, were computed for each feature extractor. The output of each transformation technique was fed to the deep learning model. It was found that BERT embeddings delivered the best performance. TF-IDF performed far better than GloVe and was competitive with BERT at some stages.
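As a rough illustration of two ingredients named in the abstract, the sketch below computes TF-IDF vectors for a toy corpus and derives accuracy, precision, recall, and F1 from binary predictions. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `tfidf_vectors` and `binary_metrics`, the whitespace tokenizer, the unsmoothed weighting tf x ln(N/df), the toy documents, and the labels are all hypothetical choices made for illustration. The abstract does not state which TF-IDF variant or tokenizer was used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with weight = (count / doc length) * ln(N / df).

    Assumption: plain unsmoothed TF-IDF with whitespace tokenization;
    library implementations often apply smoothing and normalization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy data, used only to exercise the two functions.
docs = [
    "breaking news shocking claim",
    "official report confirms claim",
    "shocking shocking rumor",
]
vocab, vectors = tfidf_vectors(docs)
scores = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Each row of `vectors` could serve as the input features of a classifier. A term that occurs in every document receives weight zero under this unsmoothed variant, which is one reason smoothed forms are common in practice. AUC ROC and AUC PR are omitted here because they require predicted scores rather than hard labels.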

Keywords:

fake news, natural language processing, feature extractors, deep learning
