EmotionNet: A Novel Hybrid Deep Learning Model for Arabic Speech Emotion Recognition
Received: 11 May 2025 | Revised: 4 June 2025 and 21 June 2025 | Accepted: 5 July 2025 | Online: 1 August 2025
Corresponding author: Mourad Belhadj
Abstract
This study presents EmotionNet, a novel hybrid deep learning model for Arabic speech emotion recognition. EmotionNet integrates a Variational Auto-Encoder (VAE) for latent representation learning with a lightweight classification branch enhanced by latent-space refinement. Evaluated on the KEDAS dataset, which comprises five acted emotion categories, the model achieved a test accuracy of 93.99% and outperformed conventional classifiers such as SVM, MLP, and Random Forest. The proposed approach employs a compound loss function and KL annealing to jointly optimize reconstruction and classification. Although the results are promising, the acted nature of KEDAS may overstate real-world performance, highlighting the need for evaluation on spontaneous, multimodal datasets, an effort underway in an ongoing interdisciplinary project.
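The abstract's recipe (a VAE for latent representations, a lightweight classifier branch on the latent code, a compound loss, and KL annealing) can be illustrated with a minimal PyTorch sketch. All specifics below, including layer sizes, the MSE reconstruction term, the loss weights, and the warm-up length, are illustrative assumptions rather than the authors' implementation:

```python
# Hypothetical sketch of an EmotionNet-style hybrid: VAE encoder/decoder plus
# a lightweight classifier head on the latent code, trained with a compound
# loss (reconstruction + annealed KL + cross-entropy). Dimensions and weights
# are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridVAEClassifier(nn.Module):
    def __init__(self, input_dim=6373, latent_dim=64, n_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 512), nn.ReLU())
        self.fc_mu = nn.Linear(512, latent_dim)      # posterior mean
        self.fc_logvar = nn.Linear(512, latent_dim)  # posterior log-variance
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, input_dim))
        self.classifier = nn.Linear(latent_dim, n_classes)  # lightweight branch

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), self.classifier(z), mu, logvar

def compound_loss(x, x_rec, logits, y, mu, logvar, beta):
    rec = F.mse_loss(x_rec, x)                                      # reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())   # KL divergence
    ce = F.cross_entropy(logits, y)                                 # classification
    return rec + beta * kl + ce

# KL annealing in the style of Bowman et al.: ramp beta from 0 to 1
# over an assumed warm-up of 20 epochs so the KL term does not collapse
# the latent space early in training.
beta_schedule = lambda epoch, warmup=20: min(1.0, epoch / warmup)
```

A training loop would call `compound_loss` with `beta=beta_schedule(epoch)` each epoch; the gradual increase of the KL weight lets the model learn informative latent codes before the prior-matching pressure takes full effect.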
Keywords:
Arabic speech emotion, variational autoencoder, KEDAS dataset, LLD features
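The low-level descriptor (LLD) features named in the keywords are commonly extracted with openSMILE [10]. As a minimal sketch, assuming the opensmile Python package and the eGeMAPSv02 feature set (the study's actual configuration is not specified here):

```python
# Frame-wise LLD extraction with the opensmile Python package.
# The feature set and file name are illustrative assumptions.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,                # acoustic set
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,   # frame-wise LLDs
)
llds = smile.process_file("sample.wav")  # pandas DataFrame, one row per frame
print(llds.shape)
```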
References
M. Belhadj, I. Bendellali, and E. Lakhdari, "KEDAS: A validated Arabic Speech Emotion Dataset," in 2022 International Symposium on iNnovative Informatics of Biskra (ISNIB), Biskra, Algeria, Dec. 2022, pp. 1–6.
M. Belhadj, I. Bendellali, and E. Lakhdari, "Kasdi-Merbah (University) Emotional Database in Arabic Speech." Linguistic Data Consortium, 2023.
L. Iben Nasr, A. Masmoudi, and L. Hadrich Belguith, "Survey on Arabic speech emotion recognition," International Journal of Speech Technology, vol. 27, no. 1, pp. 53–68, Mar. 2024.
H. Alamri and H. S. Alshanbari, "Emotion Recognition in Arabic Speech from Saudi Dialect Corpus Using Machine Learning and Deep Learning Algorithms," International Journal of Computer Science and Network Security, vol. 23, no. 8, pp. 9–16, Aug. 2023.
W. Ismaiel, A. Alhalangy, A. O. Y. Mohamed, and A. I. A. Musa, "Deep Learning, Ensemble and Supervised Machine Learning for Arabic Speech Emotion Recognition," Engineering, Technology & Applied Science Research, vol. 14, no. 2, pp. 13757–13764, Apr. 2024.
W. Bouchelligua, R. Al-Dayil, and A. Algaith, "Effective Data Augmentation Techniques for Arabic Speech Emotion Recognition Using Convolutional Neural Networks," Applied Sciences, vol. 15, no. 4, Jan. 2025, Art. no. 2114.
Y. Xiao, Y. Bo, and Z. Zheng, "Speech Emotion Recognition based on Semi-Supervised Adversarial Variational Autoencoder," in 2023 IEEE 10th International Conference on Cyber Security and Cloud Computing (CSCloud)/2023 IEEE 9th International Conference on Edge Computing and Scalable Cloud (EdgeCom), Xiangtan, Hunan, China, Jul. 2023, pp. 275–280.
A. V. Porco and D. Kang, "Enhancing Emotion Classification Through Speech and Correlated Emotional Sounds via a Variational Auto-Encoder Model with Prosodic Regularization," in 2023 IEEE International Conference on Computer Vision and Machine Intelligence (CVMI), Gwalior, India, Dec. 2023, pp. 1–6.
S. Sadok, S. Leglaive, L. Girin, X. Alameda-Pineda, and R. Séguier, "A multimodal dynamical variational autoencoder for audiovisual speech representation learning," Neural Networks, vol. 172, Apr. 2024, Art. no. 106120.
S. Latif, R. Rana, S. Khalifa, R. Jurdak, J. Qadir, and B. Schuller, "Survey of Deep Representation Learning for Speech Emotion Recognition," IEEE Transactions on Affective Computing, vol. 14, no. 2, pp. 1634–1654, Apr. 2023.
F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE: The Munich Versatile and Fast Open-Source Audio Feature Extractor," in Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy, Oct. 2010, pp. 1459–1462.
M. S. Likitha, S. R. R. Gupta, K. Hasitha, and A. U. Raju, "Speech based human emotion recognition using MFCC," in 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India, Mar. 2017, pp. 2257–2260.
S. Patnaik, "Speech emotion recognition by using complex MFCC and deep sequential model," Multimedia Tools and Applications, vol. 82, no. 8, pp. 11897–11922, Mar. 2023.
D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes," in 2nd International Conference on Learning Representations (ICLR), Banff, AB, Canada, Apr. 2014.
J. M. Wu and P. H. Hsu, "Annealed Kullback–Leibler divergence minimization for generalized TSP, spot identification and gene sorting," Neurocomputing, vol. 74, no. 12–13, pp. 2228–2240, Jun. 2011.
S. Wu, T. H. Falk, and W. Y. Chan, "Automatic speech emotion recognition using modulation spectral features," Speech Communication, vol. 53, no. 5, pp. 768–785, May 2011.
S. R. Bowman, L. Vilnis, O. Vinyals, A. Dai, R. Jozefowicz, and S. Bengio, "Generating Sentences from a Continuous Space," in Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany, Aug. 2016, pp. 10–21.
D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization." arXiv:1412.6980, Jan. 2017.
X. Wang, "Toward Domain Adaptive Learning-Based Variation Autoencoder Emotional Analysis in English Teaching," International Journal on Artificial Intelligence Tools, vol. 33, no. 7, Nov. 2024.
License
Copyright (c) 2025 Mourad Belhadj, Mihoub Mazouz, Dalal Djeridi

This work is licensed under a Creative Commons Attribution 4.0 International License.