EmotionNet: A Novel Hybrid Deep Learning Model for Arabic Speech Emotion Recognition

Authors

  • Mourad Belhadj, LINATI, University of Ouargla, Algeria | LESIA, University of Biskra, Algeria
  • Mihoub Mazouz, LINATI, University of Ouargla, Algeria
  • Dalal Djeridi, LINATI, University of Ouargla, Algeria
Volume: 15 | Issue: 5 | Pages: 26619-26625 | October 2025 | https://doi.org/10.48084/etasr.12035

Abstract

This study presents EmotionNet, a novel hybrid deep learning model designed for Arabic speech emotion recognition. EmotionNet integrates a Variational Auto-Encoder (VAE) for latent representation learning with a lightweight classification branch enhanced by latent-space refinement. Evaluated on the KEDAS dataset, which comprises five acted emotion categories, the model achieved a test accuracy of 93.99% and outperformed conventional classifiers such as SVM, MLP, and Random Forest. The proposed approach employs a compound loss function and KL annealing to jointly optimize reconstruction and classification. Although the results are promising, the acted nature of KEDAS may overstate real-world performance, highlighting the need for evaluation on spontaneous, multimodal datasets, an effort underway in an ongoing interdisciplinary project.
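For readers who want a concrete picture of the training objective the abstract describes, the following is a minimal PyTorch sketch of a VAE with a lightweight classification branch trained under a compound reconstruction + classification + KL loss with linear KL annealing. Every name, every layer size, the 384-dimensional input (e.g., an openSMILE LLD feature vector), and the annealing schedule are illustrative assumptions, not the paper's published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EmotionVAE(nn.Module):
    """Hypothetical VAE + classifier hybrid; all sizes are assumptions."""
    def __init__(self, input_dim=384, latent_dim=32, n_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.fc_mu = nn.Linear(128, latent_dim)      # latent mean
        self.fc_logvar = nn.Linear(128, latent_dim)  # latent log-variance
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, input_dim))
        # lightweight classification branch operating on the latent code
        self.classifier = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # reparameterization trick: z = mu + sigma * epsilon
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), self.classifier(z), mu, logvar

def compound_loss(x, x_hat, logits, y, mu, logvar, beta):
    """Reconstruction + classification + beta-weighted KL divergence."""
    rec = F.mse_loss(x_hat, x)
    ce = F.cross_entropy(logits, y)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + ce + beta * kl

def kl_beta(epoch, warmup=20):
    """Linear KL annealing: beta ramps from 0 to 1 over `warmup` epochs."""
    return min(1.0, epoch / warmup)

Annealing beta from 0 toward 1 lets the encoder learn informative, class-discriminative codes before the KL term starts pulling the posterior toward the prior, the usual remedy for posterior collapse in jointly trained VAEs.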

Keywords:

Arabic speech emotion, variational autoencoder, KEDAS dataset, LLD features


References

M. Belhadj, I. Bendellali, and E. Lakhdari, "KEDAS: A validated Arabic Speech Emotion Dataset," in 2022 International Symposium on iNnovative Informatics of Biskra (ISNIB), Biskra, Algeria, Dec. 2022, pp. 1–6.

M. Belhadj, I. Bendellali, and E. Lakhdari, "Kasdi-Merbah (University) Emotional Database in Arabic Speech." Linguistic Data Consortium, 2023.

L. Iben Nasr, A. Masmoudi, and L. Hadrich Belguith, "Survey on Arabic speech emotion recognition," International Journal of Speech Technology, vol. 27, no. 1, pp. 53–68, Mar. 2024.

H. Alamri and H. S. Alshanbari, "Emotion Recognition in Arabic Speech from Saudi Dialect Corpus Using Machine Learning and Deep Learning Algorithms," International Journal of Computer Science and Network Security, vol. 23, no. 8, pp. 9–16, Aug. 2023.

W. Ismaiel, A. Alhalangy, A. O. Y. Mohamed, and A. I. A. Musa, "Deep Learning, Ensemble and Supervised Machine Learning for Arabic Speech Emotion Recognition," Engineering, Technology & Applied Science Research, vol. 14, no. 2, pp. 13757–13764, Apr. 2024.

W. Bouchelligua, R. Al-Dayil, and A. Algaith, "Effective Data Augmentation Techniques for Arabic Speech Emotion Recognition Using Convolutional Neural Networks," Applied Sciences, vol. 15, no. 4, Jan. 2025, Art. no. 2114.

Y. Xiao, Y. Bo, and Z. Zheng, "Speech Emotion Recognition based on Semi-Supervised Adversarial Variational Autoencoder," in 2023 IEEE 10th International Conference on Cyber Security and Cloud Computing (CSCloud)/2023 IEEE 9th International Conference on Edge Computing and Scalable Cloud (EdgeCom), Xiangtan, Hunan, China, Jul. 2023, pp. 275–280.

A. V. Porco and D. Kang, "Enhancing Emotion Classification Through Speech and Correlated Emotional Sounds via a Variational Auto-Encoder Model with Prosodic Regularization," in 2023 IEEE International Conference on Computer Vision and Machine Intelligence (CVMI), Gwalior, India, Dec. 2023, pp. 1–6.

S. Sadok, S. Leglaive, L. Girin, X. Alameda-Pineda, and R. Séguier, "A multimodal dynamical variational autoencoder for audiovisual speech representation learning," Neural Networks, vol. 172, Apr. 2024, Art. no. 106120.

S. Latif, R. Rana, S. Khalifa, R. Jurdak, J. Qadir, and B. Schuller, "Survey of Deep Representation Learning for Speech Emotion Recognition," IEEE Transactions on Affective Computing, vol. 14, no. 2, pp. 1634–1654, Apr. 2023.

F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE: The Munich Versatile and Fast Open-Source Audio Feature Extractor," in Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy, Oct. 2010, pp. 1459–1462.

M. S. Likitha, S. R. R. Gupta, K. Hasitha, and A. U. Raju, "Speech based human emotion recognition using MFCC," in 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India, Mar. 2017, pp. 2257–2260.

S. Patnaik, "Speech emotion recognition by using complex MFCC and deep sequential model," Multimedia Tools and Applications, vol. 82, no. 8, pp. 11897–11922, Mar. 2023.

D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes," in 2nd International Conference on Learning Representations (ICLR), Banff, AB, Canada, Apr. 2014.

J. M. Wu and P. H. Hsu, "Annealed Kullback–Leibler divergence minimization for generalized TSP, spot identification and gene sorting," Neurocomputing, vol. 74, no. 12–13, pp. 2228–2240, Jun. 2011.

S. Wu, T. H. Falk, and W. Y. Chan, "Automatic speech emotion recognition using modulation spectral features," Speech Communication, vol. 53, no. 5, pp. 768–785, May 2011.

S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz, and S. Bengio, "Generating Sentences from a Continuous Space," in Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany, Aug. 2016, pp. 10–21.

D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv:1412.6980, Jan. 2017.

X. Wang, "Toward Domain Adaptive Learning-Based Variation Autoencoder Emotional Analysis in English Teaching," International Journal on Artificial Intelligence Tools, vol. 33, no. 7, Nov. 2024.


How to Cite

[1] M. Belhadj, M. Mazouz, and D. Djeridi, "EmotionNet: A Novel Hybrid Deep Learning Model for Arabic Speech Emotion Recognition," Eng. Technol. Appl. Sci. Res., vol. 15, no. 5, pp. 26619–26625, Oct. 2025.
