Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System
Abstract
This paper studies three feature extraction methods, Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and the Modified Group Delay Function (ModGDF), for the development of an Automatic Speech Recognition (ASR) system for Arabic. The extracted features were classified with a Support Vector Machine (SVM). MFCC and PNCC extract cepstral characteristics of the speech signal, while ModGDF processes the group delay function computed directly from the voice signal. The three algorithms were applied to audio recordings of Arabic speakers. Simulation results showed that both PNCC and ModGDF were more accurate than MFCC, with PNCC providing the best recognition results for Arabic speech.
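As a rough illustration of the pipeline the abstract describes (cepstral feature extraction followed by SVM classification), the sketch below computes MFCCs from scratch with NumPy and trains a scikit-learn SVM on toy synthetic signals. The frame sizes, filter-bank counts, and synthetic "speakers" are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from scipy.fftpack import dct
from sklearn.svm import SVC

def mel(f):
    # Hz -> mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):
    # mel -> Hz
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters equally spaced on the mel scale.
    pts = mel_inv(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Frame the signal, take power spectra, apply mel filters, log, then DCT.
    frames = np.stack([signal[i:i + frame] * np.hamming(frame)
                       for i in range(0, len(signal) - frame, hop)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    return dct(np.log(energies + 1e-10), type=2, axis=1, norm="ortho")[:, :n_ceps]

# Toy demonstration: two synthetic "speakers" with different dominant
# frequencies; the mean MFCC vector per utterance is the SVM input.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
X, y = [], []
for label, f0 in [(0, 300.0), (1, 1200.0)]:
    for _ in range(20):
        sig = (np.sin(2 * np.pi * (f0 + rng.normal(0, 20)) * t)
               + 0.05 * rng.normal(size=t.size))
        X.append(mfcc(sig).mean(axis=0))
        y.append(label)
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```

PNCC follows the same framing/filtering skeleton but replaces the log compression with power-law nonlinearity and adds noise-suppression stages, which is what gives it the robustness advantage reported in the paper.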
Keywords:
speech recognition, feature extraction, PNCC, ModGDF, MFCC, Arabic speech recognition
License
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.