Abnormal Human Behavior Detection Improvement with an Efficient Attention Block
Received: 13 April 2025 | Revised: 30 April 2025 and 6 May 2025 | Accepted: 10 May 2025 | Online: 9 July 2025
Corresponding author: Ngoc Trung Nguyen
Abstract
Convolutional Neural Networks (CNNs) have become an attractive method for detecting anomalous behaviors. However, designing a CNN model that achieves high classification accuracy remains a challenging problem. Furthermore, the existing datasets for abnormal behavior detection are limited, each focusing on a particular context. As a result, a CNN model trained on one dataset is adapted to that context and is not suitable for others. This study proposes a CNN framework with an efficient attention mechanism to capture key information from multiple inputs, namely RGB, optical flow, and heatmap. Experiments were carried out on several benchmark datasets and a self-collected dataset, and the evaluation involved both single- and cross-dataset strategies. The results show that the proposed framework outperforms other state-of-the-art (SOTA) methods in detection accuracy.
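The abstract does not specify how the attention block combines the three inputs, so the following is only a minimal sketch of one common option: scaled dot-product attention weights over per-modality feature vectors, followed by a weighted sum. The function names, the `query` vector, and the toy feature values are all hypothetical and are not taken from the paper; in a real model the features would come from CNN branches and the query would be learned.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(features, query):
    """Fuse per-modality feature vectors with dot-product attention.

    features: dict mapping a modality name to a feature vector (list of floats).
    query: hypothetical scoring vector of the same length (assumed, not from the paper).
    Returns (fused_vector, weights_by_modality).
    """
    names = list(features)
    dim = len(query)
    # Score each modality by its scaled dot product with the query.
    scores = [
        sum(f * q for f, q in zip(features[name], query)) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the modality vectors, one output value per dimension.
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))

# Toy values standing in for the outputs of three CNN branches (illustrative only).
features = {
    "rgb": [0.2, 0.1, 0.4, 0.3],
    "optical_flow": [0.9, 0.8, 0.1, 0.2],
    "heatmap": [0.1, 0.0, 0.2, 0.1],
}
query = [1.0, 1.0, 0.0, 0.0]
fused, weights = attention_fuse(features, query)
```

With these toy values the optical-flow branch aligns best with the query and therefore receives the largest weight; the weights always sum to 1, so the fused vector stays on the same scale as the inputs.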
Keywords:
knowledge distillation, convolutional neural network, transfer learning, deep learning, student-teacher model
License
Copyright (c) 2025 Anh Dung Ho, Huong Giang Doan, Ngoc Trung Nguyen

This work is licensed under a Creative Commons Attribution 4.0 International License.
