Optical Flow-Based Feature Selection with Mosaicking and FrIFrO Inception V3 Algorithm for Video Violence Detection

Elakiya Vijayakumar; Aruna Puviarasan; Puviarasan Natarajan; Suresh Kumar Ramu Ganesan

doi:10.48084/etasr.7270

Authors

Elakiya Vijayakumar, Department of Computer Science and Engineering, Annamalai University, India
Aruna Puviarasan, Department of Computer Science and Engineering, Annamalai University, India
Puviarasan Natarajan, Department of Computer and Information Science, Annamalai University, India
Suresh Kumar Ramu Ganesan, Department of Computer Science and Engineering, Rajiv Gandhi College of Engineering and Technology, India

Volume: 14 | Issue: 3 | Pages: 14475-14482 | June 2024 | https://doi.org/10.48084/etasr.7270

Received: 16 March 2024 | Revised: 9 April 2024 | Accepted: 14 April 2024 | Online: 1 June 2024

Corresponding author: Elakiya Vijayakumar

Abstract

Violence has become a serious threat to society in recent years and needs to be addressed by every available means. Video-based violence detection is particularly difficult when the people or objects subjected to a violent act are in motion. Detecting violence in video content is a critical task with applications spanning security surveillance, content moderation, and public safety. This study proposes the Violence Guard Freeze-In Freeze-Out Inception V3 (VGFrIFrOI3) deep learning model which, in conjunction with optical flow-based features, offers an effective solution for automated violence detection in videos. The Inception V3 architecture is known for its efficiency and accuracy in image classification tasks and in extracting meaningful features from video frames. By fine-tuning Inception V3 on video datasets annotated for violent and non-violent actions, the network learns discriminative features that facilitate the detection of violent behavior. Furthermore, the model incorporates temporal information by processing video frames sequentially and aggregating features across multiple frames using techniques such as temporal convolutional networks or recurrent neural networks. The proposed model was compared with existing methods, and the comparison demonstrated its superior accuracy and robustness in detecting violent actions. The proposed approach not only offers a highly accurate solution for violence detection in video content but also provides insights into the potential of deep learning architectures such as Inception V3 for addressing real-world challenges in video analysis and surveillance. In addition, mosaicking, carried out in the pre-processing step, improves the algorithm's performance through search-space minimization and optical flow-based feature extraction, with the aim of improving accuracy.

Keywords:

deep learning, violence detection, optical flow, convolutional neural networks, InceptionV3, mosaicking
