Video anomaly detection with U-Net temporal modelling and contrastive regularization


Bibliographic Details
Main Author: Gan, Kian Yu
Format: Final Year Project / Dissertation / Thesis
Published: 2023
Subjects:
Online Access:http://eprints.utar.edu.my/5786/1/fyp_CS_2023_GKY.pdf
http://eprints.utar.edu.my/5786/
Description
Summary: Video anomaly detection (VAD), which automatically localizes anomalous events in a video, is an active research area in deep learning. Because frame-level annotation of video samples is expensive, most VAD models are trained in a weakly supervised manner, where labels are available only at the video level. VAD remains an open and challenging problem because the model must learn from limited samples with only video-level supervision. In this project, we aim to improve the VAD network in two aspects. First, we explore a technique to model both local and global temporal dependencies, which are critical for detecting anomalous events. Previous methods such as stacked RNNs, temporal consistency constraints and ConvLSTM capture only short-range dependencies. GCN-based methods can model long-range dependencies, but they are slower and harder to train. RTFM captures both short- and long-range temporal dependencies using two parallel structures, one for each type; however, the two dependencies are handled separately, neglecting the close relationship between them. We therefore propose a U-Net-like structure that models local and global dependencies jointly to generate specialized features.

Second, we explore a new regularization technique for the weakly supervised setting to reduce overfitting, which arises easily when training samples are insufficient. Overfitting is generally mitigated by reducing network complexity, augmenting the data, injecting noise into the network, or applying dropout regularization. For VAD, previous works have applied special heuristics such as sparsity constraints and temporal smoothness to regulate the model's output. However, no existing work has extended a feature-based approach to regularization, in which the strategy is to learn more discriminative features.
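The U-Net-like idea of fusing fine-scale (local) and coarse-scale (global) temporal information can be illustrated with a minimal sketch. This is not the thesis's actual architecture: it uses plain numpy with fixed pooling instead of learned convolutions, assumes an even number of snippets, and exists only to show the encoder/decoder/skip-connection pattern over per-snippet features.

```python
import numpy as np

def temporal_unet_sketch(feats):
    """U-Net-like multi-scale temporal fusion (illustrative, no learned weights).

    feats: (T, D) array of per-snippet features, T assumed even.
    Encoder: average-pool along time to halve the resolution (wider context).
    Bottleneck: inject a global summary of the whole video.
    Decoder: upsample back to T and fuse with the skip connection, so the
    output mixes local (fine-scale) and global (coarse-scale) information.
    """
    T, D = feats.shape
    # Encoder: pool adjacent snippet pairs -> coarse features spanning 2 snippets
    coarse = feats.reshape(T // 2, 2, D).mean(axis=1)      # (T/2, D)
    # Bottleneck: global temporal context as the mean over the coarse scale
    global_ctx = coarse.mean(axis=0, keepdims=True)        # (1, D)
    coarse = coarse + global_ctx                           # add global context
    # Decoder: nearest-neighbour upsample back to length T
    up = np.repeat(coarse, 2, axis=0)                      # (T, D)
    # Skip connection: fuse the upsampled coarse path with the fine input
    return 0.5 * (up + feats)
```

In a learned version, the pooling and upsampling steps would be strided 1-D convolutions and transposed convolutions, but the information flow is the same.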
In this project, we extend contrastive regularization to the weakly supervised setting as a new regularization technique: it reduces overfitting by learning more discriminative features and enhancing the separability of features from different classes. We evaluated our model and compared its AUC with other state-of-the-art methods. Experimental results show that our model achieves the second-highest AUC among all published work on the UCF-Crime benchmark dataset using the same pre-trained features.
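The principle behind contrastive regularization can be sketched with a classic pairwise contrastive loss. This is a generic illustration, not the thesis's exact formulation: the margin value and the propagation of video-level labels to snippets are assumptions, and the O(N²) loop is kept for clarity rather than efficiency.

```python
import numpy as np

def contrastive_reg_loss(feats, labels, margin=1.0):
    """Pairwise contrastive regularizer (illustrative sketch).

    feats:  (N, D) snippet features.
    labels: (N,) 0 = normal, 1 = anomaly (video-level labels assumed
            propagated to snippets under weak supervision).
    Same-class pairs are pulled together (squared distance penalty);
    different-class pairs are pushed at least `margin` apart, which
    encourages more separable, discriminative features.
    """
    n = len(labels)
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(feats[i] - feats[j])
            if labels[i] == labels[j]:
                loss += d ** 2                       # pull together
            else:
                loss += max(0.0, margin - d) ** 2    # push apart up to margin
            pairs += 1
    return loss / pairs
```

Added to the usual weakly supervised ranking objective with a small weight, a term like this acts purely on the feature space, complementing output-level heuristics such as sparsity and temporal smoothness.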