A comparison between various human detectors and CNN-based feature extractors for human activity recognition via aerial captured video sequences

Human detection and activity recognition (HDAR) in videos play an important role in various real-life applications. Recently, object detection methods have been used to detect humans in videos for subsequent decision-making applications. This paper aims to address the problem of human detection in...


Bibliographic Details
Main Authors: Aldahoul, Nouar, Karim, Hezerul Abdul, Md Sabri, Aznul Qalid, Tan, Myles Joshua Toledo, Momo, Mhd Adel, Fermin, Jamie Ledesma
Format: Article
Published: Institute of Electrical and Electronics Engineers 2022
Subjects: QA75 Electronic computers. Computer science
Online Access:http://eprints.um.edu.my/42086/
id my.um.eprints.42086
record_format eprints
spelling my.um.eprints.42086 2023-10-17T09:25:30Z http://eprints.um.edu.my/42086/ A comparison between various human detectors and CNN-based feature extractors for human activity recognition via aerial captured video sequences Aldahoul, Nouar Karim, Hezerul Abdul Md Sabri, Aznul Qalid Tan, Myles Joshua Toledo Momo, Mhd Adel Fermin, Jamie Ledesma QA75 Electronic computers. Computer science Human detection and activity recognition (HDAR) in videos play an important role in various real-life applications. Recently, object detection methods have been used to detect humans in videos for subsequent decision-making applications. This paper aims to address the problem of human detection in aerial captured video sequences using a moving camera attached to an aerial platform, under dynamic conditions such as varying altitude, illumination changes, camera jitter, and variations in viewpoint, object size, and color. Unlike traditional datasets, whose frames are captured by a static ground camera and contain medium or large human regions, the UCF-ARG aerial dataset is more challenging because of the large distances between the humans in the frames and the camera. The performance of human detection methods described in the literature is often degraded when input video frames are distorted by noise, blur, illumination changes, and the like. To address these limitations, the object detection methods used in this study were trained on the COCO dataset and evaluated on the publicly available UCF-ARG dataset. The detectors were compared in terms of detection accuracy. The performance evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results demonstrated that EfficientDetD7 outperformed the other detectors, with an average accuracy of 92.9% across all activities and under various distortions, including blurring, added Gaussian noise, lightening, and darkening. Additionally, deep pre-trained convolutional neural networks (CNNs) such as ResNet and EfficientNet were used to extract highly informative features from the detected and cropped human patches. The extracted spatial features were then fed to a Long Short-Term Memory (LSTM) network to model temporal relations between frames for human activity recognition (HAR). Experimental results showed that EfficientNetB7-LSTM outperformed existing HAR methods in terms of both average accuracy (80%) and average F1 score (80%). The outcome is a robust HAR system that combines EfficientDetD7, EfficientNetB7, and an LSTM for human detection and activity classification. Institute of Electrical and Electronics Engineers 2022 Article PeerReviewed Aldahoul, Nouar and Karim, Hezerul Abdul and Md Sabri, Aznul Qalid and Tan, Myles Joshua Toledo and Momo, Mhd Adel and Fermin, Jamie Ledesma (2022) A comparison between various human detectors and CNN-based feature extractors for human activity recognition via aerial captured video sequences. IEEE Access, 10. pp. 63532-63553. ISSN 2169-3536, DOI 10.1109/ACCESS.2022.3182315.
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Research Repository
url_provider http://eprints.um.edu.my/
topic QA75 Electronic computers. Computer science
spellingShingle QA75 Electronic computers. Computer science
Aldahoul, Nouar
Karim, Hezerul Abdul
Md Sabri, Aznul Qalid
Tan, Myles Joshua Toledo
Momo, Mhd Adel
Fermin, Jamie Ledesma
A comparison between various human detectors and CNN-based feature extractors for human activity recognition via aerial captured video sequences
description Human detection and activity recognition (HDAR) in videos play an important role in various real-life applications. Recently, object detection methods have been used to detect humans in videos for subsequent decision-making applications. This paper aims to address the problem of human detection in aerial captured video sequences using a moving camera attached to an aerial platform, under dynamic conditions such as varying altitude, illumination changes, camera jitter, and variations in viewpoint, object size, and color. Unlike traditional datasets, whose frames are captured by a static ground camera and contain medium or large human regions, the UCF-ARG aerial dataset is more challenging because of the large distances between the humans in the frames and the camera. The performance of human detection methods described in the literature is often degraded when input video frames are distorted by noise, blur, illumination changes, and the like. To address these limitations, the object detection methods used in this study were trained on the COCO dataset and evaluated on the publicly available UCF-ARG dataset. The detectors were compared in terms of detection accuracy. The performance evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results demonstrated that EfficientDetD7 outperformed the other detectors, with an average accuracy of 92.9% across all activities and under various distortions, including blurring, added Gaussian noise, lightening, and darkening. Additionally, deep pre-trained convolutional neural networks (CNNs) such as ResNet and EfficientNet were used to extract highly informative features from the detected and cropped human patches. The extracted spatial features were then fed to a Long Short-Term Memory (LSTM) network to model temporal relations between frames for human activity recognition (HAR). Experimental results showed that EfficientNetB7-LSTM outperformed existing HAR methods in terms of both average accuracy (80%) and average F1 score (80%). The outcome is a robust HAR system that combines EfficientDetD7, EfficientNetB7, and an LSTM for human detection and activity classification.
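As a concrete illustration of the pipeline described above, the minimal sketch below (not the authors' released code) shows how an ImageNet-pretrained EfficientNetB7 backbone can turn detector-cropped human patches into per-frame features, which an LSTM head then classifies into the five UCF-ARG activities. The sequence length, patch size, LSTM width, and training setup are assumptions for illustration only; in the actual study the crops would come from EfficientDetD7 detections.

```python
# Minimal sketch of the described HAR pipeline: frozen EfficientNetB7 as a
# per-frame spatial feature extractor, followed by an LSTM over the sequence
# of features. Hyperparameters and shapes are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB7
from tensorflow.keras.applications.efficientnet import preprocess_input

SEQ_LEN = 16        # assumed number of cropped human patches per clip
PATCH_SIZE = 600    # EfficientNetB7's default input resolution
NUM_ACTIONS = 5     # digging, waving, throwing, walking, running

# Frozen ImageNet-pretrained backbone; global average pooling yields a
# 2560-dimensional feature vector per patch.
backbone = EfficientNetB7(include_top=False, weights="imagenet", pooling="avg")
backbone.trainable = False

def extract_clip_features(crops: np.ndarray) -> np.ndarray:
    """crops: (SEQ_LEN, PATCH_SIZE, PATCH_SIZE, 3) uint8 human patches
    (in practice produced by the object detector). Returns (SEQ_LEN, 2560)."""
    x = preprocess_input(crops.astype("float32"))
    return backbone.predict(x, verbose=0)

# LSTM head that models temporal relations between the per-frame features.
classifier = models.Sequential([
    layers.Input(shape=(SEQ_LEN, backbone.output_shape[-1])),
    layers.LSTM(128),
    layers.Dense(NUM_ACTIONS, activation="softmax"),
])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])

# Usage on one dummy clip: random arrays stand in for detector crops, and the
# untrained head produces arbitrary probabilities; this only shows the data flow.
dummy_crops = np.random.randint(
    0, 255, (SEQ_LEN, PATCH_SIZE, PATCH_SIZE, 3), dtype=np.uint8)
features = extract_clip_features(dummy_crops)          # (16, 2560)
probs = classifier.predict(features[np.newaxis, ...])  # (1, 5)
print("predicted action index:", int(probs.argmax(axis=-1)[0]))
```

In a full training run, `classifier.fit` would be called on sequences of such features labeled with the five actions; the backbone stays frozen, matching the fixed pre-trained feature extractor described in the abstract.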
format Article
author Aldahoul, Nouar
Karim, Hezerul Abdul
Md Sabri, Aznul Qalid
Tan, Myles Joshua Toledo
Momo, Mhd Adel
Fermin, Jamie Ledesma
author_facet Aldahoul, Nouar
Karim, Hezerul Abdul
Md Sabri, Aznul Qalid
Tan, Myles Joshua Toledo
Momo, Mhd Adel
Fermin, Jamie Ledesma
author_sort Aldahoul, Nouar
title A comparison between various human detectors and CNN-based feature extractors for human activity recognition via aerial captured video sequences
title_short A comparison between various human detectors and CNN-based feature extractors for human activity recognition via aerial captured video sequences
title_full A comparison between various human detectors and CNN-based feature extractors for human activity recognition via aerial captured video sequences
title_fullStr A comparison between various human detectors and CNN-based feature extractors for human activity recognition via aerial captured video sequences
title_full_unstemmed A comparison between various human detectors and CNN-based feature extractors for human activity recognition via aerial captured video sequences
title_sort comparison between various human detectors and cnn-based feature extractors for human activity recognition via aerial captured video sequences
publisher Institute of Electrical and Electronics Engineers
publishDate 2022
url http://eprints.um.edu.my/42086/