Human action interpretation using convolutional neural network: a survey

Bibliographic Details
Main Authors: Malik, Zainab; Shapiai, Mohd. Ibrahim
Format: Article
Published: Springer Science and Business Media Deutschland GmbH 2022
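Subjects: QA75 Electronic computers. Computer science; TK Electrical engineering. Electronics Nuclear engineering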
Online Access: http://eprints.utm.my/id/eprint/102763/
http://dx.doi.org/10.1007/s00138-022-01291-0
Citation: Malik, Zainab and Shapiai, Mohd. Ibrahim (2022) Human action interpretation using convolutional neural network: a survey. Machine Vision and Applications, 33 (3). pp. 1-23. ISSN 0932-8092. DOI: 10.1007/s00138-022-01291-0
Status: Peer reviewed; published May 2022
Institution: Universiti Teknologi Malaysia, Malaysia
Library: UTM Library
Collection: UTM Institutional Repository (http://eprints.utm.my/)

Abstract: Human action interpretation (HAI) is one of the most active research areas in computer vision. It can be divided into human action recognition (HAR) and human action detection (HAD). HAR analyzes the frames of a video and assigns one or more labels to the video as a whole, whereas HAD first localizes the actor in each frame and then estimates an action score for the detected region. The effectiveness of an HAI model depends heavily on how spatiotemporal features are represented and on the model's architectural design. Various studies have addressed the effective representation of these features, and different deep architectures have been proposed to learn them better and to compute action scores from them. Among these architectures, the convolutional neural network (CNN) has been explored more than others for HAI because of its lower computational cost. Several surveys of these efforts have been published to date; however, none of them covers feature representation and architectural design in detail, and none focuses on pose-assisted HAI techniques. This study provides a more detailed survey of existing CNN-based HAI techniques, covering methods based on both frame-level and pose-level spatiotemporal features. It also offers a comparative study, on publicly available datasets, of HAI models based on different spatiotemporal feature representations. Finally, it discusses the limitations and challenges of HAI and concludes that interpreting human actions from visual data is still far from interpreting them in realistic videos, which are continuous and may contain multiple people performing multiple actions sequentially or in parallel.
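The abstract's distinction between HAR (one label for the whole video) and HAD (a label and score per localized actor, per frame) can be illustrated with a minimal Python sketch. It is not taken from the survey: the input formats (per-frame class-score dictionaries, such as a CNN classifier might output, and (x1, y1, x2, y2) actor boxes from some person detector) and the function names `recognize_video` and `detect_actions` are hypothetical, and average pooling over frames is only one simple way to aggregate frame scores.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed input formats (hypothetical, not from the survey):
Scores = Dict[str, float]        # action label -> score for one frame or one region
Box = Tuple[int, int, int, int]  # actor bounding box as (x1, y1, x2, y2)


def recognize_video(frame_scores: List[Scores]) -> str:
    """HAR-style: pool per-frame scores over the whole clip, return one video-level label."""
    if not frame_scores:
        raise ValueError("need at least one frame")
    totals: Dict[str, float] = {}
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    # Average pooling over time keeps pooled values on the per-frame scale.
    means = {label: total / len(frame_scores) for label, total in totals.items()}
    return max(means, key=means.get)


@dataclass
class Detection:
    frame: int
    box: Box
    label: str
    score: float


def detect_actions(frames: List[List[Tuple[Box, Scores]]]) -> List[Detection]:
    """HAD-style: every localized actor in every frame gets its own label and action score."""
    detections: List[Detection] = []
    for t, regions in enumerate(frames):
        for box, scores in regions:
            label = max(scores, key=scores.get)
            detections.append(Detection(frame=t, box=box, label=label, score=scores[label]))
    return detections


if __name__ == "__main__":
    clip = [{"walk": 0.7, "run": 0.3}, {"walk": 0.4, "run": 0.6}, {"walk": 0.8, "run": 0.2}]
    print(recognize_video(clip))  # → walk

    # One frame containing two localized actors (boxes assumed to come from a detector).
    frames = [[((10, 20, 50, 120), {"walk": 0.9, "run": 0.1}),
               ((200, 30, 240, 130), {"walk": 0.2, "run": 0.8})]]
    for det in detect_actions(frames):
        print(det.frame, det.box, det.label, det.score)
```

The HAR function collapses the time axis into a single decision, while the HAD function keeps one output per actor region, which is why the same frame can yield both "walk" and "run" in the detection case.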