A deep action-oriented video image classification system for text detection and recognition

Bibliographic Details
Main Authors:	Chaudhuri, Abhra, Shivakumara, Palaiahnakote, Chowdhury, Pinaki Nath, Pal, Umapada, Lu, Tong, Lopresti, Daniel, Kumar, Govindaraj Hemantha
Format:	Article
Published:	Springer 2021
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://eprints.um.edu.my/34804/

id	my.um.eprints.34804
record_format	eprints
citation	Chaudhuri, Abhra; Shivakumara, Palaiahnakote; Chowdhury, Pinaki Nath; Pal, Umapada; Lu, Tong; Lopresti, Daniel; Kumar, Govindaraj Hemantha (2021). A deep action-oriented video image classification system for text detection and recognition. SN Applied Sciences, 3 (11). Springer, 2021-11. Peer reviewed. ISSN 2523-3971. DOI: https://doi.org/10.1007/s42452-021-04821-z
institution	Universiti Malaya
building	UM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaya
content_source	UM Research Repository
url_provider	http://eprints.um.edu.my/
topic	QA75 Electronic computers. Computer science
description	For video images with complex actions, achieving accurate text detection and recognition results is very challenging. This paper presents a hybrid model for classification of action-oriented video images, which reduces the complexity of the problem to improve text detection and recognition performance. Here, we consider five genre categories, namely concert, cooking, craft, teleshopping and yoga. For classifying action-oriented video images, we explore ResNet50 for learning general pixel-distribution-level information, a VGG16 network for learning the features of Maximally Stable Extremal Regions, and another VGG16 for learning facial components obtained by a multitask cascaded convolutional network. The approach integrates the outputs of these three models using a fully connected neural network for classification of the five action-oriented image classes. We demonstrate the efficacy of the proposed method by testing on our dataset and two other standard datasets, namely the Scene Text Dataset, which contains 10 classes of scene images with text information, and the Stanford 40 Actions dataset, which contains 40 action classes without text information. Our method outperforms the related existing work and significantly enhances the class-specific performance of text detection and recognition.
format	Article
author	Chaudhuri, Abhra Shivakumara, Palaiahnakote Chowdhury, Pinaki Nath Pal, Umapada Lu, Tong Lopresti, Daniel Kumar, Govindaraj Hemantha
title	A deep action-oriented video image classification system for text detection and recognition
publisher	Springer
publishDate	2021
url	http://eprints.um.edu.my/34804/
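The description field above states that the outputs of three branch networks (ResNet50 on the whole image, one VGG16 on Maximally Stable Extremal Region features, another VGG16 on facial components from a multitask cascaded convolutional network) are integrated by a fully connected neural network into five classes. The following is only a minimal sketch of that late-fusion step, assuming each branch has already produced a score vector and that fusion is plain concatenation followed by one ReLU hidden layer and a softmax output. The layer sizes, activation, class order, function names (`fuse_and_classify`, `predict`) and toy weights are all hypothetical and not taken from the paper; the CNN branches themselves are not reproduced.

```python
import math
from typing import List, Sequence

# The five genres named in the abstract; this ordering is an assumption.
CLASSES = ["concert", "cooking", "craft", "teleshopping", "yoga"]

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> Vector:
    """Fully connected layer: one weight row per output unit."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the input length")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, a) for a in v]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(
    resnet_out: Sequence[float],  # hypothetical ResNet50 branch output
    mser_out: Sequence[float],    # hypothetical VGG16-on-MSER branch output
    face_out: Sequence[float],    # hypothetical VGG16-on-faces branch output
    hidden_w: Matrix, hidden_b: Sequence[float],
    out_w: Matrix, out_b: Sequence[float],
) -> Vector:
    """Late fusion (assumed form): concatenate branch outputs, then a small MLP head."""
    fused = [*resnet_out, *mser_out, *face_out]
    hidden = relu(dense(fused, hidden_w, hidden_b))
    return softmax(dense(hidden, out_w, out_b))


def predict(probs: Sequence[float]) -> str:
    """Return the class label with the highest probability."""
    return CLASSES[max(range(len(probs)), key=lambda i: probs[i])]


if __name__ == "__main__":
    n = len(CLASSES)
    # Toy weights: hidden unit k sums the k-th score of each of the three branches.
    hidden_w = [[1.0 if j % n == k else 0.0 for j in range(3 * n)] for k in range(n)]
    out_w = [[1.0 if j == k else 0.0 for j in range(n)] for k in range(n)]  # identity
    probs = fuse_and_classify(
        [0.1, 0.0, 0.1, 0.0, 0.8],
        [0.2, 0.1, 0.0, 0.1, 0.6],
        [0.0, 0.0, 0.0, 0.0, 0.0],
        hidden_w, [0.0] * n, out_w, [0.0] * n,
    )
    print(predict(probs))  # yoga
```

In a trained system the fusion weights would be learned from labelled images; the identity-style toy weights here only make the arithmetic easy to follow.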
