Vision-Based Hand Detection and Tracking Using Fusion of Kernelized Correlation Filter and Single-Shot Detection

Bibliographic Details
Main Authors: Mohd, Mohd Norzali Haji; Mohd Asaari, Mohd Shahrimie; Ong Lay Ping; Bakhtiar Affendi Rosdi
Format: Article
Language: English
Published: MDPI, 2023
Online Access: http://eprints.uthm.edu.my/9658/1/J16219_8930626b82c06d4375e69da013ec81a8.pdf
http://eprints.uthm.edu.my/9658/
https://doi.org/10.3390/app13137433
Description
Summary: Hand detection and tracking are key components in many computer vision applications, including hand pose estimation and gesture recognition for human–computer interaction systems, virtual reality, and augmented reality. Despite their importance, reliable hand detection in cluttered scenes remains a challenge. This study explores the use of deep learning techniques for fast and robust hand detection and tracking. A novel algorithm is proposed that combines the Kernelized Correlation Filter (KCF) tracker with the Single-Shot Detection (SSD) method. This integration enables hands to be detected and tracked in challenging environments, such as scenes with cluttered backgrounds and occlusions. The SSD algorithm reinitializes the KCF tracker when the tracker fails or drifts because of sudden changes in hand gesture or fast movements. Testing in challenging scenes showed that the proposed tracker achieved a tracking rate of over 90% at a speed of 17 frames per second (FPS). A comparison with the KCF tracker on 17 video sequences revealed an average improvement of 13.31% in tracking detection rate (TRDR) and 27.04% in object tracking error (OTE). An additional comparison with the MediaPipe hand tracker on 10 hand gesture videos from the Intelligent Biometric Group Hand Tracking (IBGHT) dataset showed that the proposed method outperformed MediaPipe in terms of overall TRDR and tracking speed. The results demonstrate the promising potential of the proposed method for stable tracking over long sequences, reducing drift, and improving tracking performance during occlusions.
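The detector/tracker fusion described in the summary can be sketched as a per-frame control loop: keep tracking while the tracker is healthy, and fall back to the detector to reinitialize it when it fails or drifts. This is a minimal sketch under stated assumptions, not the authors' implementation. The detector and tracker are passed in as callables (`detect`, `init_tracker`, and `update_tracker` are hypothetical interfaces standing in for SSD and KCF), and failure or drift is assumed to be signaled by a missing box or a confidence score below a hypothetical `min_confidence` threshold. The stub tracker and detector at the end only simulate one drift event so the control flow can be checked.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class TrackResult:
    box: Optional[Box]  # None when the tracker has lost the target
    confidence: float   # assumed tracker confidence score in [0, 1]


def track_with_redetection(
    frames: Iterable[Any],
    detect: Callable[[Any], Optional[Box]],          # hypothetical SSD stand-in
    init_tracker: Callable[[Any, Box], None],        # hypothetical KCF init
    update_tracker: Callable[[Any], TrackResult],    # hypothetical KCF update
    min_confidence: float = 0.3,                     # assumed drift threshold
) -> List[Optional[Box]]:
    """Track frame by frame; re-run the detector whenever tracking fails."""
    boxes: List[Optional[Box]] = []
    tracking = False
    for frame in frames:
        box: Optional[Box] = None
        if tracking:
            result = update_tracker(frame)
            if result.box is not None and result.confidence >= min_confidence:
                box = result.box
            else:
                tracking = False  # assumed failure/drift: discard the tracker state
        if not tracking:
            # Detector pass on the same frame; a hit reinitializes the tracker.
            detected = detect(frame)
            if detected is not None:
                init_tracker(frame, detected)
                tracking = True
                box = detected
        boxes.append(box)
    return boxes


class StubTracker:
    """Hypothetical KCF stand-in: shifts the box 1 px per frame and reports
    low confidence on frame 3 to simulate drift."""

    def __init__(self) -> None:
        self.box: Optional[Box] = None
        self.inits = 0

    def init(self, frame: Any, box: Box) -> None:
        self.box = box
        self.inits += 1

    def update(self, frame: Any) -> TrackResult:
        assert self.box is not None
        x, y, w, h = self.box
        self.box = (x + 1.0, y, w, h)
        return TrackResult(self.box, 0.1 if frame == 3 else 0.9)


def stub_detect(frame: Any) -> Optional[Box]:
    # Hypothetical SSD stand-in: always finds a hand at a frame-dependent x.
    return (50.0 + 10.0 * frame, 0.0, 10.0, 10.0)


stub = StubTracker()
boxes = track_with_redetection(range(6), stub_detect, stub.init, stub.update)
print([b[0] for b in boxes if b is not None], "initializations:", stub.inits)
```

Running the detector only on frames where the tracker reports failure keeps the per-frame cost close to that of the lightweight tracker, which is one plausible reason such a fusion can stay fast; how the original method actually detects failure is not stated in the summary.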
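The two reported metrics, TRDR and OTE, can be illustrated with a short sketch. It assumes definitions commonly used in video-tracking performance evaluation, which the summary does not spell out. Under these definitions, TRDR is the number of true-positive frames divided by the number of frames containing a ground-truth hand, and OTE is the mean Euclidean distance between tracked and ground-truth box centroids over frames where both exist. The true-positive criterion used here (tracked centroid inside the ground-truth box) and the function names are hypothetical; the paper may use a different matching rule.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def centroid(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def is_true_positive(tracked: Box, truth: Box) -> bool:
    # Assumed criterion: the tracked centroid lies inside the ground-truth box.
    cx, cy = centroid(tracked)
    x, y, w, h = truth
    return x <= cx <= x + w and y <= cy <= y + h


def trdr_and_ote(
    tracked: List[Optional[Box]], truth: List[Optional[Box]]
) -> Tuple[float, float]:
    """Return (TRDR as a fraction, OTE in pixels) under the assumed definitions."""
    if len(tracked) != len(truth):
        raise ValueError("sequences must have the same length")
    gt_frames = 0               # frames with a ground-truth hand
    true_pos = 0                # frames where the tracker matches it
    distances: List[float] = [] # centroid errors where both boxes exist
    for t, g in zip(tracked, truth):
        if g is None:
            continue
        gt_frames += 1
        if t is None:
            continue
        (tx, ty), (gx, gy) = centroid(t), centroid(g)
        distances.append(math.hypot(tx - gx, ty - gy))
        if is_true_positive(t, g):
            true_pos += 1
    trdr = true_pos / gt_frames if gt_frames else 0.0
    ote = sum(distances) / len(distances) if distances else float("nan")
    return trdr, ote


truth = [(0.0, 0.0, 10.0, 10.0)] * 4
tracked = [(0.0, 0.0, 10.0, 10.0), (3.0, 4.0, 10.0, 10.0),
           (20.0, 20.0, 10.0, 10.0), None]
trdr, ote = trdr_and_ote(tracked, truth)
print(f"TRDR = {trdr:.2%}, OTE = {ote:.2f} px")
```

The function returns TRDR as a fraction, whereas the summary reports percentages. Because a lower OTE is better, an "improvement" in OTE corresponds to a reduction in the mean centroid error.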