Staff View: A deep learning framework for multi-object tracking in team sports videos

In response to the challenges of Multi-Object Tracking (MOT) in sports scenes, such as severe occlusions, similar appearances, drastic pose changes, and complex motion patterns, a deep-learning framework CTGMOT (CNN-Transformer-GNN-based MOT) specifically for multiple athlete tracking in sports vide...

Bibliographic Details
Main Authors:	Cao, Wei, Wang, Xiaoyong, Liu, Xianxiang, Xu, Yishuai
Format:	Article
Published:	Institution of Engineering and Technology (IET) 2024
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://eprints.um.edu.my/47061/ https://doi.org/10.1049/cvi2.12266

id	my.um.eprints.47061
record_format	eprints
spelling	my.um.eprints.47061 2025-01-06T01:32:34Z http://eprints.um.edu.my/47061/ A deep learning framework for multi-object tracking in team sports videos Cao, Wei Wang, Xiaoyong Liu, Xianxiang Xu, Yishuai QA75 Electronic computers. Computer science In response to the challenges of Multi-Object Tracking (MOT) in sports scenes, such as severe occlusions, similar appearances, drastic pose changes, and complex motion patterns, a deep-learning framework CTGMOT (CNN-Transformer-GNN-based MOT) specifically for multiple athlete tracking in sports videos that performs joint modelling of detection, appearance and motion features is proposed. Firstly, a detection network that combines Convolutional Neural Networks (CNN) and Transformers is constructed to extract both local and global features from images. The fusion of appearance and motion features is achieved through a design of parallel dual-branch decoders. Secondly, graph models are built using Graph Neural Networks (GNN) to accurately capture the spatio-temporal correlations between object and trajectory features from inter-frame and intra-frame associations. Experimental results on the public sports tracking dataset SportsMOT show that the proposed framework outperforms other state-of-the-art methods for MOT in complex sport scenes. In addition, the proposed framework shows excellent generality on benchmark datasets MOT17 and MOT20. The authors propose a deep-learning framework, CTGMOT, for multi-object tracking (MOT) in complex team sports videos. The backbone network of the framework combines CNN and Transformers to extract local and global features, and uses parallel decoders to fuse appearance and motion features. To accurately capture spatial-temporal correlations, the framework adopts GNN and an attention mechanism to fuse the spatial tracking features of objects within frames as well as the temporal tracking features across different frames, which better distinguishes fast-moving and occluded targets and improves the performance of online MOT. Institution of Engineering and Technology (IET) 2024-08 Article PeerReviewed Cao, Wei and Wang, Xiaoyong and Liu, Xianxiang and Xu, Yishuai (2024) A deep learning framework for multi-object tracking in team sports videos. IET Computer Vision, 18 (5). pp. 574-590. ISSN 1751-9632, DOI https://doi.org/10.1049/cvi2.12266
institution	Universiti Malaya
building	UM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaya
content_source	UM Research Repository
url_provider	http://eprints.um.edu.my/
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers. Computer science Cao, Wei Wang, Xiaoyong Liu, Xianxiang Xu, Yishuai A deep learning framework for multi-object tracking in team sports videos
description	In response to the challenges of Multi-Object Tracking (MOT) in sports scenes, such as severe occlusions, similar appearances, drastic pose changes, and complex motion patterns, a deep-learning framework CTGMOT (CNN-Transformer-GNN-based MOT) specifically for multiple athlete tracking in sports videos that performs joint modelling of detection, appearance and motion features is proposed. Firstly, a detection network that combines Convolutional Neural Networks (CNN) and Transformers is constructed to extract both local and global features from images. The fusion of appearance and motion features is achieved through a design of parallel dual-branch decoders. Secondly, graph models are built using Graph Neural Networks (GNN) to accurately capture the spatio-temporal correlations between object and trajectory features from inter-frame and intra-frame associations. Experimental results on the public sports tracking dataset SportsMOT show that the proposed framework outperforms other state-of-the-art methods for MOT in complex sport scenes. In addition, the proposed framework shows excellent generality on benchmark datasets MOT17 and MOT20. The authors propose a deep-learning framework, CTGMOT, for multi-object tracking (MOT) in complex team sports videos. The backbone network of the framework combines CNN and Transformers to extract local and global features, and uses parallel decoders to fuse appearance and motion features. To accurately capture spatial-temporal correlations, the framework adopts GNN and an attention mechanism to fuse the spatial tracking features of objects within frames as well as the temporal tracking features across different frames, which better distinguishes fast-moving and occluded targets and improves the performance of online MOT.
format	Article
author	Cao, Wei Wang, Xiaoyong Liu, Xianxiang Xu, Yishuai
author_facet	Cao, Wei Wang, Xiaoyong Liu, Xianxiang Xu, Yishuai
author_sort	Cao, Wei
title	A deep learning framework for multi-object tracking in team sports videos
title_short	A deep learning framework for multi-object tracking in team sports videos
title_full	A deep learning framework for multi-object tracking in team sports videos
title_fullStr	A deep learning framework for multi-object tracking in team sports videos
title_full_unstemmed	A deep learning framework for multi-object tracking in team sports videos
title_sort	deep learning framework for multi-object tracking in team sports videos
publisher	Institution of Engineering and Technology (IET)
publishDate	2024
url	http://eprints.um.edu.my/47061/ https://doi.org/10.1049/cvi2.12266
_version_	1821001882610434048
score	13.23648

Similar Items