Staff View: End-to-end object detection with transformers

End-to-end object detection with transformers

In the past decade, the You Only Look Once (YOLO) series has become the most prevalent framework for object detection owing to its superior accuracy and speed. However, with the advent of transformer-based architectures, there has been a paradigm shift in the development of real-time detector mo...

Bibliographic Details
Main Author:	Lai, Eddy Thin Jun
Format:	Final Year Project / Dissertation / Thesis
Published:	2024
Subjects:	QA75 Electronic computers. Computer science T Technology (General)
Online Access:	http://eprints.utar.edu.my/6556/1/MH_1901182_Final_EDDY_LAI_THIN_JUN.pdf http://eprints.utar.edu.my/6556/

id	my-utar-eprints.6556
record_format	eprints
spelling	my-utar-eprints.6556 2024-07-09T07:35:22Z End-to-end object detection with transformers Lai, Eddy Thin Jun QA75 Electronic computers. Computer science T Technology (General) In the past decade, the You Only Look Once (YOLO) series has become the most prevalent framework for object detection owing to its superior accuracy and speed. However, with the advent of transformer-based architectures, there has been a paradigm shift in the development of real-time detector models. This thesis investigates the performance of YOLOv8 and Real-Time DEtection TRansformer (RT-DETR) variants for aerial object detection in urban zones. Specifically, five models, namely YOLOv8n, YOLOv8s, YOLOv8m, RT-DETR-r18, and RT-DETR-r50, are trained on an expensive graphics processing unit (GPU) and subsequently executed on a central processing unit (CPU), which is more relevant for power-constrained drone applications. Experimental results reveal that RT-DETR-r50 achieves the highest mean average precision 50-95 (mAP 50-95) of 0.598, whereas YOLOv8n achieves the fastest inference speed of 30.4 frames per second (FPS). These advantages come at the cost of slow inference (1.7 FPS) for RT-DETR-r50 and low accuracy (mAP 50-95 of 0.440) for YOLOv8n. YOLOv8s therefore emerges as the most promising model because it strikes the best tradeoff between accuracy (mAP 50-95 of 0.529) and speed (11.4 FPS). 2024 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/6556/1/MH_1901182_Final_EDDY_LAI_THIN_JUN.pdf Lai, Eddy Thin Jun (2024) End-to-end object detection with transformers. Final Year Project, UTAR. http://eprints.utar.edu.my/6556/
institution	Universiti Tunku Abdul Rahman
building	UTAR Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Tunku Abdul Rahman
content_source	UTAR Institutional Repository
url_provider	http://eprints.utar.edu.my
topic	QA75 Electronic computers. Computer science T Technology (General)
spellingShingle	QA75 Electronic computers. Computer science T Technology (General) Lai, Eddy Thin Jun End-to-end object detection with transformers
description	In the past decade, the You Only Look Once (YOLO) series has become the most prevalent framework for object detection owing to its superior accuracy and speed. However, with the advent of transformer-based architectures, there has been a paradigm shift in the development of real-time detector models. This thesis investigates the performance of YOLOv8 and Real-Time DEtection TRansformer (RT-DETR) variants for aerial object detection in urban zones. Specifically, five models, namely YOLOv8n, YOLOv8s, YOLOv8m, RT-DETR-r18, and RT-DETR-r50, are trained on an expensive graphics processing unit (GPU) and subsequently executed on a central processing unit (CPU), which is more relevant for power-constrained drone applications. Experimental results reveal that RT-DETR-r50 achieves the highest mean average precision 50-95 (mAP 50-95) of 0.598, whereas YOLOv8n achieves the fastest inference speed of 30.4 frames per second (FPS). These advantages come at the cost of slow inference (1.7 FPS) for RT-DETR-r50 and low accuracy (mAP 50-95 of 0.440) for YOLOv8n. YOLOv8s therefore emerges as the most promising model because it strikes the best tradeoff between accuracy (mAP 50-95 of 0.529) and speed (11.4 FPS).
format	Final Year Project / Dissertation / Thesis
author	Lai, Eddy Thin Jun
author_facet	Lai, Eddy Thin Jun
author_sort	Lai, Eddy Thin Jun
title	End-to-end object detection with transformers
title_short	End-to-end object detection with transformers
title_full	End-to-end object detection with transformers
title_fullStr	End-to-end object detection with transformers
title_full_unstemmed	End-to-end object detection with transformers
title_sort	end-to-end object detection with transformers
publishDate	2024
url	http://eprints.utar.edu.my/6556/1/MH_1901182_Final_EDDY_LAI_THIN_JUN.pdf http://eprints.utar.edu.my/6556/
_version_	1806434810631028736
score	13.214267

Similar Items