End-to-end object detection with transformers
Main Author:
Format: Final Year Project / Dissertation / Thesis
Published: 2024
Subjects:
Online Access:
http://eprints.utar.edu.my/6556/1/MH_1901182_Final_EDDY_LAI_THIN_JUN.pdf
http://eprints.utar.edu.my/6556/
Summary: In the past decade, the You Only Look Once (YOLO) series has become the most prevalent framework for object detection owing to its superiority in terms of accuracy and speed. However, with the advent of transformer-based architectures, there has been a paradigm shift in the development of real-time detection models. This thesis investigates the performance of YOLOv8 and Real-Time DEtection TRansformer (RT-DETR) variants on urban-zone aerial object detection tasks. Specifically, five models, namely YOLOv8n, YOLOv8s, YOLOv8m, RT-DETR-r18, and RT-DETR-r50, are trained on an expensive graphics processing unit (GPU) and subsequently executed on a central processing unit (CPU), which is more relevant to power-constrained drone applications. Experimental results reveal that RT-DETR-r50 achieves the highest mean average precision 50-95 (mAP 50-95) of 0.598, whereas YOLOv8n delivers the fastest inference speed of 30.4 frames per second (FPS). These strengths come at a price: RT-DETR-r50 is slow (1.7 FPS) and YOLOv8n is comparatively inaccurate (mAP 50-95 of 0.440). In this light, YOLOv8s emerges as the most promising model, striking the best trade-off between accuracy (mAP 50-95 of 0.529) and speed (11.4 FPS).
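For readers who want to reproduce a comparison of this kind, the sketch below outlines a GPU-train, CPU-benchmark loop using the `ultralytics` Python package, which provides both the `YOLO` and `RTDETR` model classes. The dataset file `urban_aerial.yaml`, the epoch count, and the RT-DETR checkpoint filenames are placeholders (the record does not specify the thesis's training configuration), so this is an illustrative outline under stated assumptions, not the author's actual experiment code.

```python
# Illustrative sketch only: "urban_aerial.yaml", epochs=100, and the RT-DETR
# checkpoint names are placeholders, not the thesis's actual settings.
# Note: the ultralytics hub ships rtdetr-l.pt / rtdetr-x.pt (HGNetv2
# backbones); the r18/r50 variants studied in the thesis come from the
# original RT-DETR release, so suitable weights are assumed to be available.
from ultralytics import RTDETR, YOLO

MODELS = {
    "YOLOv8n": ("yolov8n.pt", YOLO),
    "YOLOv8s": ("yolov8s.pt", YOLO),
    "YOLOv8m": ("yolov8m.pt", YOLO),
    "RT-DETR-r18": ("rtdetr-r18.pt", RTDETR),  # assumed checkpoint name
    "RT-DETR-r50": ("rtdetr-r50.pt", RTDETR),  # assumed checkpoint name
}

for name, (weights, cls) in MODELS.items():
    model = cls(weights)

    # Train and validate on the GPU (device 0).
    model.train(data="urban_aerial.yaml", epochs=100, imgsz=640, device=0)
    metrics = model.val(device=0)
    map_50_95 = metrics.box.map  # mAP averaged over IoU thresholds 0.50-0.95

    # Re-run inference on the CPU, mimicking on-drone deployment.
    results = model.predict("datasets/urban_aerial/val/images", device="cpu")

    # Each Results object reports per-image preprocess/inference/postprocess
    # times in milliseconds; average the total latency and convert to FPS.
    ms_per_image = sum(sum(r.speed.values()) for r in results) / len(results)
    print(f"{name}: mAP 50-95 = {map_50_95:.3f}, CPU = {1000 / ms_per_image:.1f} FPS")
```

Computing FPS from the full preprocess-inference-postprocess latency, rather than the forward pass alone, gives a figure closer to what a deployed pipeline would observe; whether the thesis measured speed this way is not stated in the record.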