Accelerating FPGA-SURF feature detection module by memory access reduction

Bibliographic Details
Main Authors: Mohd Yamani Idna Idris; Nor Bakiah Abd Warif; Hamzah Arof; Noorzaily Mohamed Noor; Ainuddin Wahid Abdul Wahab; Zaidi Razak
Format: Article
Language: English
Published: University of Malaya 2019
Subjects:
Online Access:http://umpir.ump.edu.my/id/eprint/27574/1/Accelerating%20FPGA-SURF%20feature%20detection%20module%20by%20memory%20access%20reduction.pdf
http://umpir.ump.edu.my/id/eprint/27574/
https://doi.org/10.22452/mjcs.vol32no1.4
Description
Summary: Feature detection is an important concept in image processing, used to compute abstractions of image information for image recognition and many other applications. One of the most popular algorithms is Speeded-Up Robust Features (SURF), which uses a scale space pyramid to detect features. For this reason, prior researchers have concentrated on applying parallelism to the multiple SURF layers using technologies such as Field Programmable Gate Arrays (FPGA). However, prior FPGA-SURF implementations do not address the memory access limitations that can affect the overall performance of a system. This paper presents a study of FPGA-SURF and memory access in the feature detection area. We conducted a profiling test and found that the external memory accesses used to fetch integral image data in SURF strongly affect the overall performance. We also found that the SURF memory access pattern contains redundant, repeating accesses that can be reduced. Therefore, a controller design that stores repeating data (for subsequent processing) in an on-chip memory is proposed. This method reduces external memory accesses and increases the overall performance. The results show that the proposed method improves on the existing method (i.e. without memory access reduction) by 1.23 times when the external memory latency is 20 ns.
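
To illustrate the kind of reuse the abstract describes, the following is a minimal software sketch, not the authors' actual controller design: the buffer depth, image size, and all identifiers are assumptions. It models how keeping recently fetched integral-image rows in an on-chip buffer reduces external memory reads while sliding a SURF-style box filter, since neighbouring filter positions request many of the same integral-image samples.

/* Software model (assumed design, not the paper's controller) of reusing
 * integral-image data through a small on-chip row buffer.
 * A box sum over an integral image I needs four corner samples:
 *   S = I[y2][x2] - I[y1-1][x2] - I[y2][x1-1] + I[y1-1][x1-1]
 * Neighbouring SURF filter positions share many of these samples, so holding
 * recently fetched rows on-chip avoids repeated external reads. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define W 64
#define H 64
#define BUF_ROWS 16            /* hypothetical on-chip row buffer depth */

static uint32_t ext_mem[H][W]; /* integral image in "external" memory   */
static uint32_t row_buf[BUF_ROWS][W];
static int      buf_tag[BUF_ROWS];
static long     ext_accesses;  /* counts external-memory word fetches   */

/* Fetch one integral-image sample, going to external memory only when the
 * requested row is not already held in the on-chip buffer. */
static uint32_t fetch(int y, int x)
{
    int slot = y % BUF_ROWS;
    if (buf_tag[slot] != y) {               /* miss: load the whole row once */
        memcpy(row_buf[slot], ext_mem[y], sizeof(row_buf[slot]));
        buf_tag[slot] = y;
        ext_accesses += W;
    }
    return row_buf[slot][x];
}

/* Box sum over the inclusive rectangle (x1,y1)-(x2,y2) using four corners. */
static uint32_t box_sum(int x1, int y1, int x2, int y2)
{
    uint32_t a = fetch(y2, x2);
    uint32_t b = (y1 > 0) ? fetch(y1 - 1, x2) : 0;
    uint32_t c = (x1 > 0) ? fetch(y2, x1 - 1) : 0;
    uint32_t d = (y1 > 0 && x1 > 0) ? fetch(y1 - 1, x1 - 1) : 0;
    return a - b - c + d;
}

int main(void)
{
    /* Toy integral image of an all-ones input: I[y][x] = (y+1)*(x+1). */
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            ext_mem[y][x] = (uint32_t)((y + 1) * (x + 1));
    memset(buf_tag, -1, sizeof(buf_tag));

    /* Slide a 9x9 box filter, as in the smallest SURF scale. */
    for (int y = 0; y + 9 <= H; y++)
        for (int x = 0; x + 9 <= W; x++)
            (void)box_sum(x, y, x + 8, y + 8);

    printf("external word fetches with row reuse: %ld\n", ext_accesses);
    printf("fetches without reuse (4 per filter): %d\n",
           4 * (H - 8) * (W - 8));
    return 0;
}

In this model each integral-image row is fetched from external memory at most once per pass, whereas the naive scheme refetches the four corner samples for every filter position; the hardware controller described in the abstract pursues the same idea of holding repeating data on-chip.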