Extremal region detection and selection with fuzzy encoding for food recognition

Bibliographic Details
Main Author: Razali @ Ghazali, Mohd Norhisham
Format: Thesis
Language: English
Published: 2019
Online Access: http://psasir.upm.edu.my/id/eprint/84594/1/FSKTM%202019%2048%20ir.pdf
http://psasir.upm.edu.my/id/eprint/84594/
Description
Summary: This study proposes an improved feature representation for food recognition that uses the Maximally Stable Extremal Region (MSER) detector in a Bag of Features (BoF) model, incorporating interest point detection and selection together with fuzzy encoding. Three algorithms were used to accomplish the task of feature representation. The first algorithm locates interest points in food images using MSER. Dense sampling and Difference of Gaussian (DoG) have been used in previous studies but were unable to detect salient interest points because of the complex appearance of food images. MSER provides discriminative features via global segmentation. The arbitrarily shaped regions produced by the global segmentation are suitable for detecting interest points from mixed food objects, which are characterised by non-rigid deformations and very large variations in appearance. However, the traditional MSER detects very few interest points on texture-less food images. Thus, the Extremal Region Detection (ERD) algorithm in MSER is improved by finding the optimum configuration of MSER parameters, allowing the quantity of interest points for certain food images to be increased appropriately. The second algorithm reduces the quantity of interest regions by using the Extremal Region Selection (ERS) algorithm. A high number of interest regions does not guarantee outstanding classification performance, because redundant interest regions, as well as interest regions from food images with complex backgrounds, are also detected. Consequently, more computational effort is needed to execute the feature encoding process in the Bag of Features model. By decreasing the quantity of interest regions, the time efficiency of feature encoding can thus be improved without sacrificing classification accuracy.
The ERS algorithm uses unsupervised learning to determine the spatial information of the detected interest regions, indicating whether they come from the image background and can thus be removed as noise. In the third algorithm, a soft assignment technique using fuzzy encoding transforms low-level features into a higher-level feature representation. The fuzzy encoding approach adopts fuzzy set theory (FST) to minimise the uncertainty and plausibility problems in feature encoding that arise from the hard assignment and Fisher vector approaches used in previous studies. These problems cause confusion in assigning feature descriptors to visual words, and they stem from the high intra-class variability of food appearance, which is very diverse in colour and texture. By adopting FST, each assignment of a feature descriptor to a visual word is thoroughly evaluated and translated into a membership value that indicates the relevance of that assignment. The proposed methods have been evaluated using two image datasets: UECFOOD-100 and UNICT-FD1200. The performance of the algorithms was measured by classification accuracy, error rate, precision and recall. The quality of the interest region detector was evaluated based on the quantity of interest regions. Classification was performed using a Support Vector Machine (SVM) with a linear kernel. The experimental results demonstrate the superior classification performance of the proposed methods over previous methods. Specifically, the proposed method achieved 99.95% and 100.00% classification accuracy on the UECFOOD-100 and UNICT-FD1200 datasets, respectively, whereas previous methods achieved only 79.20% and 85.01% on the same datasets.
Overall, the proposed method generates a compact and discriminative visual dictionary for food recognition using only a single feature type, a small number of interest regions, and low-dimensional feature vectors. Moreover, it provides a holistic feature representation that gives outstanding classification performance on foods with great variation in appearance.
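
The fuzzy encoding step summarised above can be illustrated with a minimal sketch. This record does not state which membership function the thesis uses, so the sketch assumes the standard fuzzy c-means membership formula with fuzzifier m; the function names `fuzzy_memberships` and `fuzzy_encode` and the toy two-word codebook are hypothetical and serve only as illustration.

```python
import math


def fuzzy_memberships(descriptor, codebook, m=2.0):
    """Membership of one descriptor in every visual word (values sum to 1).

    Assumption: fuzzy c-means membership with fuzzifier m > 1; the thesis
    may define memberships differently.
    """
    dists = [math.dist(descriptor, word) for word in codebook]
    # A descriptor that coincides with a visual word belongs fully to it.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        share = 1.0 / len(zeros)
        return [share if i in zeros else 0.0 for i in range(len(codebook))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]


def fuzzy_encode(descriptors, codebook, m=2.0):
    """Accumulate memberships into a normalised image-level histogram."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        for k, u in enumerate(fuzzy_memberships(desc, codebook, m)):
            hist[k] += u
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy example: two visual words and two local descriptors (made-up values).
codebook = [[0.0, 0.0], [2.0, 0.0]]
descriptors = [[0.0, 0.0], [1.0, 0.0]]
histogram = fuzzy_encode(descriptors, codebook)
```

In this toy case the second descriptor lies exactly midway between the two visual words. Hard assignment would have to pick one word arbitrarily, whereas the membership values split its contribution evenly, which is the kind of ambiguity the summary attributes to hard assignment.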