Extremal region detection and selection with fuzzy encoding for food recognition
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | http://psasir.upm.edu.my/id/eprint/84594/1/FSKTM%202019%2048%20ir.pdf http://psasir.upm.edu.my/id/eprint/84594/ |
Summary: | This study proposes an improved feature representation for food recognition that
uses a Maximally Stable Extremal Region (MSER) detector in a Bag of Features (BoF)
model, incorporating interest point detection and selection together with fuzzy
encoding. Three algorithms were used to accomplish the
task of feature representation. The first algorithm locates interest points in food
images using an MSER. Dense sampling and Difference of Gaussian (DoG)
have been used in previous studies but were unable to detect salient interest
points due to the complex appearance of food images. MSER provides discriminative
features via global segmentation. The arbitrary shape of regions produced by
the global segmentation is suitable to detect interest points from mixed food
objects which are known to be characterised by non-rigid deformations and very
large variations in appearance. However, the traditional MSER detects very few
interest points on texture-less food images. Thus, the Extremal Region Detection
(ERD) algorithm in MSER is improved by finding an optimum configuration of MSER
parameters, which allows the number of interest points for certain food images to be
increased appropriately.
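
The effect of MSER parameters on the number of detected regions can be illustrated with a minimal sketch of the standard MSER stability criterion. It is not the ERD algorithm of the thesis, whose parameter values are not given in this summary. The function name `stable_thresholds`, the parameter names `delta` and `max_variation`, and the example areas are all assumptions chosen for illustration.

```python
def stable_thresholds(areas, delta, max_variation):
    """Return threshold indices at which a nested region is maximally stable.

    areas[i] is the pixel area of one extremal region at intensity
    threshold i (non-decreasing as the threshold grows). Illustrative only.
    """
    # Relative area change across a window of +/- delta thresholds.
    variation = {}
    for i in range(delta, len(areas) - delta):
        variation[i] = (areas[i + delta] - areas[i - delta]) / areas[i]

    stable = []
    for i, q in variation.items():
        left = variation.get(i - 1, float("inf"))
        right = variation.get(i + 1, float("inf"))
        # Keep local minima of the variation that are also below the cap.
        if q <= max_variation and q < left and q <= right:
            stable.append(i)
    return stable


if __name__ == "__main__":
    # Hypothetical area profile of one region growing with the threshold.
    areas = [10, 12, 13, 13, 14, 30, 60, 61, 62, 62, 90]
    for delta, max_variation in [(2, 0.25), (1, 0.05), (1, 0.25)]:
        found = stable_thresholds(areas, delta, max_variation)
        print(delta, max_variation, found)
```

In this sketch, a smaller `delta` or a larger `max_variation` accepts more regions, which is the kind of trade-off a parameter search would explore for texture-less images.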
The second algorithm reduces the quantity of interest regions by using the
Extremal Region Selection (ERS) algorithm. A high number of interest regions
does not guarantee outstanding classification performance, because redundant
interest regions and interest regions from complex image backgrounds are also
detected. Consequently, more computational effort is required to execute
the feature encoding process in the Bag of Features model. By decreasing the
quantity of interest regions, the time efficiency of feature encoding can thus be
improved without sacrificing classification accuracy. The ERS algorithm uses
unsupervised learning to determine the spatial information of the detected
interest regions, which indicates whether they come from the image
background and can therefore be removed as noise.
In the third algorithm, a soft assignment technique using fuzzy encoding is used
to transform low-level features into a higher-level feature representation. The
fuzzy encoding approach adopts fuzzy set theory (FST) to minimise the
uncertainty and plausibility problems in feature encoding arising from hard
assignment and Fisher vector approaches used in previous studies. The
uncertainty and plausibility problems have led to confusion in assigning feature
descriptions to visual words; they arise from the high intra-class variability
of food appearance, caused by high diversity in color and texture. By adopting FST,
a thorough evaluation is performed in each assignment of feature description to
visual words, which is translated into a membership value that indicates the
relevance of that assignment.
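
The idea of a membership value per assignment can be sketched as follows. The summary does not state the membership function used in the thesis, so this example assumes the familiar fuzzy c-means form with fuzzifier `m`; the function names and the toy two-word dictionary are hypothetical.

```python
import math


def fuzzy_memberships(descriptor, visual_words, m=2.0):
    """Membership of one descriptor in each visual word (values sum to 1).

    Assumes a fuzzy c-means style membership, not the thesis's own formula.
    """
    dists = [math.dist(descriptor, word) for word in visual_words]
    # A descriptor that coincides with a word belongs fully to that word.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]


def encode_image(descriptors, visual_words, m=2.0):
    """Average the memberships of all descriptors into one BoF vector."""
    histogram = [0.0] * len(visual_words)
    for descriptor in descriptors:
        for k, u in enumerate(fuzzy_memberships(descriptor, visual_words, m)):
            histogram[k] += u
    count = len(descriptors)
    return [h / count for h in histogram] if count else histogram


if __name__ == "__main__":
    words = [(0.0, 0.0), (2.0, 0.0)]  # toy visual dictionary
    print(fuzzy_memberships((0.5, 0.0), words))
    print(encode_image([(0.5, 0.0), (1.0, 0.0)], words))
```

Unlike hard assignment, a descriptor lying between two visual words contributes to both, weighted by its relative distance to each.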
The proposed methods have been evaluated using two image datasets:
UECFOOD-100 and UNICT-FD1200. The performance of the algorithms was
measured based on classification accuracy, error rate, and precision and recall.
The quality of the interest region detector was evaluated based on the quantity
of interest regions. Classification was performed using a Support Vector Machine
(SVM) with a linear kernel. The experimental results demonstrate the superior
classification performance of the proposed methods over the previous methods.
Specifically, the proposed method achieved 99.95% and 100.00% classification
accuracy on the UECFOOD-100 and UNICT-FD1200 datasets, respectively,
whereas previous methods have only been able to achieve 79.20% and 85.01%
on the same datasets.
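
The reported measures follow their usual definitions, which can be sketched as below. The function name `evaluate_predictions`, the per-class reporting, and the toy labels are assumptions; the summary does not say how precision and recall were aggregated across classes.

```python
def evaluate_predictions(y_true, y_pred):
    """Compute accuracy, error rate, and per-class precision and recall."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Guard against classes that were never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = (precision, recall)

    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "per_class": per_class,
    }


if __name__ == "__main__":
    # Hypothetical labels, not taken from either dataset.
    truth = ["rice", "rice", "soup", "soup"]
    predicted = ["rice", "soup", "soup", "soup"]
    print(evaluate_predictions(truth, predicted))
```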
Overall, the proposed method generates a compact and discriminative visual
dictionary for food recognition using only a single feature type, a small number of
interest regions, and low-dimensional feature vectors. Moreover, it provides a
holistic feature representation able to give outstanding classification
performance on foods with great variation in appearance. |
---|