Visual codebook analysis in image understanding / Hoo Wai Lam


Bibliographic Details
Main Author: Hoo, Wai Lam
Format: Thesis
Published: 2015
Subjects:
Online Access:http://studentsrepo.um.edu.my/5903/1/thesis.pdf
http://studentsrepo.um.edu.my/5903/
Description
Summary: Since the rise of web search engines, computer vision researchers have spent the past decade actively investigating the image understanding problem, with the goal of building robust content-based image retrieval algorithms that understand objects and scenes in the environment. The visual codebook, which acts as a 'dictionary' for images, has been widely used in the literature. This thesis investigates the limitations of current visual codebook algorithms and proposes new solutions to the identified problems. The first contribution of this thesis is to enhance the visual codebook by introducing soft class labels. Visual codebooks suffer under weakly supervised learning because background image patches are wrongly assigned the semantic class label attached to the image. As a result, the visual codebook learns incorrect information, which degrades image classification performance. To deal with this problem, soft class labels are proposed in a way that utilizes both image-level and patch-level information: for each image patch, the soft class labels assign a weight to every object class. The visual codebook is therefore no longer affected by wrongly labeled image patches. The second contribution of this thesis is to reduce the human annotation effort in zero-shot learning by proposing a hierarchical class concept. In general, when only limited images are available, the effectiveness of the visual codebook suffers, so a zero-shot learning approach is needed to classify images that the classification model has never seen before. State-of-the-art approaches often use attributes as the zero-shot learning solution, but attributes require extensive human annotation. The proposed method performs zero-shot learning using newly defined Coarse Classes and Fine Classes, so that seen classes and unseen classes can be related.
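The soft-class-label idea can be illustrated with a minimal sketch. This is not the thesis's actual formulation: the toy patch descriptors, the class-mean similarity weighting, and all parameter values below are assumptions made for illustration only.

```python
# Hypothetical sketch: soft class labels for visual-codebook patches.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy patch descriptors: 200 patches x 64 dims, drawn from images of 3 classes.
descriptors = rng.normal(size=(200, 64))
image_labels = rng.integers(0, 3, size=200)  # hard label inherited from the image

# 1. Standard visual codebook: k-means clustering over patch descriptors.
codebook = KMeans(n_clusters=16, n_init=10, random_state=0).fit(descriptors)

# 2. Soft class labels: instead of trusting the (possibly wrong) image-level
#    label for every patch, weight each patch over all classes. Here the
#    weight is a softmax over negative distances to per-class mean descriptors
#    (an assumed weighting scheme, chosen only to make the sketch runnable).
class_means = np.stack(
    [descriptors[image_labels == c].mean(axis=0) for c in range(3)]
)
dists = np.linalg.norm(descriptors[:, None, :] - class_means[None, :, :], axis=2)
soft_labels = np.exp(-dists) / np.exp(-dists).sum(axis=1, keepdims=True)

print(soft_labels.shape)                # one weight vector per patch
print(np.allclose(soft_labels.sum(axis=1), 1.0))
```

A background patch in a "car" image then contributes to the codebook with a spread-out weight vector rather than a hard, and possibly wrong, "car" label.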
With this proposed approach, the extensive human annotation effort is no longer needed. The third contribution of this thesis is to reduce the biases that exist in image datasets. To do so, a visual codebook consisting of codewords that significantly represent object classes, called a keybook, is built using a mutual information approach. The biases addressed include capture bias, where a dataset mostly contains images taken from certain viewpoints. In addition, when building a dataset, researchers might unintentionally favour specific environments (e.g. street scenes), resulting in selection bias. These biases are embedded in the datasets and cause a visual codebook generated from one dataset to perform poorly on another. To overcome this, the proposed approach selects the codewords from all visual codebooks that significantly represent object classes and uses them to build the keybook, reducing the effect of dataset bias on the visual codebook. In summary, these three research works aim to solve the current limitations of the visual codebook, so that a better visual codebook representation can be built for image understanding tasks and achieve the ultimate goal: improved image classification performance.
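The keybook selection step can be sketched as scoring each codeword by the mutual information between its occurrence and the class label, then keeping the most informative codewords. This is a minimal illustration under assumed data; the thesis's actual selection criterion and datasets are not reproduced here.

```python
# Hypothetical sketch: selecting a "keybook" of class-discriminative codewords
# by mutual information between codeword counts and image class labels.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(1)

# Toy bag-of-words histograms: 120 images x 50 codewords, 2 classes.
histograms = rng.poisson(1.0, size=(120, 50)).astype(float)
labels = rng.integers(0, 2, size=120)

# Make the first 5 codewords genuinely class-dependent (synthetic signal).
histograms[labels == 1, :5] += 3.0

# Score every codeword, then keep the 10 most informative as the keybook.
mi = mutual_info_classif(histograms, labels, random_state=0)
keybook = np.argsort(mi)[::-1][:10]

print(len(keybook))   # size of the selected keybook
```

Because only class-discriminative codewords survive the selection, codewords that merely reflect one dataset's capture or selection bias are less likely to enter the keybook.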