Tree-based contrast subspace mining method

Bibliographic Details
Main Author: Florence Sia Fui Sze
Format: Thesis
Language: English
Published: 2020
Subjects: TN1-997 Mining engineering. Metallurgy
Online Access: https://eprints.ums.edu.my/id/eprint/41108/1/24%20PAGES.pdf
https://eprints.ums.edu.my/id/eprint/41108/2/FULLTEXT.pdf
https://eprints.ums.edu.my/id/eprint/41108/
id my.ums.eprints.41108
record_format eprints
Florence Sia Fui Sze (2020) Tree-based contrast subspace mining method. Doctoral thesis, Universiti Malaysia Sabah.
institution Universiti Malaysia Sabah
building UMS Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sabah
content_source UMS Institutional Repository
url_provider http://eprints.ums.edu.my/
topic TN1-997 Mining engineering. Metallurgy
description Contrast subspace mining finds subsets of features (subspaces) in which a query object is most likely to be similar to the target class rather than to the other class of a two-class multidimensional data set. Such subspaces are termed contrast subspaces. All existing contrast subspace mining methods (i.e. CSMiner and CSMiner-BPR) use a density-based likelihood contrast scoring function to estimate the likelihood of a query object belonging to the target class rather than the other class in a subspace. In a contrast subspace, the query object resides in a region where the ratio of the probability density of the target class to that of the other class, evaluated at the query object, is high. However, estimating the probability density of a class requires adjustment for the dimensionality (number of features) of each subspace, which may affect the performance of contrast subspace mining. In addition, the parameter settings and subspace search strategies of the existing methods are not optimized for mining contrast subspaces, and the methods cannot be applied directly to mine contrast subspaces in categorical data. This thesis introduces a novel tree-based contrast subspace mining method that employs a tree-based likelihood contrast scoring function, which is not affected by the dimensionality of subspaces. The tree-based likelihood contrast scoring function recursively partitions a subspace such that, in a contrast subspace, the query object falls in a group with a high ratio of the probability of the target class to the probability of the other class. The tree-based method begins with a feature selection phase that finds relevant features, followed by a contrast subspace search phase that searches for contrast subspaces among the relevant features according to the tree-based likelihood contrast scoring function.
Genetic algorithms have been widely used to find global solutions to optimization and search problems. Hence, this thesis uses a genetic algorithm to optimize the parameter values of the tree-based method, and also to optimize the contrast subspace search of the tree-based method. In addition, the tree-based method is extended to mine contrast subspaces of a query object in categorical data. The research first prepares real-world numerical and categorical data sets. Then, for the numerical data sets, the tree-based method, the genetic-algorithm-based identification of its parameter values, and the genetic-algorithm-based tree-based method are developed and evaluated. Lastly, the extended tree-based method for categorical data sets is developed and evaluated. The effectiveness of the tree-based method in mining contrast subspaces is evaluated by the classification accuracy on the contrast subspaces obtained for the query object. The empirical results demonstrated that the tree-based method is capable of finding relevant contrast subspaces of a given query object, and that the tree-based method with the optimized parameter setting performs best for mining contrast subspaces in numerical data. Furthermore, the results showed that the extended tree-based method is capable of finding contrast subspaces of a query object in categorical data.
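As an illustration only, the Python sketch below shows one way a tree-based likelihood contrast score and the subsequent subspace search could look. The record does not specify the thesis's actual scoring function, so everything here is an assumption: the query's group is found by greedily growing a single Gini-split path that always follows the query, the score is a Laplace-smoothed ratio of P(group | target) to P(group | other), and all subspaces of up to `max_dim` already-selected relevant features are scored exhaustively instead of by a genetic algorithm. The names `query_leaf`, `tree_contrast_score`, `mine_contrast_subspaces`, `min_leaf`, `max_depth`, `max_dim` and `top_k` are hypothetical.

```python
# Hypothetical sketch of a tree-based likelihood contrast score; not the
# thesis's implementation. Labels equal to `target` form the target class,
# every other label is treated as the "other" class.
from itertools import combinations
from typing import List, Sequence, Tuple


def gini(labels: Sequence[int], target: int) -> float:
    """Two-class Gini impurity (target vs. other)."""
    if not labels:
        return 0.0
    p = sum(1 for label in labels if label == target) / len(labels)
    return 2.0 * p * (1.0 - p)


def query_leaf(X, y, q, subspace, target, min_leaf=2, max_depth=3) -> List[int]:
    """Greedily split on features in `subspace`, keeping only the side that
    contains the query, and return the indices of the query's final group."""
    idx = list(range(len(X)))
    for _ in range(max_depth):
        if len(idx) < 2 * min_leaf:
            break
        parent = gini([y[i] for i in idx], target)
        best = None  # (weighted impurity, feature, threshold)
        for f in subspace:
            values = sorted({X[i][f] for i in idx})
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2.0  # midpoint threshold
                left = [i for i in idx if X[i][f] <= t]
                right = [i for i in idx if X[i][f] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                score = (len(left) * gini([y[i] for i in left], target)
                         + len(right) * gini([y[i] for i in right], target)) / len(idx)
                if best is None or score < best[0]:
                    best = (score, f, t)
        if best is None or best[0] >= parent:
            break  # no admissible split lowers impurity
        _, f, t = best
        # Keep only the points on the same side of the threshold as the query.
        idx = [i for i in idx if (X[i][f] <= t) == (q[f] <= t)]
    return idx


def tree_contrast_score(X, y, q, subspace, target, **tree_params) -> float:
    """Laplace-smoothed P(group | target) / P(group | other) for the query's group."""
    leaf = query_leaf(X, y, q, subspace, target, **tree_params)
    n_t = sum(1 for i in leaf if y[i] == target)
    n_o = len(leaf) - n_t
    total_t = sum(1 for label in y if label == target)
    total_o = len(y) - total_t
    # ((n_t + 1) / (total_t + 2)) / ((n_o + 1) / (total_o + 2)), rearranged
    return ((n_t + 1) * (total_o + 2)) / ((n_o + 1) * (total_t + 2))


def mine_contrast_subspaces(X, y, q, target, features, max_dim=2, top_k=3,
                            **tree_params) -> List[Tuple[Tuple[int, ...], float]]:
    """Score every subspace of up to `max_dim` relevant features; return the top_k."""
    scored = [(s, tree_contrast_score(X, y, q, s, target, **tree_params))
              for d in range(1, max_dim + 1)
              for s in combinations(features, d)]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep order
    return scored[:top_k]


X = [[0.1, 0.5], [0.2, 0.1], [0.3, 0.9], [0.15, 0.4],
     [0.9, 0.5], [0.8, 0.2], [0.7, 0.8], [0.85, 0.45]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
q = [0.2, 0.5]
print(mine_contrast_subspaces(X, y, q, target=1, features=[0, 1]))
# [((0,), 5.0), ((0, 1), 5.0), ((1,), 0.6666666666666666)]
```

Because this sketch scores a subspace by smoothed class counts in the query's group rather than by a kernel density estimate, it needs no bandwidth adjusted to the number of features, which mirrors the property the abstract attributes to the tree-based approach. On the toy data, feature 0 separates the classes around the query, so the subspaces containing it rank first.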