Tree-based contrast subspace mining method
Contrast subspace mining finds the subsets of features, or subspaces, in which a query object is most similar to a target class and least similar to the other class in a two-class multidimensional data set. These subspaces are termed contrast subspaces. All existing contrast subspace mining methods (i.e. CSMiner...
Main Author: | Florence Sia Fui Sze |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2020 |
Subjects: | TN1-997 Mining engineering. Metallurgy |
Online Access: | https://eprints.ums.edu.my/id/eprint/41108/1/24%20PAGES.pdf https://eprints.ums.edu.my/id/eprint/41108/2/FULLTEXT.pdf https://eprints.ums.edu.my/id/eprint/41108/ |
id |
my.ums.eprints.41108 |
---|---|
record_format |
eprints |
spelling |
my.ums.eprints.41108 2024-10-10T04:09:45Z https://eprints.ums.edu.my/id/eprint/41108/ Tree-based contrast subspace mining method Florence Sia Fui Sze TN1-997 Mining engineering. Metallurgy 2020 Thesis NonPeerReviewed text en https://eprints.ums.edu.my/id/eprint/41108/1/24%20PAGES.pdf text en https://eprints.ums.edu.my/id/eprint/41108/2/FULLTEXT.pdf Florence Sia Fui Sze (2020) Tree-based contrast subspace mining method. Doctoral thesis, Universiti Malaysia Sabah. |
institution |
Universiti Malaysia Sabah |
building |
UMS Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaysia Sabah |
content_source |
UMS Institutional Repository |
url_provider |
http://eprints.ums.edu.my/ |
language |
English |
topic |
TN1-997 Mining engineering. Metallurgy |
description |
Contrast subspace mining finds the subsets of features, or subspaces, in which a query object is most similar to a target class and least similar to the other class in a two-class multidimensional data set. These subspaces are termed contrast subspaces. All existing contrast subspace mining methods (i.e. CSMiner and CSMiner-BPR) use a density-based likelihood contrast scoring function to estimate the likelihood that a query object belongs to the target class rather than the other class in a subspace. In a contrast subspace, the query object resides in a region where the ratio of the probability density of the target class to the probability density of the other class, with respect to the query object, is high. However, the probability density estimation of a class must be adjusted for the dimensionality (the number of features) of each subspace, which may affect the performance of contrast subspace mining. In addition, the parameter settings and the subspace search strategy of the existing methods have not been optimized for mining contrast subspaces, and the methods cannot be applied directly to categorical data. This thesis introduces a novel tree-based contrast subspace mining method that employs a tree-based likelihood contrast scoring function, which is not affected by the dimensionality of subspaces. The tree-based likelihood contrast scoring function recursively partitions a subspace so that the query object falls in a group with a high ratio of the probability of the target class to the probability of the other class in a contrast subspace. The tree-based method begins with a feature selection phase, which finds relevant features, followed by a contrast subspace search phase, which searches for contrast subspaces among the relevant features according to the tree-based likelihood contrast scoring function.
Genetic algorithms have been widely used to find global solutions to optimization and search problems. Hence, this thesis uses a genetic algorithm to optimize the parameter values of the tree-based method, and also to optimize its contrast subspace search. In addition, the tree-based method is extended to mine contrast subspaces of a query object in categorical data. The research first prepares real-world numerical and categorical data sets. Then the tree-based method, the genetic algorithm based identification of its parameter values, and the genetic algorithm based tree-based method are developed and evaluated on the numerical data sets. Lastly, the extended tree-based method for categorical data sets is developed and evaluated. The effectiveness of the tree-based method in mining contrast subspaces is evaluated by the classification accuracy on the obtained contrast subspaces with respect to the query object. The empirical results demonstrate that the tree-based method is capable of finding relevant contrast subspaces of a given query object, and that the tree-based method with the optimized parameter setting performs best for mining contrast subspaces in numerical data. Furthermore, the results show that the extended tree-based method is capable of finding contrast subspaces of a query object in categorical data. |
format |
Thesis |
author |
Florence Sia Fui Sze |
title |
Tree-based contrast subspace mining method |
publishDate |
2020 |
url |
https://eprints.ums.edu.my/id/eprint/41108/1/24%20PAGES.pdf https://eprints.ums.edu.my/id/eprint/41108/2/FULLTEXT.pdf https://eprints.ums.edu.my/id/eprint/41108/ |
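The tree-based likelihood contrast scoring summarized in the description can be illustrated with a small sketch. This is not the thesis's algorithm. It assumes median splits, a greedy choice of split feature, add-one smoothing, and an exhaustive search over low-dimensional subspaces in place of the feature selection and genetic algorithm phases. All names and parameters here (`class_ratio`, `tree_contrast_score`, `top_contrast_subspaces`, `min_size`, `max_depth`, `eps`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import median


def class_ratio(group, target, n_target, n_other, eps=1.0):
    """Smoothed ratio P(group | target class) / P(group | other class)."""
    t = sum(1 for _, label in group if label == target)
    o = len(group) - t
    return ((t + eps) / (n_target + eps)) / ((o + eps) / (n_other + eps))


def tree_contrast_score(data, query, subspace, target, min_size=4, max_depth=5):
    """Follow the query down a greedy sequence of median splits.

    data: list of (feature_tuple, label) pairs; subspace: tuple of feature indices.
    Assumption: a split is accepted only if it raises the class ratio of the
    group that contains the query object.
    """
    n_target = sum(1 for _, label in data if label == target)
    n_other = len(data) - n_target
    group = data
    current = class_ratio(group, target, n_target, n_other)
    for _ in range(max_depth):
        best = None
        for f in subspace:
            cut = median(x[f] for x, _ in group)
            # Keep only the side of the split on which the query falls.
            if query[f] <= cut:
                side = [(x, y) for x, y in group if x[f] <= cut]
            else:
                side = [(x, y) for x, y in group if x[f] > cut]
            if len(side) < min_size or len(side) == len(group):
                continue
            ratio = class_ratio(side, target, n_target, n_other)
            if best is None or ratio > best[0]:
                best = (ratio, side)
        if best is None or best[0] <= current:
            break
        current, group = best
    return current


def top_contrast_subspaces(data, query, target, max_dim=2, k=3):
    """Score every subspace of up to max_dim features and return the best k."""
    scored = []
    for d in range(1, max_dim + 1):
        for s in combinations(range(len(query)), d):
            scored.append((tree_contrast_score(data, query, s, target), s))
    # Highest score first; prefer smaller subspaces on ties.
    scored.sort(key=lambda item: (-item[0], len(item[1]), item[1]))
    return scored[:k]


# Toy data: feature 0 separates the classes, feature 1 is uninformative.
data = [((i / 10, i % 5), "A") for i in range(10)] + \
       [((1 + i / 10, i % 5), "B") for i in range(10)]
query = (0.2, 2)
top = top_contrast_subspaces(data, query, target="A")
print(top)
```

On this toy data the one-dimensional subspace containing feature 0 ranks first, because the query's group there holds only target-class objects, while feature 1 alone leaves the ratio at 1. The exhaustive search grows exponentially with `max_dim`, which is the kind of cost a genetic algorithm based search is meant to avoid.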