Integrated smoothed location model and data reduction approaches for multi variables classification
Smoothed Location Model is a classification rule that handles a mixture of continuous and binary variables simultaneously. The rule discriminates between groups in parametric form, using the conditional distribution of the continuous variables given each pattern of the binary variables. To conduct...
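As a rough illustration of the "pattern of the binary variables" idea, the sketch below maps each object's binary pattern to one cell of a table with 2^b cells (for b binary variables) and computes plain, unsmoothed cell means of the continuous variables for a single group. The function names, the simulated data, and the sizes (30 objects, 8 binary variables) are assumptions made for illustration only. They are not taken from the thesis, and the smoothing step of the model is not shown.

```python
import numpy as np

def cell_index(binary_row):
    """Map a 0/1 pattern to a cell number in 0 .. 2**b - 1 (hypothetical helper)."""
    idx = 0
    for bit in binary_row:
        idx = idx * 2 + int(bit)
    return idx

def cell_means(X_cont, X_bin):
    """Per-cell object counts and unsmoothed mean vectors for one group."""
    n_cells = 2 ** X_bin.shape[1]
    counts = np.zeros(n_cells, dtype=int)
    sums = np.zeros((n_cells, X_cont.shape[1]))
    for x, z in zip(X_cont, X_bin):
        m = cell_index(z)
        counts[m] += 1
        sums[m] += x
    means = np.full_like(sums, np.nan)  # empty cells have no estimate
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled][:, None]
    return counts, means

# Simulated data (assumption): 30 objects, 8 binary and 3 continuous variables.
rng = np.random.default_rng(0)
X_bin = rng.integers(0, 2, size=(30, 8))
X_cont = rng.normal(size=(30, 3))

counts, means = cell_means(X_cont, X_bin)
empty = int((counts == 0).sum())
print(f"{empty} of {counts.size} cells are empty")
```

With only 30 objects spread over 256 cells, most cells necessarily receive no objects, so no mean can be estimated for them from the data alone.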
Main Author: | Hashibah, Hamid |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2014 |
Subjects: | QA71-90 Instruments and machines |
Online Access: | https://etd.uum.edu.my/4420/1/s92365.pdf https://etd.uum.edu.my/4420/2/s92365_abstract.pdf https://etd.uum.edu.my/4420/ |
id |
my.uum.etd.4420 |
---|---|
record_format |
eprints |
spelling |
my.uum.etd.4420 2022-07-27T01:24:25Z https://etd.uum.edu.my/4420/ Hashibah, Hamid (2014) Integrated smoothed location model and data reduction approaches for multi variables classification. PhD. thesis, Universiti Utara Malaysia. NonPeerReviewed |
institution |
Universiti Utara Malaysia |
building |
UUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Utara Malaysia |
content_source |
UUM Electronic Theses |
url_provider |
http://etd.uum.edu.my/ |
language |
English |
topic |
QA71-90 Instruments and machines |
description |
Smoothed Location Model is a classification rule that handles a mixture of continuous and binary variables simultaneously. The rule discriminates between groups in parametric form, using the conditional distribution of the continuous variables given each pattern of the binary variables. To conduct a practical classification analysis, the objects must first be sorted into the cells of a multinomial table generated from the binary variables. The parameters in each cell are then estimated from the sorted objects. In many situations, however, the estimated parameters are poor when the number of binary variables is large relative to the sample size. A large number of binary variables creates many empty multinomial cells, which leads to a severe sparsity problem and ultimately to exceedingly poor performance of the constructed rule. In the worst case, the rule cannot be constructed at all. To overcome these shortcomings, this study proposes new strategies for extracting an adequate set of variables that contributes to optimum performance of the rule. Combinations of two extraction techniques are introduced, namely 2PCA and PCA+MCA, with new cutpoints for the eigenvalue and the total variance explained, to determine the extracted variables that lead to the minimum misclassification rate. The outcomes of these extraction techniques are used to construct smoothed location models, which yield two new classification approaches called 2PCALM and 2DLM. Numerical evidence from simulation studies shows no significant difference in the computed misclassification rate between the extraction techniques for normal and non-normal data. Nevertheless, both proposed approaches are slightly affected by non-normal data and severely affected by highly overlapping groups. Investigations on several real data sets show that the two approaches are competitive with, and better than, other existing classification methods. Overall, the findings indicate that both proposed approaches can be considered improvements to the location model and alternatives to other classification methods, particularly for handling mixed variables with a large number of binary variables. |
format |
Thesis |
author |
Hashibah, Hamid |
title |
Integrated smoothed location model and data reduction approaches for multi variables classification |
publishDate |
2014 |
url |
https://etd.uum.edu.my/4420/1/s92365.pdf https://etd.uum.edu.my/4420/2/s92365_abstract.pdf https://etd.uum.edu.my/4420/ |
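The abstract mentions extracting variables using cutpoints on the eigenvalue and on the total variance explained. A generic PCA extraction step of that kind might look like the sketch below. This record does not give the cutpoint values or say how the two criteria are combined, so the thresholds (`eig_cut=1.0`, `var_cut=0.80`), the rule of taking the smaller of the two component counts, the function name `pca_extract`, and the simulated data are all assumptions. How the thesis applies PCA within 2PCA and PCA+MCA is not described here and is not reproduced.

```python
import numpy as np

def pca_extract(X, eig_cut=1.0, var_cut=0.80):
    """PCA on the correlation matrix with two hypothetical cutpoints.

    Keeps the leading k components, where k is the smaller of:
      - the number of eigenvalues above eig_cut, and
      - the number of components needed to reach var_cut cumulative variance.
    Both thresholds and the combination rule are illustrative assumptions.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize columns
    R = np.corrcoef(Z, rowvar=False)                  # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)              # ascending order
    order = np.argsort(eigvals)[::-1]                 # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k_eig = int((eigvals > eig_cut).sum())
    k_var = int(np.searchsorted(explained, var_cut)) + 1
    k = max(1, min(k_eig, k_var, X.shape[1]))
    return Z @ eigvecs[:, :k], eigvals[:k], float(explained[k - 1])

# Simulated data (assumption): 50 objects, 6 variables, 4 of them correlated.
rng = np.random.default_rng(1)
base = rng.normal(size=(50, 2))
X = np.hstack([base, base + 0.1 * rng.normal(size=(50, 2)),
               rng.normal(size=(50, 2))])

scores, kept, frac = pca_extract(X)
print(scores.shape, kept.round(2), round(frac, 2))
```

The extracted scores would then replace the original variables when the cells and their parameters are built, which is the point at which fewer variables means fewer empty cells.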