Support vector machine for solving imbalanced dataset problem

Most machine learning classifiers, such as Neural Network (NN), Naïve Bayes, and the C4.5 decision tree, fail to classify data correctly when the data set is imbalanced. This is because most classifiers are biased toward the majority class, tend to ignore the minority class, and treat the mino...

Bibliographic Details
Main Author: Mohd. Khairuddin, Ismail
Format: Thesis
Language: English
Published: 2012
Subjects: Q Science (General)
Online Access: http://eprints.utm.my/id/eprint/32546/1/IsmailMohdKhairuddinMFKE2012.pdf
http://eprints.utm.my/id/eprint/32546/
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:69024?site_name=Restricted Repository
id my.utm.32546
record_format eprints
spelling my.utm.32546 2017-08-21T07:29:24Z http://eprints.utm.my/id/eprint/32546/ Support vector machine for solving imbalanced dataset problem Mohd. Khairuddin, Ismail Q Science (General) 2012 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/32546/1/IsmailMohdKhairuddinMFKE2012.pdf Mohd. Khairuddin, Ismail (2012) Support vector machine for solving imbalanced dataset problem. Masters thesis, Universiti Teknologi Malaysia, Faculty of Electrical Engineering. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:69024?site_name=Restricted Repository
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
language English
topic Q Science (General)
description Most machine learning classifiers, such as Neural Network (NN), Naïve Bayes, and the C4.5 decision tree, fail to classify data correctly when the data set is imbalanced. This is because most classifiers are biased toward the majority class: they tend to ignore the minority class and treat it as noise, disturbance, or variance. Strategies for tackling the imbalanced data set problem generally fall into two groups: data level and algorithm level. Data-level methods consist of feature selection and re-sampling of the data (undersampling, oversampling, or a combination of both), while algorithm-level methods consist of internal modifications to the learning program. In this project, the Support Vector Machine (SVM) classifier is proposed in order to investigate the imbalanced data set problem. The investigation measures the performance of the SVM classifier by its g-mean value. To increase that performance, an oversampling method called SMOTE is then combined with the SVM classifier, and the g-mean value is calculated for the combination. Experiments were designed to validate the proposed algorithms and were performed on various imbalanced data sets. Finally, the results for each proposed algorithm are compared and analyzed.
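The two ideas named in the description, the g-mean metric and SMOTE-style oversampling, can be illustrated with a minimal sketch. It assumes the common definition of g-mean as the square root of sensitivity times specificity; the record does not state which definition the thesis uses. The function names `g_mean` and `smote_like`, the parameter `k`, and the toy data are hypothetical and are not taken from the thesis. The SVM training step is omitted.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative, not the thesis code).

    Each synthetic point is placed at a random position on the line segment
    between a randomly chosen minority sample and one of its k nearest
    minority neighbours. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy labels: 2 minority (1) and 8 majority (0) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10
print(g_mean(y_true, always_majority))  # 0.0, although accuracy is 0.8

# Hypothetical 2-D minority samples, oversampled with five synthetic points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
print(smote_like(minority, n_new=5))
```

The first print shows why g-mean is preferred over plain accuracy on imbalanced data: a classifier that always predicts the majority class scores 80% accuracy here but a g-mean of zero, because its sensitivity on the minority class is zero.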
format Thesis
author Mohd. Khairuddin, Ismail
title Support vector machine for solving imbalanced dataset problem
publishDate 2012