A partition based feature selection approach for mixed data clustering / Ashish Dutt

Presently, educational institutions compile and store huge volumes of data, such as student enrolment and attendance records, as well as examination results. Mining such data yields valuable information for those who manage it. Rapid growth in educational data points to the fact that...

Bibliographic Details
Main Author: Ashish , Dutt
Format: Thesis
Published: 2020
Subjects: QA76 Computer software
Online Access: http://studentsrepo.um.edu.my/14481/2/Ashish_Dutt.pdf
http://studentsrepo.um.edu.my/14481/1/Ashish_Dutt.pdf
http://studentsrepo.um.edu.my/14481/
id my.um.stud.14481
record_format eprints
spelling my.um.stud.14481 2023-06-07T17:29:07Z A partition based feature selection approach for mixed data clustering / Ashish Dutt Ashish , Dutt QA76 Computer software 2020-10 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/14481/2/Ashish_Dutt.pdf application/pdf http://studentsrepo.um.edu.my/14481/1/Ashish_Dutt.pdf Ashish , Dutt (2020) A partition based feature selection approach for mixed data clustering / Ashish Dutt. PhD thesis, Universiti Malaya. http://studentsrepo.um.edu.my/14481/
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic QA76 Computer software
description Presently, educational institutions compile and store huge volumes of data, such as student enrolment and attendance records, as well as examination results. Mining such data yields valuable information for those who manage it. The rapid growth in educational data means that distilling it requires a more sophisticated set of algorithms, an issue that led to the emergence of the field of Educational Data Mining (EDM). Traditional data mining algorithms cannot be applied directly to educational problems, as such problems may have a specific objective and function. This implies that a pre-processing algorithm has to be applied first, and only then can specific data mining methods be applied to the problems. One such pre-processing algorithm in EDM is clustering, a widely used data mining method that discovers patterns in the underlying data by analysing its features. A feature contains a measured value. A value can be of an atomic type such as categorical (text only) or numerical (number only). A categorical data type can be ordinal (ordered) or nominal (unordered). In either case, the feature is of a univariate data type. In real-world environments, data often consist of both categorical and numerical valued features. Such datasets are called mixed data. In the literature, several clustering methods exist for analysing numerical or categorical data, and a few clustering algorithms handle mixed data. Clustering mixed data depends on the dissimilarities of its constituent features, and this dependence on data types may influence the clustering solution. Assigning appropriate weights to the features, so as to diminish the influence of data type, may improve the performance of a partition clustering algorithm. In this thesis, a novel weighted feature selection approach on nominal features is proposed for a partition clustering algorithm that can handle mixed data. The proposed approach exploits the pre-processing nature of the partition clustering algorithm in selecting the weights assigned to nominal features. The benefits of weighting are demonstrated on both simulated and real-world mixed datasets. The experimental results show that weighting nominal features improves mixed data clustering.
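The abstract does not state which dissimilarity measure or weighting scheme the thesis uses, so the following is only a minimal sketch of the general idea of weighting nominal features in a mixed-data dissimilarity. It assumes a k-prototypes-style measure (squared Euclidean distance on numerical features plus simple matching on nominal features); the function name, the feature values, and the weights are all hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def mixed_dissimilarity(
    a_num: Sequence[float],
    b_num: Sequence[float],
    a_nom: Sequence[str],
    b_nom: Sequence[str],
    nominal_weights: Sequence[float],
) -> float:
    """Dissimilarity between two mixed-type records.

    Numerical part: squared Euclidean distance.
    Nominal part: weighted simple matching, i.e. each mismatching
    nominal feature contributes its weight and each match contributes 0.
    This is an assumed, generic form, not the thesis's own measure.
    """
    if len(a_num) != len(b_num):
        raise ValueError("numerical feature vectors differ in length")
    if not (len(a_nom) == len(b_nom) == len(nominal_weights)):
        raise ValueError("nominal features and weights differ in length")

    numeric_part = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    nominal_part = sum(
        w for w, x, y in zip(nominal_weights, a_nom, b_nom) if x != y
    )
    return numeric_part + nominal_part


# Hypothetical student records: numerical features are scaled to [0, 1]
# (attendance rate, exam score); nominal features are programme and
# enrolment status.
student_a = ([1.0, 0.25], ["science", "full-time"])
student_b = ([0.5, 0.25], ["arts", "full-time"])

# Illustrative weights: a smaller weight reduces the influence of a
# nominal feature relative to the numerical features.
weights = [0.5, 2.0]

d = mixed_dissimilarity(
    student_a[0], student_b[0], student_a[1], student_b[1], weights
)
print(d)
```

With unweighted simple matching, every nominal mismatch adds exactly 1 to the dissimilarity, which can dominate or be dominated by the numerical part depending on how the numerical features are scaled. Per-feature weights are one way to adjust that balance.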
format Thesis
author Ashish , Dutt
title A partition based feature selection approach for mixed data clustering / Ashish Dutt
title_sort partition based feature selection approach for mixed data clustering / ashish dutt
publishDate 2020
url http://studentsrepo.um.edu.my/14481/2/Ashish_Dutt.pdf
http://studentsrepo.um.edu.my/14481/1/Ashish_Dutt.pdf
http://studentsrepo.um.edu.my/14481/