Modeling students' background and academic performance with missing values using classification tree
Students' academic performance is a prime concern for higher education institutions because it reflects the performance of the institution. Differences in academic performance among students are a topic that has drawn the interest of many academic researchers and of society. One of the bigg...
Main Author: | Hasan, Norsida |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2014 |
Online Access: | http://psasir.upm.edu.my/id/eprint/52116/1/IPM%202014%208RRR.pdf http://psasir.upm.edu.my/id/eprint/52116/ |
id | my.upm.eprints.52116 |
---|---|
record_format | eprints |
spelling | my.upm.eprints.52116 2017-11-09T03:11:05Z http://psasir.upm.edu.my/id/eprint/52116/ Modeling students' background and academic performance with missing values using classification tree Hasan, Norsida 2014-12 Thesis NonPeerReviewed application/pdf en http://psasir.upm.edu.my/id/eprint/52116/1/IPM%202014%208RRR.pdf Hasan, Norsida (2014) Modeling students' background and academic performance with missing values using classification tree. PhD thesis, Universiti Putra Malaysia. |
institution | Universiti Putra Malaysia |
building | UPM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Putra Malaysia |
content_source | UPM Institutional Repository |
url_provider | http://psasir.upm.edu.my/ |
language | English |
description | Students' academic performance is a prime concern for higher education institutions because it reflects the performance of the institution. Differences in academic performance among students are a topic that has drawn the interest of many academic researchers and of society. One of the biggest challenges in university decision making and planning today is to predict the performance of students at an early stage, prior to their admission. We address the problem of inferring students' degree classification from their background data, using a dataset obtained from one of the higher education institutions in Malaysia. We present the results of a detailed statistical analysis relating the final degree classification obtained at the end of students' studies to their backgrounds. The classification tree model produces the highest accuracy in predicting students' degree classification from their background data, compared with the Bayesian network and naive Bayes. The significance of the prediction depends closely on the quality of the database and on the sample dataset chosen for model training and testing. Missing values, in either predictor or response variables, are a very common problem in statistics and data mining. Cases with missing values are often ignored, which results in loss of information and possible bias. Surrogate splits in a standard classification tree are a possible choice for handling missing values in large datasets that contain at most 10 percent missing values. However, for datasets that contain more than 10 percent missing values, there is an adverse impact on the structure of the classification tree and on its accuracy. In this thesis, we propose a classification tree with imputation model to handle missing values in a dataset. We investigate the application of classification tree, Bayesian network and naive Bayes as imputation techniques to handle missing values in the classification tree model. The investigation includes all three types of missing-value mechanism: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Imputation using classification tree outperforms imputation using Bayesian network and naive Bayes under MCAR, MAR and MNAR alike. We also compare the performance of classification tree with imputation against surrogate splits in classification and regression tree (CART). Fifteen percent of the students' background data are eliminated, and classification tree with imputation is used to predict students' degree classification. The classification tree with imputation model is more accurate than surrogate splits. |
format | Thesis |
author | Hasan, Norsida |
title | Modeling students' background and academic performance with missing values using classification tree |
publishDate | 2014 |
url | http://psasir.upm.edu.my/id/eprint/52116/1/IPM%202014%208RRR.pdf http://psasir.upm.edu.my/id/eprint/52116/ |
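
The abstract distinguishes three missing-value mechanisms: MCAR, MAR and MNAR. The following is a minimal sketch of how they differ, not code from the thesis. The field names `rural` and `entry_score`, the helper `make_missing`, and the synthetic data are all hypothetical. The rates are chosen so that roughly fifteen percent of scores go missing in each case, echoing the fifteen percent mentioned in the abstract.

```python
import random


def make_missing(rows, target, prob, seed=0):
    """Return copies of `rows` where `target` is set to None with probability prob(row)."""
    rng = random.Random(seed)
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the complete data stays untouched
        if rng.random() < prob(row):
            new_row[target] = None
        result.append(new_row)
    return result


def missing_count(data, column="entry_score"):
    """Count the rows whose `column` value is missing."""
    return sum(1 for r in data if r[column] is None)


# Hypothetical complete data: 1000 students with two made-up background fields.
rows = [{"rural": i % 2 == 0, "entry_score": (i * 37) % 100} for i in range(1000)]

# MCAR: the chance of a missing score is the same for every student.
mcar_rows = make_missing(rows, "entry_score", lambda r: 0.15)

# MAR: the chance depends only on another, fully observed field.
mar_rows = make_missing(rows, "entry_score", lambda r: 0.30 if r["rural"] else 0.0)

# MNAR: the chance depends on the very value that goes missing.
mnar_rows = make_missing(
    rows, "entry_score", lambda r: 0.30 if r["entry_score"] < 50 else 0.0
)

print(missing_count(mcar_rows), missing_count(mar_rows), missing_count(mnar_rows))
```

Under MAR the missingness can be explained by observed data (here, `rural`), while under MNAR it can only be explained by the hidden value itself, which is why MNAR is generally the hardest case for any imputation method.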