Classification and regression tree in classifying and predicting students' academic performance

In this study, the Classification and Regression Tree (CART) algorithm is used to classify and predict which students are likely to pass or fail the final exam of the Engineering Statistics course. However, two problems typically surface when applying the CART algorithm to high-dimensional data: misclassification error an...

Bibliographic Details
Main Author: Ho, Su Juih
Format: Thesis
Language: English
Published: 2013
Subjects: QA Mathematics
Online Access: http://eprints.utm.my/id/eprint/33100/5/HoSuJuihMFS2013.pdf
http://eprints.utm.my/id/eprint/33100/
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:69079?site_name=Restricted Repository
id my.utm.33100
record_format eprints
spelling my.utm.33100 2017-09-11T06:25:19Z Ho, Su Juih (2013) Classification and regression tree in classifying and predicting students' academic performance. Masters thesis, Universiti Teknologi Malaysia, Faculty of Science. 2013-01 Thesis NonPeerReviewed application/pdf en
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
language English
topic QA Mathematics
description In this study, the Classification and Regression Tree (CART) algorithm is used to classify and predict which students are likely to pass or fail the final exam of the Engineering Statistics course. However, two problems typically surface when applying the CART algorithm to high-dimensional data: misclassification error and overfitting. This research therefore aims to reduce both in order to improve accuracy in prediction and classification. Different data-partitioning methods, such as the re-substitution, hold-out and 10-fold cross-validation methods, are used for building and evaluating the decision tree. The results are compared in terms of prediction accuracy, sensitivity and specificity, as well as tree structure. Of these, 10-fold cross-validation achieves the highest prediction accuracy (lowest misclassification error) of 85.11%. Hence, it is selected for further overfitting analysis, using an error rate plot and cost-complexity pruning to reduce the misclassification error. The final pruned tree improves the prediction accuracy to 87.23%. Three rules generated from the final tree are identified to describe the relationships among the attributes. Consequently, this study indicates that applying the CART algorithm with 10-fold cross-validation can produce better accuracy in classifying and predicting students' academic performance. In addition, lecturers can use this method to identify students who perform poorly in the course so that action can be taken to avoid more failures.
format Thesis
author Ho, Su Juih
title Classification and regression tree in classifying and predicting students' academic performance
publishDate 2013
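
The evaluation scheme named in the description (10-fold cross-validation, compared by prediction accuracy, sensitivity and specificity) can be illustrated with a minimal Python sketch. This is not the thesis's code: the function names `k_fold_indices` and `evaluate`, the unshuffled contiguous folds, and the choice of "fail" as the positive class are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, test) pairs.

    Folds are contiguous and unshuffled (an assumption); each sample
    appears in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def evaluate(y_true: Sequence[str], y_pred: Sequence[str],
             positive: str = "fail") -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for a two-class problem.

    `positive` is the class treated as the positive outcome; using
    "fail" here is a hypothetical choice, not taken from the thesis.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),  # 1 - misclassification error
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Toy labels, invented purely to exercise the functions.
y_true = ["fail", "fail", "pass", "pass", "pass"]
y_pred = ["fail", "pass", "pass", "pass", "fail"]
metrics = evaluate(y_true, y_pred)
folds = k_fold_indices(25, k=10)
```

In a full workflow, a tree would be fitted on each `train` index set and scored on the matching `test` set, with the per-fold predictions pooled before calling `evaluate`.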