Classification and regression tree in classifying and predicting students' academic performance
In this study, Classification and Regression Tree (CART) is used to classify and predict students who are likely to pass or fail the final exam of the Engineering Statistic course. However, two problems typically surface when applying the CART algorithm to high-dimensional data: misclassification error an...
Main Author: | Ho, Su Juih |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2013 |
Subjects: | QA Mathematics |
Online Access: | http://eprints.utm.my/id/eprint/33100/5/HoSuJuihMFS2013.pdf http://eprints.utm.my/id/eprint/33100/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:69079?site_name=Restricted Repository |
id |
my.utm.33100 |
---|---|
record_format |
eprints |
spelling |
my.utm.33100 2017-09-11T06:25:19Z http://eprints.utm.my/id/eprint/33100/ QA Mathematics 2013-01 Thesis NonPeerReviewed application/pdf en Ho, Su Juih (2013) Classification and regression tree in classifying and predicting students' academic performance. Masters thesis, Universiti Teknologi Malaysia, Faculty of Science. |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
QA Mathematics |
description |
In this study, Classification and Regression Tree (CART) is used to classify and predict students who are likely to pass or fail the final exam of the Engineering Statistic course. However, two problems typically surface when applying the CART algorithm to high-dimensional data: misclassification error and overfitting. This research therefore aims to reduce both for better accuracy in prediction and classification. Different data-partitioning schemes (the re-substitution method, the hold-out method and the 10-fold cross-validation method) are used for building and evaluating the decision tree. The results are compared in terms of prediction accuracy, sensitivity and specificity, as well as tree structure. Of these, 10-fold cross-validation achieves the highest prediction accuracy (lowest misclassification error), 85.11%. It is therefore selected for further overfitting analysis using an error rate plot and cost-complexity pruning in order to reduce the misclassification error. The final pruned tree improves the prediction accuracy to 87.23%. Three rules generated from the final tree are identified to describe the relationships among the attributes. Consequently, this study indicates that applying the CART algorithm with 10-fold cross-validation can produce better accuracy in classifying and predicting students' academic performance. In addition, lecturers can use this method to identify students who perform poorly in the course so that action can be taken to avoid more failures. |
format |
Thesis |
author |
Ho, Su Juih |
title |
Classification and regression tree in classifying and predicting students' academic performance |
publishDate |
2013 |
url |
http://eprints.utm.my/id/eprint/33100/5/HoSuJuihMFS2013.pdf http://eprints.utm.my/id/eprint/33100/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:69079?site_name=Restricted Repository |
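The description above compares trees by prediction accuracy, misclassification error, sensitivity and specificity. As a minimal sketch of how those four figures relate to one another, the Python below computes them from binary confusion-matrix counts. The class and function names are hypothetical, treating "fail" as the positive class is an assumption, and the example counts are invented for easy arithmetic; none of this is taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper type).

    Assumption: "fail" is the positive class, because failing students
    are the ones the description wants lecturers to identify.
    """
    tp: int  # failing students predicted to fail
    fn: int  # failing students predicted to pass
    tn: int  # passing students predicted to pass
    fp: int  # passing students predicted to fail

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp


def accuracy(c: ConfusionCounts) -> float:
    """Share of all students classified correctly."""
    return (c.tp + c.tn) / c.total


def misclassification_error(c: ConfusionCounts) -> float:
    """Complement of accuracy: highest accuracy means least error."""
    return 1.0 - accuracy(c)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of actual positives (failing students) that were caught."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of actual negatives (passing students) correctly cleared."""
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Invented counts for illustration only; not data from the thesis.
    example = ConfusionCounts(tp=8, fn=2, tn=35, fp=5)
    print(f"accuracy    = {accuracy(example):.2%}")
    print(f"error       = {misclassification_error(example):.2%}")
    print(f"sensitivity = {sensitivity(example):.2%}")
    print(f"specificity = {specificity(example):.2%}")
```

The functions assume at least one student in each actual class; with an empty class, sensitivity or specificity is undefined and the division raises ZeroDivisionError.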