Staff View: Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix

Bibliographic Details
Main Author:	Ebubeogu Amarachukwu, Felix
Format:	Thesis
Published:	2020
Subjects:	QA76 Computer software; TA Engineering (General). Civil engineering (General)
Online Access:	http://studentsrepo.um.edu.my/14571/2/Ebubegogu.pdf http://studentsrepo.um.edu.my/14571/1/Ebubeogu.pdf http://studentsrepo.um.edu.my/14571/

id	my.um.stud.14571
record_format	eprints
spelling	my.um.stud.14571 2023-07-04T23:29:19Z Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix Ebubeogu Amarachukwu , Felix QA76 Computer software TA Engineering (General). Civil engineering (General) 2020-05 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/14571/2/Ebubegogu.pdf application/pdf http://studentsrepo.um.edu.my/14571/1/Ebubeogu.pdf Ebubeogu Amarachukwu , Felix (2020) Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix. PhD thesis, Universiti Malaya. http://studentsrepo.um.edu.my/14571/
institution	Universiti Malaya
building	UM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaya
content_source	UM Student Repository
url_provider	http://studentsrepo.um.edu.my/
topic	QA76 Computer software TA Engineering (General). Civil engineering (General)
spellingShingle	QA76 Computer software TA Engineering (General). Civil engineering (General) Ebubeogu Amarachukwu , Felix Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix
description	Software defect prediction gives software teams actionable outputs and contributes to industrial success. Predicting the number of defects in a new software version at both the class and method levels is therefore an important goal of defect prediction studies, because it helps teams focus their testing effort and improve software quality. Despite remarkable achievements in the field, the quality of the data used in defect prediction studies remains a major concern, and data quality issues have led to numerous contradictory findings in machine learning research. In addition, a demonstrated approach for predicting the number of defects in a new software version is lacking. Work is therefore needed both to demonstrate how class- and method-level defect prediction can be achieved for a new software version and to develop an approach for preprocessing the highly imbalanced class- and method-level data available for software defect prediction. To address these issues, first, a data preprocessing framework is proposed to overcome some of the challenges associated with typical software datasets, such as irrelevant and redundant features. The framework is developed by following a machine-learning-driven, supervised optimal decision procedure, and its main advantage is that it yields bias-free method- and class-level datasets. Second, a method is proposed for predicting the number of software defects in an upcoming product release using three predictor variables derived from the defect acceleration observed in existing software defects: the defect density, the defect velocity and the defect introduction time. Because this defect acceleration characterizes the number of defects in the current version of a software product, the derived predictor variables can be used to construct regression models that predict the number of defects in a new version.
An experiment is reported on 69 open-source ELFF Java projects, containing 131,034 classes and 289,132 methods, and on the NASA datasets, which comprise 10 Java and C++ projects with 22,838 classes. To evaluate the effectiveness of the proposed preprocessing framework, the average classification performance of six state-of-the-art classifiers before and after data preprocessing is compared across multiple projects with imbalanced defective and defect-free classes. At both the class and method levels, these classifiers (naïve Bayes, logistic regression, neural network, K-nearest neighbors, support vector machine and random forest) achieve noteworthy performance when applied to the preprocessed datasets. Moreover, for the ELFF projects, the correlation coefficients at the class and method levels are, respectively, 61% and 60% for the defect density, -11% and -4% for the defect introduction time, and 94% and 93% for the defect velocity; consistent results are obtained for the NASA datasets, as presented in the results section of the thesis. The proposed approach can serve as a blueprint for program testing to enhance the effectiveness of software development activities.
format	Thesis
author	Ebubeogu Amarachukwu , Felix
author_facet	Ebubeogu Amarachukwu , Felix
author_sort	Ebubeogu Amarachukwu , Felix
title	Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix
title_short	Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix
title_full	Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix
title_fullStr	Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix
title_full_unstemmed	Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix
title_sort	supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / ebubeogu amarachukwu felix
publishDate	2020
url	http://studentsrepo.um.edu.my/14571/2/Ebubegogu.pdf http://studentsrepo.um.edu.my/14571/1/Ebubeogu.pdf http://studentsrepo.um.edu.my/14571/
_version_	1772811929506545664
score	13.211869

Similar Items