Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix
Software defect prediction provides actionable outputs to software teams while contributing to industrial success. Therefore, predicting the number of defects in a new version of software at both the class and method levels is an important goal of defect prediction studies to assist software team...
Main Author: | Ebubeogu Amarachukwu, Felix |
---|---|
Format: | Thesis |
Published: | 2020 |
Subjects: | QA76 Computer software; TA Engineering (General). Civil engineering (General) |
Online Access: | http://studentsrepo.um.edu.my/14571/2/Ebubegogu.pdf http://studentsrepo.um.edu.my/14571/1/Ebubeogu.pdf http://studentsrepo.um.edu.my/14571/ |
id |
my.um.stud.14571 |
---|---|
record_format |
eprints |
spelling |
my.um.stud.14571 2023-07-04T23:29:19Z Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix Ebubeogu Amarachukwu, Felix QA76 Computer software TA Engineering (General). Civil engineering (General)
2020-05 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/14571/2/Ebubegogu.pdf application/pdf http://studentsrepo.um.edu.my/14571/1/Ebubeogu.pdf Ebubeogu Amarachukwu , Felix (2020) Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix. PhD thesis, Universiti Malaya. http://studentsrepo.um.edu.my/14571/ |
institution |
Universiti Malaya |
building |
UM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaya |
content_source |
UM Student Repository |
url_provider |
http://studentsrepo.um.edu.my/ |
topic |
QA76 Computer software TA Engineering (General). Civil engineering (General) |
description |
Software defect prediction provides actionable outputs to software teams while contributing
to industrial success. Therefore, predicting the number of defects in a new version
of software at both the class and method levels is an important goal of defect prediction
studies to assist software teams in optimizing their test efforts towards improving software
quality. However, despite remarkable achievements in defect prediction, the quality of the
data applied in defect prediction studies has been a major concern, with related quality
issues leading to numerous contradictory findings in machine learning research. In addition,
a demonstrated approach for predicting the number of defects in a new software version is
lacking. Therefore, efforts are required to demonstrate how class- and method-level defect
prediction can be achieved for a new software version and to develop an approach for
preprocessing the highly imbalanced class- and method-level data available for software
defect prediction. To address these issues, first, a data preprocessing framework is proposed
to overcome some of the challenges associated with typical software datasets, for instance,
irrelevant and redundant features. A machine-learning-driven, supervised optimal decision
procedure is followed in the development of this data preprocessing framework, resulting
in a prime advantage of bias-free method- and class-level datasets. Second, a method of
predicting the number of software defects in an upcoming product release is proposed using
predictor variables derived from the defect acceleration observed based on the existing
software defects, namely, the defect density, defect velocity and defect introduction time.
The number of defects in the current version of a software product is characterized by
this defect acceleration; hence, these derived predictor variables can be used to construct
regression models to predict the number of software defects in a new version. An experiment
conducted on 69 open-source ELFF Java projects, containing 131,034 classes
and 289,132 methods, as well as on the NASA datasets, which contain 10 different Java
and C++ projects with 22,838 classes, is reported. To evaluate the effectiveness of the
proposed framework for data preprocessing, the average classification performances of
six selected state-of-the-art classifiers before and after data preprocessing are investigated
and compared across multiple projects with data imbalances between the defective and
defect-free classes. For both the class and method levels, these selected state-of-the-art
classifiers, namely, naïve Bayes, logistic regression, neural network, K-nearest neighbors,
support vector machine and random forest classifiers, achieve noteworthy performance
when applied to preprocessed datasets. Moreover, for the ELFF projects, the results at the
class and method levels respectively show correlation coefficients of 61% and 60% for the
defect density, -11% and -4% for the defect introduction time, and 94% and 93% for the
defect velocity (consistent results are also obtained for the NASA datasets, as presented in
the results section). The proposed approach can serve as a blueprint for program testing to
enhance the effectiveness of software development activities.
|
format |
Thesis |
author |
Ebubeogu Amarachukwu, Felix |
title |
Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix |
publishDate |
2020 |
url |
http://studentsrepo.um.edu.my/14571/2/Ebubegogu.pdf http://studentsrepo.um.edu.my/14571/1/Ebubeogu.pdf http://studentsrepo.um.edu.my/14571/ |
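The abstract describes building regression models from derived predictor variables (defect density, defect velocity, defect introduction time) and reports correlation coefficients between those predictors and the number of defects. The record does not give the formulas or the data, so the following is only a minimal sketch of the general idea, using a single predictor, a Pearson correlation and an ordinary least squares line. The function names `pearson` and `fit_line`, the variable names, and all numbers are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line to a constant predictor")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy data (NOT from the thesis): one value per past release.
defect_velocity = [1.0, 2.0, 3.0, 4.0, 5.0]
defect_count = [12.0, 19.0, 33.0, 41.0, 48.0]

r = pearson(defect_velocity, defect_count)
slope, intercept = fit_line(defect_velocity, defect_count)

# Predicted defect count for an assumed velocity of 6.0 in the next release.
predicted = slope * 6.0 + intercept
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted defects at velocity 6.0: {predicted:.1f}")
```

A single-predictor fit keeps the sketch short. A model that uses all three predictors together would need multiple regression, which this example does not attempt.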