Evaluation of missing values imputation methods towards the effectiveness of asset valuation prediction model

Bibliographic Details
Main Authors: Mohd Jaya, Mohd Izham, Sidi, Fatimah, Affendey, Lilly Suriani, Ishak, Iskandar, A. Jabar, Marzanah
Format: Conference or Workshop Item
Language: English
Published: Database Technologies and Applications Research Group (DbTA), Faculty of Computer Science and Information Technology, Universiti Putra Malaysia 2019
Online Access: http://psasir.upm.edu.my/id/eprint/75514/1/ISICTMA2019-2.pdf
http://psasir.upm.edu.my/id/eprint/75514/
Description
Summary: Missing values are a common problem in datasets from any field of research. A data value can be missing for numerous reasons, such as non-response items in interviews and surveys, equipment malfunction, human error and faulty data transmission. Missing values in a dataset need to be managed using appropriate imputation methods, which estimate approximate values to replace them. Missing values also lead to data quality problems, which in turn result in inaccurate decisions. In this work, we compared and evaluated various imputation methods, including deletion of records with missing values (DEL), mean value imputation (MEAN), k-Nearest Neighbor (KNN), Predictive Mean Matching (PMM), MissForest and the Ontology-based Framework for Financial Decision Making (OFFDM), with respect to the effectiveness of an asset valuation prediction model. In portfolio management, an asset valuation prediction model is used to aid the decision-making process. Additionally, we adopted the MissForest method in the OFFDM with the aim of improving the OFFDM. We conducted several experiments using different datasets derived from the different imputation methods to measure the accuracy, Root Mean Squared Error (RMSE) and F-measure of the prediction model, which was built using an Artificial Neural Network (ANN). We found that the dataset derived from DEL resulted in the lowest accuracy and the highest RMSE, whereas the adoption of the MissForest method in OFFDM resulted in the highest accuracy and the second-lowest RMSE. The selection of an imputation method depends on the severity of the task at hand, as each method differs in complexity and efficiency. An imputation method such as MissForest is efficient but requires more computational resources. On the other hand, simpler methods such as DEL are still popular due to their simplicity but are less efficient.
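
As a rough illustration of the two simplest methods named in the summary, the sketch below shows DEL (dropping records with a missing value) and MEAN (filling a gap with the column mean), plus an RMSE helper. It is not the authors' implementation: the function names, the use of `None` to mark a missing value, and the toy data are all assumptions made for this example, and the paper's KNN, PMM, MissForest, OFFDM and ANN components are not reproduced here.

```python
import math


def delete_incomplete(rows):
    """DEL: drop every record that contains a missing value (None)."""
    return [row for row in rows if None not in row]


def impute_mean(rows):
    """MEAN: replace each missing value with the mean of the
    observed (non-missing) values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def rmse(predicted, actual):
    """Root Mean Squared Error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy dataset: two numeric attributes, None marks a gap.
rows = [
    [1.0, 10.0],
    [2.0, None],
    [None, 30.0],
    [4.0, 40.0],
]

del_rows = delete_incomplete(rows)   # keeps only the complete records
mean_rows = impute_mean(rows)        # keeps all records, gaps filled
```

In this toy case DEL keeps two of the four records, while MEAN keeps all four at the cost of filling gaps with a value that ignores the other attribute. That is one plausible reading of why the summary reports DEL as simple but yielding the weakest prediction model.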