Staff View: Imputing missing value through ensemble concept based on statistical measures

Imputing missing value through ensemble concept based on statistical measures

Many datasets include missing values in their attributes. Data mining techniques are not applicable in the presence of missing values. So an important step in preprocessing of a data mining task is missing value management. One of the most important categories in missing value management techniques...

Bibliographic Details
Main Authors:	Jenghara, M. M., Ebrahimpour-Komleh, H., Rezaie, V., Nejatian, S., Parvin, H., Yusof, S. K. S.
Format:	Article
Published:	Springer London 2017
Subjects:	TK Electrical engineering. Electronics Nuclear engineering
Online Access:	http://eprints.utm.my/id/eprint/77178/ https://www.scopus.com/inward/record.uri?eid=2-s2.0-85032035373&doi=10.1007%2fs10115-017-1118-1&partnerID=40&md5=0e077a4f0507f0476bdaf1dfa0a70188

id	my.utm.77178
record_format	eprints
spelling	my.utm.771782018-05-31T09:50:29Z http://eprints.utm.my/id/eprint/77178/ Imputing missing value through ensemble concept based on statistical measures Jenghara, M. M. Ebrahimpour-Komleh, H. Rezaie, V. Nejatian, S. Parvin, H. Yusof, S. K. S. TK Electrical engineering. Electronics Nuclear engineering Many datasets include missing values in their attributes. Data mining techniques are not applicable in the presence of missing values. So an important step in preprocessing of a data mining task is missing value management. One of the most important categories in missing value management techniques is missing value imputation. This paper presents a new imputation technique. The proposed imputation technique is based on statistical measurements. The suggested imputation technique employs an ensemble of the estimators built to estimate the missing values based on positive and negative correlated observed attributes separately. Each estimator guesses a value for a missed value based on the average and variance of that feature. The average and variance of the feature are estimated from the non-missed values of that feature. The final consensus value for a missed value is the weighted aggregation of the values estimated by different estimators. The chief weight is attribute correlation, and the slight weight is dependent to kernel function such as kurtosis, skewness, number of involved samples and composition of them. The missing values are deliberately produced randomly at different levels. The experimentations indicate that the suggested technique has a good accuracy in comparison with the classical methods. Springer London 2017 Article PeerReviewed Jenghara, M. M. and Ebrahimpour-Komleh, H. and Rezaie, V. and Nejatian, S. and Parvin, H. and Yusof, S. K. S. (2017) Imputing missing value through ensemble concept based on statistical measures. Knowledge and Information Systems . pp. 1-17. 
ISSN 0219-1377 (In Press) https://www.scopus.com/inward/record.uri?eid=2-s2.0-85032035373&doi=10.1007%2fs10115-017-1118-1&partnerID=40&md5=0e077a4f0507f0476bdaf1dfa0a70188 DOI:10.1007/s10115-017-1118-1
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
topic	TK Electrical engineering. Electronics Nuclear engineering
spellingShingle	TK Electrical engineering. Electronics Nuclear engineering Jenghara, M. M. Ebrahimpour-Komleh, H. Rezaie, V. Nejatian, S. Parvin, H. Yusof, S. K. S. Imputing missing value through ensemble concept based on statistical measures
description	Many datasets include missing values in their attributes. Data mining techniques are not applicable in the presence of missing values, so an important preprocessing step in a data mining task is missing value management. One of the most important categories of missing value management techniques is missing value imputation. This paper presents a new imputation technique based on statistical measurements. The technique employs an ensemble of estimators built to estimate the missing values from positively and negatively correlated observed attributes separately. Each estimator guesses a value for a missing value based on the average and variance of that feature, which are estimated from the feature's non-missing values. The final consensus value for a missing value is the weighted aggregation of the values estimated by the different estimators. The primary weight is attribute correlation, and the secondary weight depends on kernel functions such as kurtosis, skewness, the number of involved samples, and their composition. Missing values are deliberately produced at random at different levels. The experiments indicate that the suggested technique has good accuracy in comparison with the classical methods.
format	Article
author	Jenghara, M. M. Ebrahimpour-Komleh, H. Rezaie, V. Nejatian, S. Parvin, H. Yusof, S. K. S.
author_facet	Jenghara, M. M. Ebrahimpour-Komleh, H. Rezaie, V. Nejatian, S. Parvin, H. Yusof, S. K. S.
author_sort	Jenghara, M. M.
title	Imputing missing value through ensemble concept based on statistical measures
title_short	Imputing missing value through ensemble concept based on statistical measures
title_full	Imputing missing value through ensemble concept based on statistical measures
title_fullStr	Imputing missing value through ensemble concept based on statistical measures
title_full_unstemmed	Imputing missing value through ensemble concept based on statistical measures
title_sort	imputing missing value through ensemble concept based on statistical measures
publisher	Springer London
publishDate	2017
url	http://eprints.utm.my/id/eprint/77178/ https://www.scopus.com/inward/record.uri?eid=2-s2.0-85032035373&doi=10.1007%2fs10115-017-1118-1&partnerID=40&md5=0e077a4f0507f0476bdaf1dfa0a70188
_version_	1643657520170926080
score	13.159267
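
Illustrative sketch (not part of the catalog record): the description above outlines an ensemble in which each observed, correlated attribute produces its own estimate of a missing value from the target attribute's mean and variance, and the estimates are combined with correlation as the main weight. The Python code below is one possible reading of that outline, not the authors' implementation. The function name `impute_correlation_ensemble`, the z-score mapping used by each estimator, and the minimum of three shared rows are all assumptions; the secondary weights mentioned in the abstract (kurtosis, skewness, sample count) are omitted.

```python
import numpy as np

def impute_correlation_ensemble(X):
    """Fill NaNs in a 2-D array with a correlation-weighted ensemble (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    n_cols = X.shape[1]
    # Mean and standard deviation of each attribute, from observed values only.
    means = np.nanmean(X, axis=0)
    stds = np.nanstd(X, axis=0)

    for t in range(n_cols):
        missing_rows = np.where(np.isnan(X[:, t]))[0]
        if missing_rows.size == 0:
            continue
        # Pearson correlation between attribute t and each other attribute,
        # computed on rows where both are observed.
        corr = np.zeros(n_cols)
        for j in range(n_cols):
            if j == t or stds[j] == 0 or stds[t] == 0:
                continue
            both = ~np.isnan(X[:, t]) & ~np.isnan(X[:, j])
            if both.sum() >= 3:  # assumed minimum overlap
                a, b = X[both, t], X[both, j]
                if a.std() > 0 and b.std() > 0:
                    corr[j] = np.corrcoef(a, b)[0, 1]
        for i in missing_rows:
            estimates, weights = [], []
            for j in range(n_cols):
                if corr[j] == 0 or np.isnan(X[i, j]):
                    continue
                # One estimator per attribute: map the z-score of x_ij onto the
                # target's scale, flipping the sign for negative correlation.
                z = (X[i, j] - means[j]) / stds[j]
                estimates.append(means[t] + np.sign(corr[j]) * stds[t] * z)
                weights.append(abs(corr[j]))
            if weights:
                filled[i, t] = np.average(estimates, weights=weights)
            else:
                # No usable estimator: fall back to the attribute mean.
                filled[i, t] = means[t]
    return filled
```

Calling `impute_correlation_ensemble(X)` on an array with `np.nan` entries returns a copy in which only the missing cells are replaced. Handling positively and negatively correlated attributes through the sign of the correlation is a simplification of the "separately" built estimators that the abstract mentions.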
