Estimation of Missing Values Using Optimised Hybrid Fuzzy C-Means and Majority Vote for Microarray Data

Missing values are a major obstacle to using microarray technologies to identify disease-causing genes, and field experts routinely have to estimate them. Imputation is an effective way to fill in suitable values in order to proceed with the next process...

Bibliographic Details
Main Authors: Kumaran, Shamini Raja; Othman, Mohd Shahizan; Yusuf, Lizawati Mi
Format: Article
Language: English
Published: Universiti Utara Malaysia Press 2020
Subjects: QA75 Electronic computers. Computer science
Online Access: https://repo.uum.edu.my/id/eprint/28790/1/JICT%2019%2004%202020%20459-482.pdf
https://repo.uum.edu.my/id/eprint/28790/
id my.uum.repo.28790
record_format eprints
spelling my.uum.repo.28790 2022-08-07T03:00:58Z PeerReviewed application/pdf en Kumaran, Shamini Raja and Othman, Mohd Shahizan and Yusuf, Lizawati Mi (2020) Estimation of Missing Values Using Optimised Hybrid Fuzzy C-Means and Majority Vote for Microarray Data. Journal of Information and Communication Technology, 19 (04). pp. 459-482. ISSN 2180-3862
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic QA75 Electronic computers. Computer science
description Missing values are a major obstacle to using microarray technologies to identify disease-causing genes, and field experts routinely have to estimate them. Imputation is an effective way to fill in suitable values so that the next process in microarray technology can proceed, and missing value imputation methods may increase classification accuracy. Although these methods can predict the values, it is the classification accuracy rate that demonstrates how well a method estimates the missing values in gene expression data. In this study, a novel method, Optimised Hybrid of Fuzzy C-Means and Majority Vote (opt-FCMMV), was proposed to estimate the missing values in the data. By combining Majority Vote (MV) with optimisation through Particle Swarm Optimisation (PSO), this study predicted missing values to form more informative and reliable data. To verify the effectiveness of opt-FCMMV, several experiments were carried out on two publicly available biomedical microarray datasets (Ovary Cancer and Lung Cancer) under three missing value mechanisms and five missing value percentages, using a Support Vector Machine (SVM) classifier. The experimental results showed that the proposed method achieved the highest accuracy rate compared with no imputation, imputation by Fuzzy C-Means (FCM), and imputation by Fuzzy C-Means with Majority Vote (FCMMV). For example, the accuracy rates for Ovary Cancer data with 5% missing values were 64.0% (no imputation), 81.8% (FCM), 90.0% (FCMMV), and 93.7% (opt-FCMMV). This outcome suggests that opt-FCMMV may also be applied in other domains to prepare datasets for various data mining tasks.
format Article
author Kumaran, Shamini Raja
Othman, Mohd Shahizan
Yusuf, Lizawati Mi
title Estimation of Missing Values Using Optimised Hybrid Fuzzy C-Means and Majority Vote for Microarray Data
publisher Universiti Utara Malaysia Press
publishDate 2020
url https://repo.uum.edu.my/id/eprint/28790/1/JICT%2019%2004%202020%20459-482.pdf
https://repo.uum.edu.my/id/eprint/28790/