Optimized tree-classification algorithm for classification of protein sequences

Bibliographic Details
Main Authors:	Iqbal, M.J., Faye, I., Said, A.M., Belhaouari Samir, B.
Format:	Conference or Workshop Item
Published:	Institute of Electrical and Electronics Engineers Inc. 2016
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-84995663140&doi=10.1109%2fISMSC.2015.7594037&partnerID=40&md5=82f12a5f8cb95a57d0a703c47bcf0f8f http://eprints.utp.edu.my/30802/

id	my.utp.eprints.30802
record_format	eprints
spelling	my.utp.eprints.30802 2022-03-25T07:33:32Z NonPeerReviewed Iqbal, M.J. and Faye, I. and Said, A.M. and Belhaouari Samir, B. (2016) Optimized tree-classification algorithm for classification of protein sequences. In: UNSPECIFIED. http://eprints.utp.edu.my/30802/
institution	Universiti Teknologi Petronas
building	UTP Resource Centre
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Petronas
content_source	UTP Institutional Repository
url_provider	http://eprints.utp.edu.my/
description	Computational intelligence is an active area of research that has been used successfully to analyze and model the vast amount of biological data accumulated by high-throughput genome sequencing projects. These data consist mainly of DNA, RNA and protein sequences, which are imprecise, incomplete and growing exponentially. Classifying protein sequences into superfamilies can help reveal the structure, function or hidden characteristics of an unknown protein sequence. Classifying protein sequences from primary sequence information alone is a complex and challenging task in the analysis and understanding of sequence data. Existing classification methods perform well only on very limited data, and the rapid growth of genomic data calls for improved computational methods. In this work, we propose an optimized tree-classification technique that uses a cluster k-nearest-neighbor (k-NN) classification algorithm to classify protein sequences into superfamilies. The proposed technique is alignment-free, and the experimental results show that it outperforms previous state-of-the-art approaches. The best overall classification accuracy achieved is 97-98% on the previously used dataset, which is taken from the well-known UniProtKB database. © 2015 IEEE.
format	Conference or Workshop Item
author	Iqbal, M.J. Faye, I. Said, A.M. Belhaouari Samir, B.
title	Optimized tree-classification algorithm for classification of protein sequences
publisher	Institute of Electrical and Electronics Engineers Inc.
publishDate	2016
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-84995663140&doi=10.1109%2fISMSC.2015.7594037&partnerID=40&md5=82f12a5f8cb95a57d0a703c47bcf0f8f http://eprints.utp.edu.my/30802/
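The abstract in the description field names two ingredients, an alignment-free sequence representation and k-nearest-neighbor classification, but gives no further details. The following is only a minimal sketch under stated assumptions: it uses dipeptide-composition vectors as the alignment-free features and plain (not clustered) k-NN with Euclidean distance and majority voting. It is not the authors' optimized tree-classification or cluster k-NN method. The feature choice, `k=3`, the function names, and the toy sequences and superfamily labels are all hypothetical.

```python
from collections import Counter
from itertools import product
import math

# Hypothetical illustration: dipeptide composition + plain k-NN.
# This is NOT the authors' optimized tree-classification / cluster k-NN method.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
DIPEPTIDE_SET = set(DIPEPTIDES)


def dipeptide_composition(seq: str) -> list[float]:
    """Alignment-free feature vector: relative frequency of each dipeptide.

    Pairs containing non-standard residues (e.g. X, B, Z) are skipped.
    """
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in DIPEPTIDE_SET)
    total = sum(counts.values()) or 1  # avoid division by zero for very short input
    return [counts[d] / total for d in DIPEPTIDES]


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Return the majority label among the k training sequences closest to query."""
    q = dipeptide_composition(query)
    # Sort (distance, label) pairs; the k smallest distances are the neighbours.
    ranked = sorted(
        (euclidean(q, dipeptide_composition(s)), label) for s, label in train
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set with hypothetical superfamily labels (not UniProtKB data).
TOY_TRAIN = [
    ("AGAGAGAGAGAG", "family_A"),
    ("GAGAGAGAGAGA", "family_A"),
    ("AGAGAGAGSAGA", "family_A"),
    ("KLKLKLKLKLKL", "family_B"),
    ("LKLKLKLKLKLK", "family_B"),
    ("KLKLKLKLKLEK", "family_B"),
]

if __name__ == "__main__":
    print(knn_classify("AGAGAGAGAG", TOY_TRAIN))  # prints family_A
```

Every sequence maps to the same 400-dimensional vector regardless of its length, so sequences can be compared directly without first aligning them; that is what makes this kind of representation alignment-free. A practical version would precompute the training vectors once rather than recomputing them for every query.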


Similar Items