Nature inspired data mining algorithm for document clustering in information retrieval

Bibliographic Details
Main Authors: Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza
Other Authors: Ahmad, Azizah
Format: Book Section
Published: Springer International Publishing, 2014
Subjects: QA76 Computer software
Online Access: http://repo.uum.edu.my/18929/
http://doi.org/10.1007/978-3-319-12844-3_33
Citation: Mohammed, Athraa Jasim and Yusof, Yuhanis and Husni, Husniza (2014) Nature inspired data mining algorithm for document clustering in information retrieval. In: Information Retrieval Technology (eds. Ahmad, Azizah; Mohamad Ali, Nazlena; Mohd Noah, Shahrul Azman; Smeaton, Alan F.; Bruza, Peter; Abu Bakar, Zainab; Jamil, Nursuriati; Tengku Sembok, Tengku Mohd). Springer International Publishing, Switzerland, pp. 382-393. ISBN 978-3-319-12843-6. doi:10.1007/978-3-319-12844-3_33. Peer reviewed.
Repository: UUM Institutional Repository, UUM Library, Universiti Utara Malaysia (Malaysia, Asia), http://repo.uum.edu.my/
Abstract: Document clustering is an important technique that has been widely employed in Information Retrieval (IR). Various clustering techniques have been reported, but the effectiveness of most of them depends on the initial value of k, the number of clusters. Such an approach may not be suitable, as prior knowledge of the document collection may not be available. To date, various swarm-based clustering techniques have been proposed to address this problem, including the present paper, which explores the adaptation of the Firefly Algorithm (FA) to document clustering. We extend the work on the Gravitation Firefly Algorithm (GFA) by introducing a relocation mechanism that relocates already-assigned documents when necessary. The newly proposed clustering algorithm, known as GFA_R, is tested on a benchmark dataset drawn from 20Newsgroups. Results on external and relative quality metrics for GFA_R are compared against those obtained with the standard GFA and Bisecting K-means. The results show that extending GFA to GFA_R yields better-quality clustering.
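For readers unfamiliar with the Firefly Algorithm (FA) that GFA builds on, the following is a minimal Python sketch of the standard FA movement rule as it is commonly stated: each firefly moves toward every brighter firefly with attractiveness beta0 * exp(-gamma * r^2), plus a small uniform random step scaled by alpha. It is not the authors' GFA or GFA_R; the gravitation-based attraction and the relocation mechanism are not specified in this record. The function name firefly_step and the parameter names beta0, gamma and alpha are illustrative assumptions.

```python
import math
import random


def firefly_step(positions, brightness, beta0=1.0, gamma=1.0, alpha=0.1, rng=None):
    """One iteration of the generic Firefly Algorithm movement rule (sketch).

    Hypothetical helper, not the GFA/GFA_R procedure from the paper.
    Each firefly i moves toward every firefly j that is brighter:
        x_i <- x_i + beta0 * exp(-gamma * r_ij^2) * (x_j - x_i) + alpha * (u - 0.5)
    where r_ij is the Euclidean distance and u ~ Uniform(0, 1) per dimension.
    Moves are computed toward the positions of j at the start of the iteration.
    """
    rng = rng or random.Random()
    n = len(positions)
    new_positions = [list(p) for p in positions]
    for i in range(n):
        for j in range(n):
            if brightness[j] > brightness[i]:
                # Squared Euclidean distance between the current i and the old j.
                r2 = sum((a - b) ** 2 for a, b in zip(new_positions[i], positions[j]))
                beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                new_positions[i] = [
                    xi + beta * (xj - xi) + alpha * (rng.random() - 0.5)
                    for xi, xj in zip(new_positions[i], positions[j])
                ]
    return new_positions


if __name__ == "__main__":
    # Two 1-D fireflies; with alpha=0 the dimmer one moves deterministically.
    print(firefly_step([[0.0], [1.0]], [0.0, 1.0], alpha=0.0))
```

With alpha set to 0 the update is deterministic, which makes the attraction term easy to inspect; in a clustering setting the positions would typically be candidate centroids in a document-vector space, with brightness derived from a clustering quality measure, but how GFA defines these is not described here.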