Nature inspired data mining algorithm for document clustering in information retrieval

Bibliographic Details
Main Authors: Mohammed, Athraa Jasim, Yusof, Yuhanis, Husni, Husniza
Other Authors: Ahmad, Azizah
Format: Book Section
Published: Springer International Publishing 2014
Online Access: http://repo.uum.edu.my/18929/
http://doi.org/10.1007/978-3-319-12844-3_33
Summary: Document clustering is an important technique that has been widely employed in Information Retrieval (IR). Various clustering techniques have been reported, but the effectiveness of most of them depends on the initial number of clusters, k. Such an approach may not be suitable, because prior knowledge of the document collection may not be available. To date, various swarm-based clustering techniques have been proposed to address this problem, including the one in this paper, which explores the adaptation of the Firefly Algorithm (FA) to document clustering. We extend the work on the Gravitation Firefly Algorithm (GFA) by introducing a relocate mechanism that relocates already-assigned documents when necessary. The newly proposed clustering algorithm, known as GFA_R, is then tested on a benchmark dataset obtained from 20Newsgroups. Its results on external and relative quality metrics are compared against those obtained using the standard GFA and Bisect K-means. The comparison shows that extending GFA to GFA_R yields better-quality clustering.
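The summary builds on the Firefly Algorithm without stating its update rule. As a point of reference only, the sketch below implements the standard continuous Firefly Algorithm (Yang's formulation, where attractiveness decays with distance as beta = beta0 * exp(-gamma * r^2)) minimizing a generic cost function. It is not GFA or GFA_R. The gravitation-based movement, the document representation, and the relocate mechanism from the paper are not reproduced. The function names and all parameter values (beta0, gamma, alpha, n, iters, bounds, seed) are hypothetical, illustrative choices.

```python
import math
import random


def attractiveness(beta0, gamma, r):
    # Standard FA attractiveness: strongest at r = 0, decaying with squared distance.
    return beta0 * math.exp(-gamma * r * r)


def move_firefly(xi, xj, beta0=1.0, gamma=1.0, alpha=0.2, rng=random):
    # Move firefly xi toward the brighter firefly xj, plus a small random step.
    r = math.dist(xi, xj)
    beta = attractiveness(beta0, gamma, r)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5) for a, b in zip(xi, xj)]


def firefly_minimize(f, dim, n=15, iters=50, beta0=1.0, gamma=1.0,
                     alpha=0.2, bounds=(-5.0, 5.0), seed=0):
    # Hypothetical helper: plain FA over a continuous search space (not document clustering).
    rng = random.Random(seed)
    lo, hi = bounds
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    cost = [f(x) for x in swarm]
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:  # lower cost means a brighter firefly
                    moved = move_firefly(swarm[i], swarm[j], beta0, gamma, alpha, rng)
                    swarm[i] = [min(max(v, lo), hi) for v in moved]  # clip to bounds
                    cost[i] = f(swarm[i])
    best = min(range(n), key=cost.__getitem__)
    return swarm[best], cost[best]


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    x_best, c_best = firefly_minimize(sphere, dim=2)
    print(x_best, c_best)
```

Because the brightest firefly never moves toward a dimmer one, the best cost in the swarm never increases between iterations. A clustering variant would replace the continuous positions and the cost function with a document-to-cluster representation and a clustering quality measure, which is where the paper's GFA and GFA_R differ from this sketch.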