Staff View: On the use of fuzzy information retrieval for gauging similarity of Arabic documents

On the use of fuzzy information retrieval for gauging similarity of Arabic documents

Arabic is one of the richest human languages in terms of word construction and diversity of meaning, which makes judging similarity among statements in Arabic documents complex. In this paper, we present a mechanism for gauging the similarity of Arabic documents using a fuzzy IR model. Similarity degree of two docum...

Bibliographic Details
Main Authors:	Mohammed Alzahrani, Salha; Salim, Naomie
Format:	Book Section
Published:	Institute of Electrical and Electronics Engineers 2009
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://eprints.utm.my/id/eprint/13027/ http://dx.doi.org/10.1109/ICADIWT.2009.5273835

id	my.utm.13027
record_format	eprints
spelling	my.utm.13027 2011-07-14T01:26:30Z http://eprints.utm.my/id/eprint/13027/ On the use of fuzzy information retrieval for gauging similarity of Arabic documents Mohammed Alzahrani, Salha; Salim, Naomie QA75 Electronic computers. Computer science Arabic is one of the richest human languages in terms of word construction and diversity of meaning, which makes judging similarity among statements in Arabic documents complex. In this paper, we present a mechanism for gauging the similarity of Arabic documents using a fuzzy IR model. The similarity degree of two documents is the averaged similarity among statements that are treated as equal even though they have been restructured or reworded. We introduce fuzzy similarity sets such as near duplicate, very similar, similar, slightly similar, dissimilar and very dissimilar. These similarity sets can be implemented as a spectrum of values ranging from 1 (duplicate) to 0 (different). We built a corpus in which all stop words were removed and non-stop words were stemmed using typical Arabic IR techniques. The corpus has 100 documents with 4477 statements and 54346 non-stop-word, stemmed words in total. Another 15 query documents with 303 statements and 1620 words were constructed specifically for our test. Experimental results show that fuzzy IR can be used to determine the extent to which documents are similar or dissimilar, where similarity can be mapped to one of the proposed fuzzy sets. The performance of our fuzzy IR system, measured in fuzzy precision and fuzzy recall, shows that it outperforms Boolean IR by retrieving more documents that have similar content expressed with different synonyms. Institute of Electrical and Electronics Engineers 2009 Book Section PeerReviewed Mohammed Alzahrani, Salha and Salim, Naomie (2009) On the use of fuzzy information retrieval for gauging similarity of Arabic documents. In: 2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009. Institute of Electrical and Electronics Engineers, New York, 539-544. ISBN 978-142444457-1 http://dx.doi.org/10.1109/ICADIWT.2009.5273835 doi: 10.1109/ICADIWT.2009.5273835
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers. Computer science Mohammed Alzahrani, Salha; Salim, Naomie On the use of fuzzy information retrieval for gauging similarity of Arabic documents
description	Arabic is one of the richest human languages in terms of word construction and diversity of meaning, which makes judging similarity among statements in Arabic documents complex. In this paper, we present a mechanism for gauging the similarity of Arabic documents using a fuzzy IR model. The similarity degree of two documents is the averaged similarity among statements that are treated as equal even though they have been restructured or reworded. We introduce fuzzy similarity sets such as near duplicate, very similar, similar, slightly similar, dissimilar and very dissimilar. These similarity sets can be implemented as a spectrum of values ranging from 1 (duplicate) to 0 (different). We built a corpus in which all stop words were removed and non-stop words were stemmed using typical Arabic IR techniques. The corpus has 100 documents with 4477 statements and 54346 non-stop-word, stemmed words in total. Another 15 query documents with 303 statements and 1620 words were constructed specifically for our test. Experimental results show that fuzzy IR can be used to determine the extent to which documents are similar or dissimilar, where similarity can be mapped to one of the proposed fuzzy sets. The performance of our fuzzy IR system, measured in fuzzy precision and fuzzy recall, shows that it outperforms Boolean IR by retrieving more documents that have similar content expressed with different synonyms.
format	Book Section
author	Mohammed Alzahrani, Salha; Salim, Naomie
author_facet	Mohammed Alzahrani, Salha; Salim, Naomie
author_sort	Mohammed Alzahrani, Salha
title	On the use of fuzzy information retrieval for gauging similarity of Arabic documents
title_short	On the use of fuzzy information retrieval for gauging similarity of Arabic documents
title_full	On the use of fuzzy information retrieval for gauging similarity of Arabic documents
title_fullStr	On the use of fuzzy information retrieval for gauging similarity of Arabic documents
title_full_unstemmed	On the use of fuzzy information retrieval for gauging similarity of Arabic documents
title_sort	on the use of fuzzy information retrieval for gauging similarity of arabic documents
publisher	Institute of Electrical and Electronics Engineers
publishDate	2009
url	http://eprints.utm.my/id/eprint/13027/ http://dx.doi.org/10.1109/ICADIWT.2009.5273835
_version_	1643646098414239744
score	13.19449

Similar Items