Comparing two corpus-based methods for extracting paraphrases to dictionary-based method

Bibliographic Details
Main Authors: Ho, Chuk Fong, Azmi Murad, Masrah Azrifah, Abdul Kadir, Rabiah, C. Doraisamy, Shyamala
Format: Article
Language: English
Published: World Scientific Publishing 2011
Online Access: http://psasir.upm.edu.my/id/eprint/22466/1/Comparing%20two%20corpus-based%20methods%20for%20extracting%20paraphrases%20to%20dictionary-based%20method.pdf
http://psasir.upm.edu.my/id/eprint/22466/
http://www.worldscientific.com/doi/abs/10.1142/S1793351X11001225
Description
Summary: Paraphrase extraction plays an increasingly important role in language-related research and applications in areas such as information retrieval, question answering and automatic machine evaluation. Most existing methods extract paraphrases from different types of corpora using syntactic-based approaches. Because a syntactic-based approach relies on similarity of context to identify and capture paraphrases, it also tends to extract other terms that appear in similar contexts, such as loosely related terms and functionally similar yet unrelated terms. In addition, different types of corpora suffer from different problems, such as limited availability and domain bias. This paper presents a solely semantic-based paraphrase extraction model. The model collects paraphrases from multiple lexical resources and validates them semantically in three ways: by computing domain similarity, definition similarity and word similarity. The model is benchmarked against two outstanding syntactic-based approaches. Experimental results from a manual evaluation show that the proposed model outperforms the benchmarks. The results indicate that a semantic-based approach should be applied in paraphrase extraction instead of a syntactic-based approach. The results further suggest that a hybrid of the two approaches should be applied if one targets strictly precise paraphrases.