Staff View: Web based cross language semantic plagiarism detection

Cross-language and semantic plagiarism are on the rise, and many plagiarism detection tools are unable to detect such cases. In this research, we propose a new framework that combines summarisation, cross-language and semantic plagiarism detection. We consider Bahasa Melayu as...

Bibliographic Details
Main Author:	Chow, Kok Kent
Format:	Thesis
Published:	2013
Subjects:	QA76 Computer software
Online Access:	http://eprints.utm.my/id/eprint/42237/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:77839

id	my.utm.42237
record_format	eprints
spelling	my.utm.42237 2020-08-19T08:11:49Z http://eprints.utm.my/id/eprint/42237/ Web based cross language semantic plagiarism detectio Chow, Kok Kent QA76 Computer software 2013 Thesis NonPeerReviewed Chow, Kok Kent (2013) Web based cross language semantic plagiarism detectio. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computing. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:77839
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
topic	QA76 Computer software
spellingShingle	QA76 Computer software Chow, Kok Kent Web based cross language semantic plagiarism detectio
description	Cross-language and semantic plagiarism are on the rise, and many plagiarism detection tools are unable to detect such cases. In this research, we propose a new framework that combines summarisation, cross-language and semantic plagiarism detection. We consider Bahasa Melayu as the input language of the submitted document and English as the language of the possibly plagiarised documents. In this framework, we shorten the query document using a fuzzy swarm-based summarisation approach. With this approach, sentences are chosen based on their importance level, which is determined by five predefined sentence features integrated with fuzzy logic. This technique was chosen for its effectiveness in previous research. The summarised input documents are translated into English using the Google Translate Application Programming Interface (API) before the words are stemmed and the stop words are removed. The tokenized documents are sent to the Google AJAX Search API to find similar documents on the World Wide Web. We combine the Stanford Parser and WordNet to determine the level of semantic similarity between the suspected documents and the candidate source documents. The Stanford Parser assigns each term in a sentence to its corresponding role, such as noun, verb or adjective. Based on these roles, we represent each sentence in predicate form and measure similarity between those predicates using information content values from the WordNet taxonomy. The testing dataset is built from two sets of Malay documents produced according to different plagiarism practices. The results show that our proposed semantic-based similarity measurement achieves higher precision, recall and F-measure than the conventional Longest Common Subsequence (LCS) approach, which determines the similarity between sentences from their longest common subsequence, read from left to right, regardless of whether the matching terms are consecutive.
format	Thesis
author	Chow, Kok Kent
author_facet	Chow, Kok Kent
author_sort	Chow, Kok Kent
title	Web based cross language semantic plagiarism detectio
title_short	Web based cross language semantic plagiarism detectio
title_full	Web based cross language semantic plagiarism detectio
title_fullStr	Web based cross language semantic plagiarism detectio
title_full_unstemmed	Web based cross language semantic plagiarism detectio
title_sort	web based cross language semantic plagiarism detectio
publishDate	2013
url	http://eprints.utm.my/id/eprint/42237/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:77839
_version_	1677781071382446080
score	13.188404
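
Illustrative note: the description above compares the proposed measure against a Longest Common Subsequence (LCS) baseline and reports precision, recall and F-measure. The sketch below shows one common way such a baseline and such scores can be computed. It is not taken from the thesis: the function names, the whitespace tokenisation, and the choice to normalise by the longer sentence are all assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists.

    Tokens must appear in the same left-to-right order in both lists,
    but they do not have to be adjacent.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(sentence1, sentence2):
    """Word-level LCS similarity in [0, 1].

    Assumption: lowercase, split on whitespace, and divide by the
    length of the longer sentence. Other normalisations are possible.
    """
    t1 = sentence1.lower().split()
    t2 = sentence2.lower().split()
    if not t1 or not t2:
        return 0.0
    return lcs_length(t1, t2) / max(len(t1), len(t2))


def precision_recall_f1(detected, relevant):
    """Precision, recall and F-measure (F1) for a set of detected items
    against a set of truly relevant (plagiarised) items."""
    detected, relevant = set(detected), set(relevant)
    true_positives = len(detected & relevant)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, `lcs_similarity("the cat sat on the mat", "the cat lay on a mat")` matches the four words "the", "cat", "on" and "mat" in order, giving 4/6. A purely lexical measure like this cannot see that two different words mean the same thing, which is the gap the semantic approach in the description is meant to address.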
