Staff View: Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani

Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani

Bibliographic Details
Main Author:	Seyed Asadollah, Abdiesfandani
Format:	Thesis
Published:	2016
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://studentsrepo.um.edu.my/6400/4/seyed.pdf http://studentsrepo.um.edu.my/6400/

id	my.um.stud.6400
record_format	eprints
spelling	my.um.stud.6400 2019-10-23T19:06:07Z Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani Seyed Asadollah, Abdiesfandani QA75 Electronic computers. Computer science Summarization is the process of selecting important information from a source text. Summarizing strategies are the core of the cognitive processes involved in summarization. They comprise a set of conscious tasks used to determine important information and extract the main idea of a source text. In this research project, we conducted a study of students’ summaries. The findings show a strong relationship between students’ summary-writing proficiency and the summarizing strategies they used. We then developed a new algorithm to address the summarizing strategies identification problem. The algorithm simulates two important tasks that human experts frequently perform to identify the summarizing strategies used to produce summary sentences: 1) sentence relevance identification; and 2) summarizing strategies identification. The sentence relevance identification module uses a statistical approach such as the vector space model (VSM) to represent sentences and computes the similarity between source sentences and summary sentences using the cosine similarity measure. It then integrates semantic and syntactic similarity measures in a linear equation to capture meaning when comparing two sentences. This aims to distinguish two sentences that have the same surface form or share a similar bag-of-words (BOW) but differ in meaning. The module also employs a word semantic similarity measure to overcome the vocabulary mismatch problem in sentence comparison; this bridges lexical gaps between semantically similar contexts expressed in different wording. In addition, the sentence relevance identification module requires some linguistic pre-processing, including part-of-speech (POS) tagging, word stemming and stop-word removal. The summarizing strategies identification module relies on a set of heuristic rules and statistical and linguistic methods, such as the position-based, title-based, cue-phrase and word-frequency methods, to identify the summarizing strategies employed by students. To evaluate the algorithm, we conducted two experiments. In the first experiment, we examined the functionality of the system, that is, whether it is able to identify the summarizing strategies used by students in summary writing. The results show that the system is able to identify some of the summarizing strategies, namely deletion, sentence combination, paraphrase and topic sentence selection. The system is also able to detect the copy-verbatim strategy, the strategy most commonly used by students. Besides these strategies, the system can identify four methods used in the topic sentence selection strategy: 1) cue method; 2) title method; 3) keyword method; and 4) location method. In the second experiment, we measured the performance of the algorithm against human judgment in identifying the summarizing strategies, using precision, recall, F-measure and accuracy. The experimental results show that the proposed algorithm achieved acceptable results in comparison to human judgment: on average, 87% precision, 83% recall, 85% F-score and 82% accuracy. 2016 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/6400/4/seyed.pdf Seyed Asadollah, Abdiesfandani (2016) Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/6400/
institution	Universiti Malaya
building	UM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaya
content_source	UM Student Repository
url_provider	http://studentsrepo.um.edu.my/
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers. Computer science Seyed Asadollah, Abdiesfandani Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani
description	Summarization is the process of selecting important information from a source text. Summarizing strategies are the core of the cognitive processes involved in summarization. They comprise a set of conscious tasks used to determine important information and extract the main idea of a source text. In this research project, we conducted a study of students’ summaries. The findings show a strong relationship between students’ summary-writing proficiency and the summarizing strategies they used. We then developed a new algorithm to address the summarizing strategies identification problem. The algorithm simulates two important tasks that human experts frequently perform to identify the summarizing strategies used to produce summary sentences: 1) sentence relevance identification; and 2) summarizing strategies identification. The sentence relevance identification module uses a statistical approach such as the vector space model (VSM) to represent sentences and computes the similarity between source sentences and summary sentences using the cosine similarity measure. It then integrates semantic and syntactic similarity measures in a linear equation to capture meaning when comparing two sentences. This aims to distinguish two sentences that have the same surface form or share a similar bag-of-words (BOW) but differ in meaning. The module also employs a word semantic similarity measure to overcome the vocabulary mismatch problem in sentence comparison; this bridges lexical gaps between semantically similar contexts expressed in different wording. In addition, the sentence relevance identification module requires some linguistic pre-processing, including part-of-speech (POS) tagging, word stemming and stop-word removal. The summarizing strategies identification module relies on a set of heuristic rules and statistical and linguistic methods, such as the position-based, title-based, cue-phrase and word-frequency methods, to identify the summarizing strategies employed by students. To evaluate the algorithm, we conducted two experiments. In the first experiment, we examined the functionality of the system, that is, whether it is able to identify the summarizing strategies used by students in summary writing. The results show that the system is able to identify some of the summarizing strategies, namely deletion, sentence combination, paraphrase and topic sentence selection. The system is also able to detect the copy-verbatim strategy, the strategy most commonly used by students. Besides these strategies, the system can identify four methods used in the topic sentence selection strategy: 1) cue method; 2) title method; 3) keyword method; and 4) location method. In the second experiment, we measured the performance of the algorithm against human judgment in identifying the summarizing strategies, using precision, recall, F-measure and accuracy. The experimental results show that the proposed algorithm achieved acceptable results in comparison to human judgment: on average, 87% precision, 83% recall, 85% F-score and 82% accuracy.
format	Thesis
author	Seyed Asadollah, Abdiesfandani
author_facet	Seyed Asadollah, Abdiesfandani
author_sort	Seyed Asadollah, Abdiesfandani
title	Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani
title_short	Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani
title_full	Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani
title_fullStr	Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani
title_full_unstemmed	Relevance detection and summarizing strategies identification algorithm using linguistic measures / Seyed Asadollah Abdiesfandani
title_sort	relevance detection and summarizing strategies identification algorithm using linguistic measures / seyed asadollah abdiesfandani
publishDate	2016
url	http://studentsrepo.um.edu.my/6400/4/seyed.pdf http://studentsrepo.um.edu.my/6400/
_version_	1738505911296589824
score	13.160551
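
The sentence relevance step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the stop-word list, the word-order measure standing in for "syntactic similarity", the weight `alpha`, and all function names are assumptions chosen for illustration, and the abstract's semantic (word-similarity) component is replaced here by plain bag-of-words cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on"}


def tokenize(sentence):
    """Lowercase the sentence, keep alphabetic tokens, drop stop words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity between term-frequency vectors of two sentences."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def word_order_similarity(a, b):
    """Assumed syntactic proxy: compare first positions of each word.

    Each sentence becomes a vector over the joint vocabulary holding the
    1-based position of the word (0 if absent); the score is
    1 - |r1 - r2| / |r1 + r2|.
    """
    ta, tb = tokenize(a), tokenize(b)
    vocab = sorted(set(ta) | set(tb))
    r1 = [ta.index(w) + 1 if w in ta else 0 for w in vocab]
    r2 = [tb.index(w) + 1 if w in tb else 0 for w in vocab]
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r1, r2)))
    return 1.0 - diff / total if total else 0.0


def combined_similarity(a, b, alpha=0.5):
    """Linear blend of the two scores; alpha is an assumed weight."""
    return alpha * cosine_similarity(a, b) + (1 - alpha) * word_order_similarity(a, b)


if __name__ == "__main__":
    s1, s2 = "The dog bites the man", "The man bites the dog"
    print(cosine_similarity(s1, s2))      # identical bag-of-words
    print(word_order_similarity(s1, s2))  # lower: word order differs
    print(combined_similarity(s1, s2))
```

The two example sentences share the same bag-of-words, so cosine similarity alone cannot separate them. Blending in a word-order term lowers the combined score, which is the kind of distinction the abstract attributes to combining semantic and syntactic measures.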

Similar Items