Staff View: Bilingual Extractive Text Summarization Model using Textual Pattern Constraints

In the era of digital information, an auto-generated summary can help readers to easily find important and relevant information. Most of the studies and benchmark data sets in the field of text summarization are in English. Hence, there is a need to study the potential of Malay language in this fiel...

Bibliographic Details
Main Authors:	Suraya Alias, Mohd Shamrie Sainin, Siti Khaotijah Mohammad
Format:	Article
Language:	English
Published:	GEMA Online 2020
Subjects:	T Technology (General)
Online Access:	https://eprints.ums.edu.my/id/eprint/26542/1/Bilingual%20Extractive%20Text%20Summarization%20Model%20using%20Textual%20Pattern%20Constraints%20.pdf https://eprints.ums.edu.my/id/eprint/26542/2/Bilingual%20Extractive%20Text%20Summarization%20Model%20using%20Textual%20Pattern%20Constraints%201.pdf https://eprints.ums.edu.my/id/eprint/26542/ http://doi.org/10.17576/gema-2020-2003-05

id	my.ums.eprints.26542
record_format	eprints
spelling	my.ums.eprints.265422020-12-21T08:49:13Z https://eprints.ums.edu.my/id/eprint/26542/ Bilingual Extractive Text Summarization Model using Textual Pattern Constraints Suraya Alias Mohd Shamrie Sainin Siti Khaotijah Mohammad T Technology (General) In the era of digital information, an auto-generated summary can help readers to easily find important and relevant information. Most of the studies and benchmark data sets in the field of text summarization are in English. Hence, there is a need to study the potential of Malay language in this field. This study also highlights the problems in identifying and generating important information in extractive summaries. This is because existing text representation models such as BOW has weaknesses in inaccurate semantic representation, while the N-gram model has the issue of producing very high word vector dimensions. In this study, a bilingual text summarization model named MYTextSumBASIC has been developed to generate an extractive summary automatically in Malay and English. The MYTextSumBASIC summarizer model applies a text representation model known as FASP using three Textual Pattern Constraints, namely word item constraints, adjacent word constraints and sequence size constraints. There are three main phases in the framework of MYTextSumBASIC model, which are the development of the Malay language corpus, the development of MYTextSumBASIC model using FASP and the summary evaluation phase. In the summary evaluation phase, using the Malay language data sets of 100 news articles, the summaries produced by MYTextSumBASIC outperformed the summary generated by Baseline (Lead) and OTS summarizer with the highest average for retrieval (R) is 0.5849, precision (P) is 0.5736 and the F-score (Fm) is 0.5772. For manual evaluation by linguists, the MYTextSumBASIC method yielded a reading score of 4.1 and 3.87 for summary content generated using a random data set. 
Further experiments using the 2002 DUC English benchmark data set of 102 news articles have also shown that the MYTextSumBASIC model outperformed the best and lowest systems in the comparison with the mean retrieval values of ROUGE-1 (0.43896) and ROUGE-2 (0.19918). These findings conclude that the FASP text representation feature along with the textual pattern constraints used by our model can be used for bilingual text with competitive performance compared to other text summarization models. GEMA Online 2020 Article PeerReviewed text en https://eprints.ums.edu.my/id/eprint/26542/1/Bilingual%20Extractive%20Text%20Summarization%20Model%20using%20Textual%20Pattern%20Constraints%20.pdf text en https://eprints.ums.edu.my/id/eprint/26542/2/Bilingual%20Extractive%20Text%20Summarization%20Model%20using%20Textual%20Pattern%20Constraints%201.pdf Suraya Alias and Mohd Shamrie Sainin and Siti Khaotijah Mohammad (2020) Bilingual Extractive Text Summarization Model using Textual Pattern Constraints. Journal of Language Studies, 20 (3). pp. 70-95. ISSN 2550-2131 http://doi.org/10.17576/gema-2020-2003-05
institution	Universiti Malaysia Sabah
building	UMS Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaysia Sabah
content_source	UMS Institutional Repository
url_provider	http://eprints.ums.edu.my/
language	English English
topic	T Technology (General)
spellingShingle	T Technology (General) Suraya Alias Mohd Shamrie Sainin Siti Khaotijah Mohammad Bilingual Extractive Text Summarization Model using Textual Pattern Constraints
description	In the era of digital information, an auto-generated summary can help readers easily find important and relevant information. Most of the studies and benchmark data sets in the field of text summarization are in English. Hence, there is a need to study the potential of the Malay language in this field. This study also highlights the problems in identifying and generating important information in extractive summaries. These problems arise because existing text representation models such as BOW suffer from inaccurate semantic representation, while the N-gram model produces word vectors of very high dimension. In this study, a bilingual text summarization model named MYTextSumBASIC has been developed to generate an extractive summary automatically in Malay and English. The MYTextSumBASIC summarizer model applies a text representation model known as FASP using three Textual Pattern Constraints, namely word item constraints, adjacent word constraints and sequence size constraints. There are three main phases in the framework of the MYTextSumBASIC model: the development of the Malay language corpus, the development of the MYTextSumBASIC model using FASP, and the summary evaluation phase. In the summary evaluation phase, using the Malay language data set of 100 news articles, the summaries produced by MYTextSumBASIC outperformed the summaries generated by the Baseline (Lead) and OTS summarizers, with highest average scores of 0.5849 for retrieval (R), 0.5736 for precision (P) and 0.5772 for the F-score (Fm). For manual evaluation by linguists, the MYTextSumBASIC method yielded a reading score of 4.1 and 3.87 for summary content generated using a random data set. Further experiments using the 2002 DUC English benchmark data set of 102 news articles have also shown that the MYTextSumBASIC model outperformed the best and lowest systems in the comparison, with mean retrieval values of 0.43896 for ROUGE-1 and 0.19918 for ROUGE-2. These findings indicate that the FASP text representation feature, along with the textual pattern constraints used by our model, can be used for bilingual text with competitive performance compared to other text summarization models.
format	Article
author	Suraya Alias Mohd Shamrie Sainin Siti Khaotijah Mohammad
author_facet	Suraya Alias Mohd Shamrie Sainin Siti Khaotijah Mohammad
author_sort	Suraya Alias
title	Bilingual Extractive Text Summarization Model using Textual Pattern Constraints
title_short	Bilingual Extractive Text Summarization Model using Textual Pattern Constraints
title_full	Bilingual Extractive Text Summarization Model using Textual Pattern Constraints
title_fullStr	Bilingual Extractive Text Summarization Model using Textual Pattern Constraints
title_full_unstemmed	Bilingual Extractive Text Summarization Model using Textual Pattern Constraints
title_sort	bilingual extractive text summarization model using textual pattern constraints
publisher	GEMA Online
publishDate	2020
url	https://eprints.ums.edu.my/id/eprint/26542/1/Bilingual%20Extractive%20Text%20Summarization%20Model%20using%20Textual%20Pattern%20Constraints%20.pdf https://eprints.ums.edu.my/id/eprint/26542/2/Bilingual%20Extractive%20Text%20Summarization%20Model%20using%20Textual%20Pattern%20Constraints%201.pdf https://eprints.ums.edu.my/id/eprint/26542/ http://doi.org/10.17576/gema-2020-2003-05
_version_	1760230512314548224
score	13.160551
