Automatic multi-lingual script recognition application

Document Image Analysis and Recognition (DIAR) techniques are used to recognize text components and convert them into an editable format. Scripts are sets of graphical representations used to express a particular writing system, as well as subsets belonging to a particular writing system. The writing styl...

Bibliographic Details
Main Authors: Abdel Karim Abu-Ain, Waleed; Siti Norul Huda Sheikh Abdullah; Khairuddin Omar; Siti Zaharah Abd. Rahman
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2018
Online Access: http://journalarticle.ukm.my/17617/1/26556-83139-1-PB.pdf
http://journalarticle.ukm.my/17617/
https://ejournal.ukm.my/gema/issue/view/1098
id my-ukm.journal.17617
record_format eprints
citation Abdel Karim Abu-Ain, Waleed and Siti Norul Huda Sheikh Abdullah and Khairuddin Omar and Siti Zaharah Abd. Rahman (2018) Automatic multi-lingual script recognition application. GEMA Online Journal of Language Studies, 18 (3). pp. 203-221. ISSN 1675-8021. Penerbit Universiti Kebangsaan Malaysia, 2018-08. Article, peer reviewed.
institution Universiti Kebangsaan Malaysia
building Tun Sri Lanang Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Kebangsaan Malaysia
content_source UKM Journal Article Repository
url_provider http://journalarticle.ukm.my/
language English
description Document Image Analysis and Recognition (DIAR) techniques are used to recognize text components and convert them into an editable format. Scripts are sets of graphical representations used to express a particular writing system, as well as subsets belonging to a particular writing system. A single language may adopt the writing styles of more than one script family; for example, the old Malay language (Jawi) adopts the Arabic script while modern Malay adopts the Roman script. The seven major scripts used in this research, all in handwritten style, are Arabic, Devanagari, Hebrew, Thai, Greek, Cyrillic and Korean. Automatic Multi-lingual Script Recognition (AMSR) is one of the main challenges in the DIAR domain. Currently, only a few attempts have been made at automated script identification of off-line handwritten document images. Most available AMSR applications deal only with printed documents and script types, and they neglect handwritten and multi-lingual documents. The objective of this study is to propose a multi-lingual AMSR framework, and the research methodology is built around that framework. The framework is tested on the Multilingual-HW datasets, which contain more than seven international unconstrained handwritten scripts, using the Grey-Level Co-occurrence Matrix and Local Binary Pattern. The average accuracies of the two methods are about 97.01% and 85.29% respectively. The proposed multilingual AMSR is expected to benefit communities that require automatic sorting of multi-lingual documents. This research can also be extended to document forensics or to international relations agencies to identify unknown native documents.
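The abstract names two texture descriptors, the Grey-Level Co-occurrence Matrix (GLCM) and Local Binary Pattern (LBP), without giving their parameters. As a rough illustration only, and not the authors' implementation, the pure-Python sketch below computes a normalised GLCM for a single horizontal offset together with its contrast statistic, and a histogram of basic 8-neighbour LBP codes, on a tiny hypothetical 4-level patch. The function names, the offset, the neighbour ordering and the patch itself are all assumptions made for this example.

```python
from collections import Counter


def glcm(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Count the pair only if the offset pixel lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def glcm_contrast(p):
    """Contrast: squared grey-level difference weighted by pair probability."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def lbp_histogram(image):
    """Histogram of basic 8-neighbour LBP codes over interior pixels."""
    # Neighbour order (clockwise from top-left) is an arbitrary choice here.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = Counter()
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    return hist


# Hypothetical 4x4 patch with grey levels 0..3 (not data from the study).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

p = glcm(patch, levels=4)
hist = lbp_histogram(patch)
print("GLCM contrast:", glcm_contrast(p))
print("LBP histogram:", dict(hist))
```

In a script-recognition setting, feature vectors of this kind would typically be passed to a classifier; the abstract does not state which classifier the study used.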
format Article
author Abdel Karim Abu-Ain, Waleed
Siti Norul Huda Sheikh Abdullah
Khairuddin Omar
Siti Zaharah Abd. Rahman
title Automatic multi-lingual script recognition application
publisher Penerbit Universiti Kebangsaan Malaysia
publishDate 2018