Approaches for preserving content integrity of sensitive online Arabic content: A survey and research challenges

Bibliographic Details
Main Authors: Hakak, Saqib Iqbal, Kamsin, Amirrudin, Tayan, Omar, Idris, Mohd Yamani Idna, Gilkar, Gulshan Amin
Format: Article
Published: Elsevier 2019
Subjects:
Online Access: http://eprints.um.edu.my/20080/
https://doi.org/10.1016/j.ipm.2017.08.004
Description
Summary: Internet usage and access to online content in different languages and formats are growing rapidly. A vast amount of digital online content, available in different formats, is sensitive with respect to writing style and the arrangement of diacritics. However, research aimed at identifying techniques suitable for preserving the integrity of such sensitive content is limited, so it remains a challenge to determine which techniques best suit different formats such as image or binary. Preserving and verifying sensitive content is therefore an emerging problem that calls for timely solutions. The digital Holy Qur'an in Arabic is one case of such sensitive content. Because of characteristics of Arabic script such as diacritics (punctuation symbols), kashidas (extended letters) and other symbols, the original meaning of a text can easily be altered simply by changing the arrangement of diacritics. This article surveys the approaches currently used to preserve and verify the content integrity of sensitive online content. We present the state of the art in content integrity verification and address the existing challenges in preserving the integrity of sensitive texts, using the digital Qur'an as a case study. The proposed taxonomy provides an effective classification and analysis of existing related schemes and their limitations. The paper also discusses recommendations on the expected efficiency of such approaches when applied to digital content integrity. The main findings suggest that unified approaches combining watermarking and string matching can be used to preserve the content integrity of any sensitive digital content.
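The sensitivity to diacritics described in the summary can be illustrated with a minimal Python sketch. It is not a method taken from the surveyed article: the function names `fingerprint` and `strip_diacritics` and the two sample words are assumptions chosen for illustration, and a plain SHA-256 digest stands in for whatever verification scheme a real system would use. The sketch shows that two words sharing the same base letters but carrying different diacritics produce different digests, while a comparison that ignores diacritics and kashidas would treat them as identical.

```python
import hashlib
import unicodedata


def fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of the NFC-normalized text.

    Diacritics are kept, so any change to them changes the digest.
    """
    normalized = unicodedata.normalize("NFC", text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def strip_diacritics(text: str) -> str:
    """Drop combining marks (category Mn) and the kashida (U+0640)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch
        for ch in decomposed
        if unicodedata.category(ch) != "Mn" and ch != "\u0640"
    )


# Hypothetical sample words with the same base letters (ain, lam, mim).
# ain + fatha, lam + kasra, mim + fatha: "he knew"
reference = "\u0639\u064E\u0644\u0650\u0645\u064E"
# ain + fatha, lam + shadda + fatha, mim + fatha: "he taught"
tampered = "\u0639\u064E\u0644\u0651\u064E\u0645\u064E"

# A diacritic-insensitive comparison cannot tell the two words apart.
same_base_letters = strip_diacritics(reference) == strip_diacritics(tampered)

# An exact fingerprint of the fully vocalized text does detect the change.
digests_differ = fingerprint(reference) != fingerprint(tampered)

print("same base letters:", same_base_letters)
print("digests differ:", digests_differ)
```

An exact digest only reports that something changed. Locating the altered diacritic, or verifying content in image formats, would need techniques such as the string matching and watermarking approaches the summary mentions.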