Frequencies determination of characters for Bahasa Melayu: results of preliminary investigation

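Bahasa Melayu (the Malay language) is spoken in Malaysia and in many neighbouring countries. It has a rich literature and deep cultural roots. Bahasa Melayu uses the Roman character set (i.e. A-Z), the same as English. The written language uses these characters as building blocks for words, phrases and sentences, which, together with punctuation marks and other signs, make up documents. This paper presents the results of a preliminary investigation of Malay text documents. For this purpose, articles on various topics written in Malay were scanned, covering approximately 31 thousand characters. Preliminary observations indicate that, on average, the character “A” accounts for 19% of the text, “N” for 10%, “E” for 9% and “I” for 8%. These are the characters with the highest frequencies of occurrence in the whole set, and they are expected to remain the most frequent characters as the investigation continues. The results further indicate that character frequencies in Bahasa Melayu text are very close to those of Bahasa Indonesia but differ from those of English. The investigation also indicates that Bahasa Melayu and Bahasa Indonesia share a close phonetic structure that English does not, even though all three languages use the same character set.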
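
The abstract reports per-letter percentages (for example about 19% for “A”) computed over roughly 31 thousand scanned characters, but it does not describe the counting procedure. The following is a minimal Python sketch of how such percentages can be computed, assuming that only the letters A-Z are counted case-insensitively and that spaces, digits and punctuation are excluded. The helper names `letter_frequencies`, `top_letters` and `total_variation` are hypothetical, and the total variation distance used to compare two letter profiles (e.g. Malay against Indonesian or English) is a swapped-in illustrative measure, not necessarily the comparison the authors used.

```python
from __future__ import annotations

from collections import Counter

# Assumption: only the 26 Roman letters A-Z are counted, case-insensitively.
ALPHABET = [chr(code) for code in range(ord("A"), ord("Z") + 1)]


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of all A-Z letters in `text`, in percent.

    Digits, spaces, punctuation and other signs are ignored (an assumption;
    the abstract does not state the exact counting rules).
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {ch: 100.0 * n / total for ch, n in counts.items()}


def top_letters(freqs: dict[str, float], k: int = 4) -> list[tuple[str, float]]:
    """Return the k most frequent letters, highest share first (ties alphabetical)."""
    return sorted(freqs.items(), key=lambda item: (-item[1], item[0]))[:k]


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Compare two letter profiles given in percent.

    Returns half the sum of absolute differences over A-Z: 0 for identical
    profiles, 100 for profiles with no letters in common. This is a
    hypothetical closeness measure for illustration only.
    """
    return 0.5 * sum(abs(p.get(ch, 0.0) - q.get(ch, 0.0)) for ch in ALPHABET)
```

For instance, `letter_frequencies("Saya suka makan nasi lemak.")` counts 22 letters, 7 of which are “A”, so “A” receives about 31.8%. A single sentence is far too small to reproduce the paper's figures, which came from a much larger sample; ties in `top_letters` are broken alphabetically so the ranking is deterministic.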

Bibliographic Details
Main Authors: Shah, Asadullah, Saidin, Aznan Zuhid, Taha Alshaikhli, Imad Fakhri, Zeki, Akram M.
Format: Conference or Workshop Item
Language: English
Published: 2011
Subjects: PL Languages and literatures of Eastern Asia, Africa, Oceania; PL5101 Malay
Online Access: http://irep.iium.edu.my/2933/1/Poster-asadullah_aznan.ppt
http://irep.iium.edu.my/2933/
http://kict.iium.edu.my/pacling/index.html
Citation: Shah, Asadullah and Saidin, Aznan Zuhid and Taha Alshaikhli, Imad Fakhri and Zeki, Akram M. (2011) Frequencies determination of characters for Bahasa Melayu: results of preliminary investigation. In: 12th Conference of the Pacific Association for Computational Linguistics (PACLING 2011), 19-21 July 2011, IIUM. (Unpublished)
Status: Peer reviewed
Institution: International Islamic University Malaysia (Universiti Islam Antarabangsa Malaysia), IIUM Library
Repository: IIUM Repository (IREP), Institutional Repository collection, http://irep.iium.edu.my/