Similarities and dissimilarities between character frequencies of written text of Melayu, English and Indonesian languages

Bibliographic Details
Main Authors: Shah, Asadullah, Saidin, Aznan Zuhid, Alshaikhli, Imad Fakhri Taha, Zeki, Akram M., Bhatti, Zeeshan
Format: Conference or Workshop Item
Language: English
Published: 2013
Subjects:
Online Access: http://irep.iium.edu.my/37045/1/similarities-dissimilarities-2014.pdf
http://irep.iium.edu.my/37045/
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6836574&tag=1
Description
Summary: This research paper presents some statistical similarities and dissimilarities between the character frequencies of three languages: Malay, Indonesian and English. The reason for comparing them is that they all share a common symbol set (A-Z). Investigation shows that the character frequencies of Malay and Indonesian are statistically very close to each other. For example, the characters "A", "N" and "E" have frequencies in Malay and Indonesian of (19%, 20.4%), (10%, 9.33%) and (9%, 8.28%), respectively. English is different: there, "E", "T" and "A" are the three most frequent letters, in that order. An interesting observation is that, in spite of these similarities and dissimilarities between individual characters, the frequency envelopes of all three languages follow the same rising and falling trend across the characters. Moreover, in all three languages the last four characters, "W", "X", "Y" and "Z", also exhibit the lowest usage.
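The kind of comparison described in the summary can be illustrated with a minimal sketch. It assumes that a character frequency is a letter's share of all A-Z letters in a text, counted case-insensitively; the record does not state the authors' exact counting procedure or corpus. The function names `letter_frequencies` and `top_letters` and the sample sentence are hypothetical and are not taken from the paper.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return the percentage share of each letter A-Z in `text`.

    Assumption: case is ignored and every character outside A-Z
    (digits, spaces, punctuation) is skipped.
    """
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    total = len(letters)
    counts = Counter(letters)
    if total == 0:
        return {ch: 0.0 for ch in string.ascii_uppercase}
    return {ch: 100.0 * counts[ch] / total for ch in string.ascii_uppercase}


def top_letters(freqs, n=3):
    """Return the n most frequent letters; ties are broken alphabetically."""
    return sorted(freqs, key=lambda ch: (-freqs[ch], ch))[:n]


if __name__ == "__main__":
    # Hypothetical one-line sample, far too short for real statistics.
    sample = "Saya makan nasi goreng dan minum teh"
    freqs = letter_frequencies(sample)
    for ch in top_letters(freqs):
        print(f"{ch}: {freqs[ch]:.2f}%")
```

Running the same function over a large corpus for each language and comparing the resulting tables letter by letter would give figures of the kind quoted in the summary, though only a corpus of realistic size produces stable percentages.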