A Statistical Approach for Qur’an Vowel Restoration

19-09-2015 12:25

This paper presents an automatic system that restores diacritics (vowels) to undiacritized Qur’an words, using a unigram baseline model and a bigram Hidden Markov Model (HMM). The proposed system proved robust and reliable without relying on morphological analysis for diacritic restoration, and the results indicate that HMMs are useful tools for restoring diacritics in Arabic. The technique is simple to apply and does not require any language-specific knowledge to be embedded in the model. The Qur’an was used as the corpus, and the system was implemented and tested using many different portions of the Qur’an as the training set. For instance, when the system was applied to the first 1366 words of the Qur’an, the best performance was 94.3% word accuracy for the unigram model and 95.2% word accuracy for the bigram HMM.
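The abstract does not describe implementation details, so what follows is only a minimal Python sketch of the two models it names, built on assumptions not stated in the paper. Each undiacritized (bare) word is treated as an observation and each diacritized form seen in training as a hidden state. The unigram baseline picks the most frequent diacritized form of each bare word, while the bigram HMM scores candidate sequences with smoothed bigram transition probabilities and decodes them with the Viterbi algorithm. The class `DiacriticRestorer`, the helpers `strip_diacritics` and `word_accuracy`, the add-k smoothing constant `k`, and the policy of leaving unseen words bare are all hypothetical choices for illustration.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word: str) -> str:
    """Drop combining marks. Arabic harakat (fatha, damma, kasra, sukun, ...)
    have a nonzero combining class, so they are removed and letters are kept."""
    return "".join(ch for ch in word if not unicodedata.combining(ch))


def word_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of positions where the restored word matches the reference."""
    assert len(predicted) == len(gold) and gold
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


class DiacriticRestorer:
    """Hypothetical sketch: unigram baseline and bigram HMM (Viterbi).

    Hidden states are diacritized word forms seen in training; observations
    are their bare (undiacritized) forms.
    """

    def __init__(self, training_words: list[str], k: float = 0.1):
        self.k = k  # add-k smoothing constant (an assumption, not from the paper)
        self.unigram = Counter(training_words)
        self.history = Counter(training_words[:-1])  # counts as left context of a bigram
        self.bigram = Counter(zip(training_words, training_words[1:]))
        self.total = len(training_words)
        self.vocab_size = len(self.unigram)
        # bare form -> Counter of the diacritized forms observed for it
        self.candidates: dict[str, Counter] = defaultdict(Counter)
        for w in training_words:
            self.candidates[strip_diacritics(w)][w] += 1

    def _states(self, bare: str) -> list[str]:
        cands = self.candidates.get(bare)
        return list(cands) if cands else [bare]  # unseen word: leave it bare

    def _log_prior(self, word: str) -> float:
        return math.log((self.unigram[word] + self.k)
                        / (self.total + self.k * self.vocab_size))

    def _log_trans(self, prev: str, cur: str) -> float:
        return math.log((self.bigram[(prev, cur)] + self.k)
                        / (self.history[prev] + self.k * self.vocab_size))

    def restore_unigram(self, bare_words: list[str]) -> list[str]:
        """Baseline: most frequent diacritized form of each bare word."""
        return [self.candidates[b].most_common(1)[0][0] if b in self.candidates
                else b for b in bare_words]

    def restore_hmm(self, bare_words: list[str]) -> list[str]:
        """Viterbi decoding. Each candidate emits its bare form with probability 1,
        so only the prior and transition terms affect the path score."""
        if not bare_words:
            return []
        score = {s: self._log_prior(s) for s in self._states(bare_words[0])}
        backpointers = []
        for bare in bare_words[1:]:
            new_score, ptr = {}, {}
            for cur in self._states(bare):
                # best predecessor for this candidate under the bigram model
                best_prev = max(score, key=lambda p: score[p] + self._log_trans(p, cur))
                new_score[cur] = score[best_prev] + self._log_trans(best_prev, cur)
                ptr[cur] = best_prev
            score = new_score
            backpointers.append(ptr)
        # trace back from the highest-scoring final state
        path = [max(score, key=score.get)]
        for ptr in reversed(backpointers):
            path.append(ptr[path[-1]])
        return path[::-1]
```

As a toy check, let `x1 = "a\u0301b"` and `x2 = "a\u0300b"` be two diacritized forms of the bare word `"ab"`, and train on `["y", x2, "z", x1, "z", x1, "z", x1]`. Then `restore_unigram(["y", "ab"])` returns the more frequent `x1`, while `restore_hmm(["y", "ab"])` returns `x2`, because in this corpus `x2` is the form that follows `"y"`. Because stripping diacritics is deterministic, the emission term is uniform over candidates in this sketch. A fuller model might instead estimate emissions or use a different smoothing scheme.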