Implementation of improved Levenshtein algorithm for spelling correction word candidate list generation
Format: | Article |
Language: | English |
Published: | 2016 |
Online Access: | http://repo.uum.edu.my/20609/1/JTAIT%2088%203%202016%20449%20455.pdf http://repo.uum.edu.my/20609/ http://www.jatit.org/volumes/Vol88No3/10Vol88No3.pdf |
Summary: | Candidate list generation in spelling correction is the process of finding words in a lexicon that are close to an incorrect word. The most widely used algorithm for generating the candidate list is the Levenshtein algorithm. However, the algorithm has a high computational cost, especially when there are many spelling errors. This is because computing the Levenshtein distance involves creating an array and filling its cells by comparing the characters of the incorrect word with the characters of a lexicon word. Since most lexicons contain millions of words, these operations are repeated millions of times for each incorrect word to generate its candidate list. This study proposes an improved Levenshtein algorithm that reduces the number of operation steps needed to compare characters between the query word and the lexicon words. Experimental results show that the proposed algorithm outperformed the Levenshtein algorithm in processing time, reducing it by 32.43%. |
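The summary describes the baseline cost: one array is filled cell by cell for every (incorrect word, lexicon word) pair. The sketch below illustrates that baseline with the standard dynamic-programming Levenshtein distance, followed by a brute-force scan of a lexicon to build a candidate list. The abstract does not describe how the proposed algorithm reduces the operation steps, so this is not the authors' method. Instead, `bounded_levenshtein` shows one generic way to cut work (an assumption made for illustration): skip pairs whose length difference already exceeds a distance threshold, and stop filling the array once every cell in a row exceeds it. The function names and the `max_dist` parameter are hypothetical.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Standard DP Levenshtein distance; keeps two rows of the array in memory."""
    prev = list(range(len(b) + 1))  # row 0: distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)  # column 0: distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def bounded_levenshtein(a: str, b: str, max_dist: int) -> int:
    """Hypothetical thresholded variant (a generic cutoff, not the paper's method).

    Returns the exact distance if it is <= max_dist, otherwise max_dist + 1.
    """
    if abs(len(a) - len(b)) > max_dist:
        return max_dist + 1  # the length difference alone exceeds the bound
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        # Values never decrease along an alignment path, and every path crosses
        # this row, so if the whole row exceeds the bound, so will the result.
        if min(curr) > max_dist:
            return max_dist + 1
        prev = curr
    return min(prev[-1], max_dist + 1)


def candidates(word: str, lexicon: Iterable[str],
               max_dist: int = 2) -> List[Tuple[str, int]]:
    """Scan the whole lexicon and keep words within max_dist edits of `word`."""
    scored = []
    for entry in lexicon:
        d = bounded_levenshtein(word, entry, max_dist)
        if d <= max_dist:
            scored.append((entry, d))
    scored.sort(key=lambda pair: (pair[1], pair[0]))  # closest first, then A-Z
    return scored


if __name__ == "__main__":
    lexicon = ["spelling", "spewing", "selling", "smelling",
               "spell", "spilling", "dwelling"]
    print(candidates("speling", lexicon, max_dist=1))
    # → [('spelling', 1), ('spewing', 1)]
```

The linear scan over the lexicon is what makes the per-pair cost matter: any saving inside the distance computation is multiplied by the lexicon size for every misspelled word.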