An improved Levenshtein algorithm for spelling correction word candidate list generation

Candidates’ list generation in spelling correction is a process of finding words from a lexicon that should be close to the incorrect word. The most widely used algorithm for generating candidates’ list for incorrect words is based on Levenshtein distance. However, this algorithm takes too much time...

Full description

Saved in:
Bibliographic Details
Main Author: Abdulkhudhur, Hanan Najm
Format: Thesis
Language:English
English
Published: 2016
Subjects:
Online Access:http://etd.uum.edu.my/6564/
Tags: Add Tag
No Tags, Be the first to tag this record!
id my.uum.etd.6564
record_format eprints
spelling my.uum.etd.65642021-04-05T01:34:49Z http://etd.uum.edu.my/6564/ An improved Levenshtein algorithm for spelling correction word candidate list generation Abdulkhudhur, Hanan Najm QA273-280 Probabilities. Mathematical statistics Candidates’ list generation in spelling correction is a process of finding words from a lexicon that should be close to the incorrect word. The most widely used algorithm for generating candidates’ list for incorrect words is based on Levenshtein distance. However, this algorithm takes too much time when there is a large number of spelling errors. The reason is that calculating Levenshtein algorithm includes operations that create an array and fill the cells of this array by comparing the characters of an incorrect word with the characters of a word from a lexicon. Since most lexicons contain millions of words, then these operations will be repeated millions of times for each incorrect word to generate its candidates list. This dissertation improved Levenshtein algorithm by designing an operational technique that has been included in this algorithm. The proposed operational technique enhances Levenshtein algorithm in terms of the processing time of its executing without affecting its accuracy. It reduces the operations required to measure cells’ values in the first row, first column, second row, second column, third row, and third column in Levenshtein array. The improved Levenshtein algorithm was evaluated against the original algorithm. Experimental results show that the proposed algorithm outperforms Levenshtein algorithm in terms of the processing time by 36.45% while the accuracy of both algorithms is still the same. 2016 Thesis NonPeerReviewed text en /6564/1/s814922_01.pdf text en /6564/2/s814922_02.pdf Abdulkhudhur, Hanan Najm (2016) An improved Levenshtein algorithm for spelling correction word candidate list generation. Masters thesis, Universiti Utara Malaysia.
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Electronic Theses
url_provider http://etd.uum.edu.my/
language English
English
topic QA273-280 Probabilities. Mathematical statistics
spellingShingle QA273-280 Probabilities. Mathematical statistics
Abdulkhudhur, Hanan Najm
An improved Levenshtein algorithm for spelling correction word candidate list generation
description Candidates’ list generation in spelling correction is a process of finding words from a lexicon that should be close to the incorrect word. The most widely used algorithm for generating candidates’ list for incorrect words is based on Levenshtein distance. However, this algorithm takes too much time when there is a large number of spelling errors. The reason is that calculating Levenshtein algorithm includes operations that create an array and fill the cells of this array by comparing the characters of an incorrect word with the characters of a word from a lexicon. Since most lexicons contain millions of words, then these operations will be repeated millions of times for each incorrect word to generate its candidates list. This dissertation improved Levenshtein algorithm by designing an operational technique that has been included in this algorithm. The proposed operational technique enhances Levenshtein algorithm in terms of the processing time of its executing without affecting its accuracy. It reduces the operations required to measure cells’ values in the first row, first column, second row, second column, third row, and third column in Levenshtein array. The improved Levenshtein algorithm was evaluated against the original algorithm. Experimental results show that the proposed algorithm outperforms Levenshtein algorithm in terms of the processing time by 36.45% while the accuracy of both algorithms is still the same.
format Thesis
author Abdulkhudhur, Hanan Najm
author_facet Abdulkhudhur, Hanan Najm
author_sort Abdulkhudhur, Hanan Najm
title An improved Levenshtein algorithm for spelling correction word candidate list generation
title_short An improved Levenshtein algorithm for spelling correction word candidate list generation
title_full An improved Levenshtein algorithm for spelling correction word candidate list generation
title_fullStr An improved Levenshtein algorithm for spelling correction word candidate list generation
title_full_unstemmed An improved Levenshtein algorithm for spelling correction word candidate list generation
title_sort improved levenshtein algorithm for spelling correction word candidate list generation
publishDate 2016
url http://etd.uum.edu.my/6564/
_version_ 1696978324263272448
score 13.18916