Reducing the search space and time complexity of needleman-wunsch algorithm (global alignment) and smith waterman algorithm (local alignment) for DNA sequence alignment
The fundamental procedure of analyzing sequence content is sequence comparison. Sequence comparison can be defined as the problem of finding which parts of the sequences are similar and which parts are different, namely comparing two sequences to identify similarities and differences between them....
Saved in:
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: |
Universiti Malaysia Perlis (UniMAP)
2014
|
Subjects: | |
Online Access: | http://dspace.unimap.edu.my:80/dspace/handle/123456789/31259 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The fundamental procedure of analyzing sequence content is sequence comparison.
Sequence comparison can be defined as the problem of finding which parts of the
sequences are similar and which parts are different, namely comparing two sequences to identify similarities and differences between them. A typical approach to solve this problem is to find a good and reasonable alignment between the two sequences. The main research in this project is to align the DNA sequences by using the Needleman-Wunsch algorithm for global alignment and Smith-Waterman algorithm for local alignment based on the Dynamic Programming algorithm. The Dynamic Programming Algorithm is guaranteed to find optimal alignment by exploring all possible alignments and choosing the best through the scoring and traceback techniques. The algorithms proposed and evaluated are to reduce
the gaps in aligning sequences as well as the length of the sequences aligned without
compromising the quality or correctness of results. In this project, a study on how to apply
the computational power of Parallel Computing to speed up the lengthy process of
comparing sequences without having to compromise on the optimal results. Parallelization
is only applied to the Needleman-Wunsch algorithm. In order to verify the accuracy and
consistency of measurements obtained in Needleman-Wunsch and Smith-Waterman
algorithms the data is compared with Emboss (global) and Emboss (local) with 600 strands
test data. Results show that the Needle and Smith programs are reduced gaps and mismatch,
but do not affect the accuracy. Results on the parallelization of Needleman-Wunsch
algorithm shows that the parallelization is only efficient for 3000 length of DNA sequences
and above, but does not show any improvement for less than 500 lengths of DNA
sequences although using multiple core platforms. |
---|