Performance analysis of bacterial genome assemblers using Illumina next generation sequencing data / Nur ‘Ain Mohd Ishak

The advancement of next generation sequencing (NGS) technology has revolutionized the field of genomic and genetic studies. Compared to conventional methods, NGS generates comprehensive genomic data at a fraction of the cost and with higher accuracy. One of the processing and analyzing...

Bibliographic Details
Main Author: Nur ‘Ain, Mohd Ishak
Format: Thesis
Published: 2020
Subjects: Q Science (General); QH301 Biology
Online Access: http://studentsrepo.um.edu.my/12724/1/Nur_'ain.pdf
http://studentsrepo.um.edu.my/12724/2/Nur_%E2%80%98ain.pdf
http://studentsrepo.um.edu.my/12724/
id my.um.stud.12724
record_format eprints
spelling my.um.stud.12724 2021-12-14T19:07:31Z 2020-10 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/12724/1/Nur_'ain.pdf application/pdf http://studentsrepo.um.edu.my/12724/2/Nur_%E2%80%98ain.pdf Nur ‘Ain, Mohd Ishak (2020) Performance analysis of bacterial genome assemblers using Illumina next generation sequencing data / Nur ‘Ain Mohd Ishak. Masters thesis, Universiti Malaya. http://studentsrepo.um.edu.my/12724/
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic Q Science (General)
QH301 Biology
description The advancement of next generation sequencing (NGS) technology has revolutionized the field of genomic and genetic studies. Compared to conventional methods, NGS generates comprehensive genomic data at a fraction of the cost and with higher accuracy. One of the steps in processing and analyzing NGS data is genome assembly. De novo assembly is the process of assembling short reads into contiguous sections of sequence without a reference, which distinguishes it from the conventional mapping technique. The de Bruijn graph is one of the assembly approaches widely used for the short-read sequences produced by NGS platforms. This study reports the performance of four de novo assemblers (SPAdes, ABySS, Velvet and MaSuRCA), each of which applies a variant of the de Bruijn graph algorithm, using genomic data generated by the Illumina sequencing platform. The computational performance of the assemblers was compared in terms of running time. The assembled contigs and scaffolds were also evaluated for their length and for the contiguity of the assembly using ABySS-fac. Results showed that on single-end data sets, MaSuRCA and SPAdes generally produced the best results among the four assemblers, with the highest percentage of contigs equal to or longer than 500 bp, the highest total base pairs, the highest N50 and the lowest L50 in most cases. For paired-end data sets, Velvet was suitable for assembling all seven bacterial genome sequences. This comparative study will advance the current knowledge of de novo genome assembly, as assembly is the first step toward characterizing and revealing whole genomic information. In addition, this work provides a practical guideline that could aid researchers in identifying the appropriate assembler(s) for their research projects.
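The contiguity measures named in the description can be illustrated with a short sketch. N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled bases, and L50 is the number of contigs in that set. The function name `assembly_stats`, the `min_length` parameter and the example lengths below are hypothetical and chosen only for illustration; this is not the ABySS-fac implementation and does not reproduce the thesis results.

```python
def assembly_stats(contig_lengths, min_length=500):
    """Summarize contiguity for a list of contig lengths (in bp).

    Assumption: contigs shorter than min_length are excluded before
    computing total bases, N50 and L50 (500 bp matches the cutoff
    mentioned in the description).
    """
    # Keep contigs at or above the cutoff, longest first.
    kept = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    total = sum(kept)
    stats = {
        "n_all": len(contig_lengths),
        "n_kept": len(kept),
        "pct_kept": 100.0 * len(kept) / len(contig_lengths) if contig_lengths else 0.0,
        "total_bp": total,
        "N50": 0,
        "L50": 0,
    }
    # Walk down the sorted contigs until half of the total is covered.
    running = 0
    for rank, length in enumerate(kept, start=1):
        running += length
        if 2 * running >= total:
            stats["N50"] = length   # shortest contig in the covering set
            stats["L50"] = rank     # number of contigs in the covering set
            break
    return stats


# Made-up contig lengths, for illustration only.
example = assembly_stats([100, 300, 500, 800, 1000, 1200, 2000])
print(example["N50"], example["L50"])  # prints "1200 2"
```

Under these definitions a higher N50 and a lower L50 both indicate a more contiguous assembly, which is why the description reports them together.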
format Thesis
author Nur ‘ Ain , Mohd Ishak
title Performance analysis of bacterial genome assemblers using Illumina next generation sequencing data / Nur ‘Ain Mohd Ishak
publishDate 2020
url http://studentsrepo.um.edu.my/12724/1/Nur_'ain.pdf
http://studentsrepo.um.edu.my/12724/2/Nur_%E2%80%98ain.pdf
http://studentsrepo.um.edu.my/12724/