The profiling of short tandem repeats (STRs) from next-generation sequencing (NGS) data by establishing a whole genome STRs pipeline for forensic bioinformatics

A repeat region of one to six base pairs (bp) in a DNA sequence is known as a short tandem repeat (STR). High-throughput next-generation sequencers have made it easier to identify polymorphic STR markers. In this study, a new whole genome STRs pipeline has...

Bibliographic Details
Main Authors: Nur Nabilah A., Venkataramanan S.
Format: Article
Language: English
Published: Malaysian Society of Applied Biology 2016
Online Access: http://journalarticle.ukm.my/11816/1/45_02_12.pdf
http://journalarticle.ukm.my/11816/
http://www.mabjournal.com/index.php?option=com_content&view=article&id=565&catid=59:current-view&Itemid=56
id my-ukm.journal.11816
record_format eprints
spelling my-ukm.journal.11816 2018-07-02T01:26:58Z Malaysian Society of Applied Biology 2016-12 Article PeerReviewed application/pdf en Nur Nabilah A. and Venkataramanan S. (2016) The profiling of short tandem repeats (STRs) from next-generation sequencing (NGS) data by establishing a whole genome STRs pipeline for forensic bioinformatics. Malaysian Applied Biology, 45 (2). pp. 75-85. ISSN 0126-8643
institution Universiti Kebangsaan Malaysia
building Perpustakaan Tun Sri Lanang Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Kebangsaan Malaysia
content_source UKM Journal Article Repository
url_provider http://journalarticle.ukm.my/
language English
description A repeat region of one to six base pairs (bp) in a DNA sequence is known as a short tandem repeat (STR). High-throughput next-generation sequencers have made it easier to identify polymorphic STR markers. In this study, a new whole genome STRs pipeline was established to call and profile STRs from next-generation sequencing (NGS) data. First, genome sequences of Helicobacter pylori strain CPY1124, and of strain PeCan4 as the reference genome, were retrieved from the European Nucleotide Archive (ENA) database, and the quality of the sequences was checked using FastQC. The genome sequences were then assembled with the VELVET de novo assembler. Unordered contigs from VELVET’s output were realigned using multiple genome alignment (MAUVE) to obtain an ordered contig sequence. Lastly, STRs were called and profiled with Tandem Repeats Finder using the parameters 2 (match), 7 (mismatch) and 7 (indels). These parameters apply to Smith-Waterman style local alignment using wrap-around dynamic programming. As a result, the new pipeline identified the polymorphic and unique STRs GTTTG and AAACCC in CPY1124. For validation, the pipeline was compared with other available STR profiling tools, pSTR Finder and the Tandem Repeats Database (TRDB). The similar output produced by both tools indicates the reliability of this new pipeline for future use.
format Article
author Nur Nabilah A.
Venkataramanan S.
title The profiling of short tandem repeats (STRs) from next-generation sequencing (NGS) data by establishing a whole genome STRs pipeline for forensic bioinformatics
publisher Malaysian Society of Applied Biology
publishDate 2016
_version_ 1643738611281035264
score 13.18916
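
The description in this record defines an STR as a repeat unit of one to six bp. As an illustration of that definition only, the sketch below scans a sequence for perfect tandem repeats using a regular expression. It is not the pipeline described in the record and not the algorithm of Tandem Repeats Finder, which also tolerates mismatches and indels. The function name `find_perfect_strs`, the `Repeat` tuple, the `min_copies` threshold and the test sequence are all hypothetical choices made for this example.

```python
import re
from typing import List, NamedTuple


class Repeat(NamedTuple):
    start: int   # 0-based start position of the repeat tract
    motif: str   # repeat unit, 1 to max_unit bp long
    copies: int  # number of consecutive exact copies


def find_perfect_strs(seq: str, min_copies: int = 3, max_unit: int = 6) -> List[Repeat]:
    """Report exact tandem repeats; min_copies is an assumed threshold."""
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    seq = seq.upper()
    # Group 1 is the candidate unit (shortest unit tried first);
    # the backreference \1 then requires at least min_copies - 1 more copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        hits.append(Repeat(m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


if __name__ == "__main__":
    # Made-up sequence containing three copies of GTTTG.
    for hit in find_perfect_strs("ACGTTTGGTTTGGTTTGAC"):
        print(hit)
```

Because the search prefers the shortest unit, a motif that itself contains homopolymer runs (AAACCC, for instance) is reported as separate runs of A and C under the default threshold. Limitations like this are one reason alignment-based tools are used for real data.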