Development of computational tools for African oil palm genome and gene expression analyses / Joel Low Zi-Bin

Bibliographic Details
Main Author: Joel Low, Zi-Bin
Format: Thesis
Published: 2019
Online Access: http://studentsrepo.um.edu.my/12233/2/Joel_Low.pdf
http://studentsrepo.um.edu.my/12233/1/Joel_Low.pdf
http://studentsrepo.um.edu.my/12233/
Description
Summary: Continuing improvements in the yield of oil palm require knowledge of the genes and mechanisms that regulate oil accumulation. This knowledge is believed to be attainable through sequencing of the oil palm genome. However, Sime Darby’s oil palm genome assembly is far from complete. Here I look at ways to improve the quality of the genome assembly using computational tools developed to build exome contigs (GenSeed Pipeline Suite), to detect potential regions of misassembly due to repeats (BridgeReader), and to arrange scaffolds into a physical map using molecular markers (MarkMyMap). I show that, with the use of these tools, the most recent assembly version, OPg3, improved over the first version in capturing the gene space (39% more mappable transcripts) and in molecular marker representation (3% increase in mappable SSRs and DArTs). Furthermore, the constructed physical map representing the oil palm’s 16 chromosomes improved genome coverage over Sime Darby’s previous version by 79%. The improvements to the genome draft in this work will assist future genome-wide association studies and functional studies of genes. In addition, I have developed a fast Bayesian method to overcome analytical bottlenecks in RNA-Seq experiments with a limited number of replicates and low sequencing coverage, such as those found in oil palm studies. I incorporated a previously unused sequencing coverage parameter, determined from the concentration of an RNA sample, into a procedure for making differentially expressed gene calls. This method performed better than or comparably to NOISeq and GFOLD, according to results from simulations and experiments with real unreplicated data. The method is called CORNAS (Coverage-dependent RNA-Seq), and I show that robust differentially expressed gene calls can be made with it in an RNA-Seq study of oil palm inflorescences.