Optimizing B-tree search performance of big data sets / Mohsen Marjani

Many applications continuously produce large amounts of varied data every day, exceeding the limits of conventional data storage tools. Such data typically comes in many different formats, which makes it very difficult to query using traditional indexing technologies. Indexin...

Bibliographic Details
Main Author: Mohsen, Marjani
Format: Thesis
Published: 2017
Subjects: QA75 Electronic computers. Computer science
Online Access: http://studentsrepo.um.edu.my/9744/1/Mohsen_Marjani.pdf
http://studentsrepo.um.edu.my/9744/2/Mohsen_Marjani_%E2%80%93_Thesis.pdf
http://studentsrepo.um.edu.my/9744/
id my.um.stud.9744
record_format eprints
spelling my.um.stud.9744 2020-06-21T18:09:38Z 2017-06 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/9744/1/Mohsen_Marjani.pdf application/pdf http://studentsrepo.um.edu.my/9744/2/Mohsen_Marjani_%E2%80%93_Thesis.pdf Mohsen, Marjani (2017) Optimizing B-tree search performance of big data sets / Mohsen Marjani. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/9744/
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic QA75 Electronic computers. Computer science
description Many applications continuously produce large amounts of varied data every day, exceeding the limits of conventional data storage tools. Such data typically comes in many different formats, which makes it very difficult to query using traditional indexing technologies. Indexing is used in data retrieval to improve the efficiency and accuracy of query results. However, current indexing techniques have low efficiency and poor real-time performance when queries involve big data. They also do not support all the characteristics of big data, and they show weaknesses when indexing a variety of data arriving at high velocity and volume. The B-tree is one of the most popular indexing techniques and is used by many database systems, including those that can handle big datasets. Each time a search runs against data indexed with a B-tree, it traverses the left child nodes of a node to find lower values or the right child nodes to find larger values. Repeating a search for later queries with the same or overlapping conditions repeats the same traversal and consumes the same resources, including time and computation power, to retrieve the result. This study proposes an optimized B-tree search method that improves the execution time of search tasks and the performance of the B-tree search process. In the new method, every node has a new element storing a min-max summarization, which lets the search process check whether the value is available inside the node's sub-tree before traversing it to find the location of the value. In addition, during every search task a history value is added to every traversed node to mark the history of the last search operation, for use by the next search operation.
The experimental results show that the proposed search method decreases the execution time of search tasks and performs several times better than B-tree search for the same query and the same dataset. Moreover, the history value improves the performance of later queries by up to 52%. This research contributes to optimizing data retrieval for big data sets and points researchers towards a novel approach to indexing and searching big data in order to improve query processing and search performance.
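The min-max idea in the description can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no data layout, so the Node class, the lo and hi fields, and the summarize and search functions are all hypothetical names chosen for illustration. The sketch assumes each node stores the smallest and largest key of its whole sub-tree and rejects a lookup as soon as the key falls outside that range. The history value is not modeled, because the abstract does not say what it contains.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]                                  # sorted, non-empty
    children: List["Node"] = field(default_factory=list)
    lo: Optional[int] = None  # assumed summary: smallest key in this sub-tree
    hi: Optional[int] = None  # assumed summary: largest key in this sub-tree


def summarize(node: Node) -> None:
    """Fill lo/hi bottom-up so each node describes its whole sub-tree."""
    for child in node.children:
        summarize(child)
    # In a B-tree the minimum lives in the leftmost sub-tree and the
    # maximum in the rightmost one; a leaf uses its own first/last key.
    node.lo = node.children[0].lo if node.children else node.keys[0]
    node.hi = node.children[-1].hi if node.children else node.keys[-1]


def search(node: Node, key: int, visited: List[Node]) -> bool:
    """Look up key, recording every node touched in `visited`."""
    visited.append(node)
    # Min-max check: stop here if the key cannot be in this sub-tree.
    if key < node.lo or key > node.hi:
        return False
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:
        return False
    return search(node.children[i], key, visited)


# Small three-level example tree (illustrative data only).
root = Node(
    keys=[50],
    children=[
        Node(keys=[20], children=[Node([5, 10]), Node([30, 40])]),
        Node(keys=[80], children=[Node([60, 70]), Node([90, 100])]),
    ],
)
summarize(root)
```

In this example tree, a lookup for 55 is routed to the right-hand internal node and rejected there, because that sub-tree's smallest key is 60; the search never descends to the leaf. A lookup for a key above 100 is rejected at the root. Whether this saves time in practice depends on how often queries fall into such gaps and on the cost of keeping the summaries current during inserts and deletes, which the sketch does not address.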
format Thesis
author Mohsen, Marjani
title Optimizing B-tree search performance of big data sets / Mohsen Marjani
publishDate 2017
url http://studentsrepo.um.edu.my/9744/1/Mohsen_Marjani.pdf
http://studentsrepo.um.edu.my/9744/2/Mohsen_Marjani_%E2%80%93_Thesis.pdf
http://studentsrepo.um.edu.my/9744/