Big data: Performance profiling of Meteorological and Oceanographic data on Hive
The emergence and development of big data tools, techniques and systems motivate industries and organizations to embrace and explore research in big data. This is to circumvent the challenges of the traditional database systems. However, the available benchmarks and workloads are for some specific a...
Main Authors: | Abdullahi, A.U., Ahmad, R., Zakaria, N.M. |
---|---|
Format: | Conference or Workshop Item |
Published: |
Institute of Electrical and Electronics Engineers Inc.
2016
|
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85010433068&doi=10.1109%2fICCOINS.2016.7783215&partnerID=40&md5=c44afcfa573af47fe130803c36767564 http://eprints.utp.edu.my/30482/ |
id |
my.utp.eprints.30482 |
---|---|
record_format |
eprints |
spelling |
my.utp.eprints.30482 2022-03-25T06:55:41Z Big data: Performance profiling of Meteorological and Oceanographic data on Hive Abdullahi, A.U. Ahmad, R. Zakaria, N.M.
Institute of Electrical and Electronics Engineers Inc. 2016 Conference or Workshop Item NonPeerReviewed https://www.scopus.com/inward/record.uri?eid=2-s2.0-85010433068&doi=10.1109%2fICCOINS.2016.7783215&partnerID=40&md5=c44afcfa573af47fe130803c36767564 Abdullahi, A.U. and Ahmad, R. and Zakaria, N.M. (2016) Big data: Performance profiling of Meteorological and Oceanographic data on Hive. In: UNSPECIFIED. http://eprints.utp.edu.my/30482/ |
institution |
Universiti Teknologi Petronas |
building |
UTP Resource Centre |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Petronas |
content_source |
UTP Institutional Repository |
url_provider |
http://eprints.utp.edu.my/ |
description |
The emergence and development of big data tools, techniques and systems motivate industries and organizations to embrace and explore research in big data, in order to circumvent the challenges of traditional database systems. However, the available benchmarks and workloads target specific aspects of the Information Technology industry, whose data differ in nature and complexity from data obtained from other sources. Hence there is a need to use data from other domains to evaluate the performance and maturity of big data technologies. In this paper, performance profiling of Meteorological and Oceanographic data on Hive is conducted. Hive, being the commonly used data warehouse analytical platform for big data, is chosen with a view to exposing the intricacies involved in formatting and loading the data. The response times for indexed and non-indexed retrievals are measured using three sets of queries frequently used in the area: Type 1, SELECT with a WHERE clause; Type 2, SELECT with a JOIN clause; and Type 3, SELECT with a GROUP BY clause. The experimental results show that good response times are achieved for both indexed and non-indexed tables. Indexed retrieval shows a significant decrease in response time for the Type 1 query at all data sizes and for the Type 3 query at data sizes of 100GB and less. It also shows additional overhead for the Type 2 query at all data sizes and for the Type 3 query at data sizes of 500GB and more. When the Meteorological and Oceanographic data are properly formatted, their analytics with Hive prove efficient compared to traditional database systems. The results of this study have the potential to attract oil and gas companies to adopt big data technologies for handling their exploration datasets. © 2016 IEEE. |
format |
Conference or Workshop Item |
author |
Abdullahi, A.U. Ahmad, R. Zakaria, N.M. |
spellingShingle |
Abdullahi, A.U. Ahmad, R. Zakaria, N.M. Big data: Performance profiling of Meteorological and Oceanographic data on Hive |
author_facet |
Abdullahi, A.U. Ahmad, R. Zakaria, N.M. |
author_sort |
Abdullahi, A.U. |
title |
Big data: Performance profiling of Meteorological and Oceanographic data on Hive |
title_short |
Big data: Performance profiling of Meteorological and Oceanographic data on Hive |
title_full |
Big data: Performance profiling of Meteorological and Oceanographic data on Hive |
title_fullStr |
Big data: Performance profiling of Meteorological and Oceanographic data on Hive |
title_full_unstemmed |
Big data: Performance profiling of Meteorological and Oceanographic data on Hive |
title_sort |
big data: performance profiling of meteorological and oceanographic data on hive |
publisher |
Institute of Electrical and Electronics Engineers Inc. |
publishDate |
2016 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85010433068&doi=10.1109%2fICCOINS.2016.7783215&partnerID=40&md5=c44afcfa573af47fe130803c36767564 http://eprints.utp.edu.my/30482/ |
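The abstract in this record describes timing three query types (SELECT with WHERE, JOIN, and GROUP BY) against indexed and non-indexed tables. The sketch below illustrates that measurement pattern only. It is not the authors' setup: it uses Python's built-in SQLite as a small stand-in for Hive, and the table names (`observations`, `stations`), column names, and synthetic rows are all hypothetical. Timings on such a tiny in-memory table say nothing about Hive at 100GB or 500GB.

```python
import sqlite3
import time

# Hypothetical stand-ins for the three query types named in the abstract.
QUERIES = {
    "type1_where": "SELECT * FROM observations WHERE station_id = 3",
    "type2_join": (
        "SELECT o.obs_day, s.region FROM observations o "
        "JOIN stations s ON o.station_id = s.station_id"
    ),
    "type3_group_by": (
        "SELECT station_id, AVG(wave_height) FROM observations "
        "GROUP BY station_id"
    ),
}


def build_db(n_rows=2000):
    """Create an in-memory database with synthetic (made-up) observations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations "
        "(station_id INTEGER, obs_day INTEGER, wave_height REAL)"
    )
    conn.execute("CREATE TABLE stations (station_id INTEGER, region TEXT)")
    conn.executemany(
        "INSERT INTO stations VALUES (?, ?)",
        [(s, f"region_{s % 3}") for s in range(10)],
    )
    conn.executemany(
        "INSERT INTO observations VALUES (?, ?, ?)",
        [(i % 10, i // 10, (i % 7) * 0.5) for i in range(n_rows)],
    )
    conn.commit()
    return conn


def time_queries(conn):
    """Run each query once; return {name: (elapsed_seconds, row_count)}."""
    timings = {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        timings[name] = (time.perf_counter() - start, len(rows))
    return timings


def profile(n_rows=2000):
    """Time the queries without an index, then again with one."""
    conn = build_db(n_rows)
    non_indexed = time_queries(conn)
    conn.execute("CREATE INDEX idx_station ON observations (station_id)")
    indexed = time_queries(conn)
    conn.close()
    return non_indexed, indexed


if __name__ == "__main__":
    non_indexed, indexed = profile()
    for name in QUERIES:
        print(name, "non-indexed:", non_indexed[name][0],
              "indexed:", indexed[name][0])
```

A real profiling run would repeat each query several times and at several data sizes, since a single measurement is noisy.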