Staff View: Low latency fast data computation scheme for map reduce based clusters

Low latency fast data computation scheme for map reduce based clusters

Bibliographic Details
Main Author:	Shabbir, Aisha
Format:	Thesis
Language:	English
Published:	2020
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://eprints.utm.my/id/eprint/98237/1/AishaShabbirPSC2020.pdf http://eprints.utm.my/id/eprint/98237/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:143970

id	my.utm.98237
record_format	eprints
spelling	my.utm.98237 2022-11-23T08:06:47Z http://eprints.utm.my/id/eprint/98237/ Low latency fast data computation scheme for map reduce based clusters Shabbir, Aisha QA75 Electronic computers. Computer science MapReduce-based clusters are an emerging big data analytics paradigm for scaling up and speeding up the classification, investigation, and processing of huge, complex data sets. A fundamental issue in processing data on MapReduce clusters is resource heterogeneity, especially when tasks have data interdependencies. A second issue is that MapReduce runs a job in several phases, so for jobs that generate large amounts of intermediate data in the shuffle phase, the intermediate data traffic and its migration time become a major bottleneck. Third, the factors behind straggling must be monitored, because stragglers introduce unnecessary delays and seriously constrain overall system performance. This research therefore proposes a low-latency fast data computation scheme comprising three algorithms: one for computing interdependent tasks on heterogeneous resources, one for reducing intermediate data traffic and its migration time, and one for monitoring and modelling job straggling factors. The Low Latency and Computational Cost based Tasks Scheduling (LLCC-TS) algorithm schedules interdependent tasks on heterogeneous resources by taking task priority into account, providing cost-effective resource utilization and a reduced makespan. The Aggregation and Partition based Accelerated Intermediate Data Migration (AP-AIDM) algorithm reduces intermediate data traffic and data migration time in the shuffle phase by using aggregators and a custom partitioner. The MapReduce Total Execution Time Prediction (MTETP) scheme applies a machine learning technique (linear regression) to the factors that affect job computation time, in order to monitor job straggling and minimize latency. For the makespan of scheduling interdependent tasks, the LLCC-TS algorithm achieves performance improvement rates of 66.13%, 22.23%, 43.53%, and 44.74% over the FIFO, improved max-min, SJF, and MOS algorithms, respectively. The AP-AIDM algorithm achieves performance improvements of 66.62% and 48.4% in data migration time over the basic hash and conventional aggregation algorithms, respectively. The MTETP technique predicts total job execution time with 20.42% better accuracy than the improved HP technique. Together, these three algorithms form a low-latency fast data computation scheme for MapReduce-based clusters. 2020 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/98237/1/AishaShabbirPSC2020.pdf Shabbir, Aisha (2020) Low latency fast data computation scheme for map reduce based clusters. PhD thesis, Universiti Teknologi Malaysia, Faculty of Engineering - School of Computing. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:143970
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
language	English
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers. Computer science Shabbir, Aisha Low latency fast data computation scheme for map reduce based clusters
description	MapReduce-based clusters are an emerging big data analytics paradigm for scaling up and speeding up the classification, investigation, and processing of huge, complex data sets. A fundamental issue in processing data on MapReduce clusters is resource heterogeneity, especially when tasks have data interdependencies. A second issue is that MapReduce runs a job in several phases, so for jobs that generate large amounts of intermediate data in the shuffle phase, the intermediate data traffic and its migration time become a major bottleneck. Third, the factors behind straggling must be monitored, because stragglers introduce unnecessary delays and seriously constrain overall system performance. This research therefore proposes a low-latency fast data computation scheme comprising three algorithms: one for computing interdependent tasks on heterogeneous resources, one for reducing intermediate data traffic and its migration time, and one for monitoring and modelling job straggling factors. The Low Latency and Computational Cost based Tasks Scheduling (LLCC-TS) algorithm schedules interdependent tasks on heterogeneous resources by taking task priority into account, providing cost-effective resource utilization and a reduced makespan. The Aggregation and Partition based Accelerated Intermediate Data Migration (AP-AIDM) algorithm reduces intermediate data traffic and data migration time in the shuffle phase by using aggregators and a custom partitioner. The MapReduce Total Execution Time Prediction (MTETP) scheme applies a machine learning technique (linear regression) to the factors that affect job computation time, in order to monitor job straggling and minimize latency.
For the makespan of scheduling interdependent tasks, the LLCC-TS algorithm achieves performance improvement rates of 66.13%, 22.23%, 43.53%, and 44.74% over the FIFO, improved max-min, SJF, and MOS algorithms, respectively. The AP-AIDM algorithm achieves performance improvements of 66.62% and 48.4% in data migration time over the basic hash and conventional aggregation algorithms, respectively. The MTETP technique predicts total job execution time with 20.42% better accuracy than the improved HP technique. Together, these three algorithms form a low-latency fast data computation scheme for MapReduce-based clusters.
format	Thesis
author	Shabbir, Aisha
author_facet	Shabbir, Aisha
author_sort	Shabbir, Aisha
title	Low latency fast data computation scheme for map reduce based clusters
title_short	Low latency fast data computation scheme for map reduce based clusters
title_full	Low latency fast data computation scheme for map reduce based clusters
title_fullStr	Low latency fast data computation scheme for map reduce based clusters
title_full_unstemmed	Low latency fast data computation scheme for map reduce based clusters
title_sort	low latency fast data computation scheme for map reduce based clusters
publishDate	2020
url	http://eprints.utm.my/id/eprint/98237/1/AishaShabbirPSC2020.pdf http://eprints.utm.my/id/eprint/98237/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:143970
_version_	1751536166904004608
score	13.209306
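This record does not specify how the LLCC-TS algorithm works. As a rough illustration of priority-based scheduling of interdependent tasks on heterogeneous resources, the Python sketch below uses a HEFT-style list scheduler. HEFT is a well-known technique swapped in here for illustration, not the thesis's algorithm. Each task is given an upward-rank priority, and tasks are then placed in priority order on whichever node yields the earliest finish time. The task graph, the node names "fast" and "slow", and all costs are hypothetical, and inter-node communication delays are ignored.

```python
# HEFT-style list scheduling sketch (not the thesis's LLCC-TS algorithm).
# Hypothetical DAG: task -> predecessors that must finish first.
PREDS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

# Hypothetical execution time of each task on each heterogeneous node.
COST = {
    "A": {"fast": 2.0, "slow": 4.0},
    "B": {"fast": 3.0, "slow": 6.0},
    "C": {"fast": 4.0, "slow": 5.0},
    "D": {"fast": 2.0, "slow": 3.0},
}


def upward_rank(preds, cost):
    """Priority of a task: its mean cost plus the longest ranked path to an exit task."""
    succs = {t: [] for t in preds}
    for task, parents in preds.items():
        for parent in parents:
            succs[parent].append(task)
    rank = {}

    def visit(task):
        if task not in rank:
            mean = sum(cost[task].values()) / len(cost[task])
            rank[task] = mean + max((visit(s) for s in succs[task]), default=0.0)
        return rank[task]

    for task in preds:
        visit(task)
    return rank


def schedule(preds, cost):
    """Place tasks in decreasing rank order on the node with the earliest finish time.

    With positive costs a predecessor always outranks its successors, so the
    order is topological. Communication delays are ignored in this sketch.
    """
    rank = upward_rank(preds, cost)
    order = sorted(preds, key=lambda t: rank[t], reverse=True)
    nodes = list(next(iter(cost.values())))
    node_free = {n: 0.0 for n in nodes}
    finish, placement = {}, {}
    for task in order:
        ready = max((finish[p] for p in preds[task]), default=0.0)
        best = min(nodes, key=lambda n: max(node_free[n], ready) + cost[task][n])
        start = max(node_free[best], ready)
        finish[task] = start + cost[task][best]
        node_free[best] = finish[task]
        placement[task] = best
    return placement, finish, max(finish.values())


placement, finish, makespan = schedule(PREDS, COST)
print(placement)  # {'A': 'fast', 'B': 'fast', 'C': 'slow', 'D': 'fast'}
print(makespan)   # 9.0
```

On this toy graph, the scheduler places C on the slower node. C then runs alongside B instead of queuing behind it, which gives a makespan of 9 time units. A cost-aware variant would also weigh the monetary cost of each node when choosing `best`.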
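The record says only that AP-AIDM uses aggregators and a custom partitioner in the shuffle phase. It does not say how they work. The Python sketch below illustrates the two general ideas under stated assumptions. The first is combiner-style local aggregation, which collapses repeated (key, 1) records on each mapper. The second is a hypothetical locality-aware partitioner that routes each key to the node already holding most of its values. All of the following are assumptions: the three input splits, the layout with one mapper and one reducer per node, and the toy byte-sum partitioner. The byte-sum partitioner stands in for hash partitioning; Hadoop's default HashPartitioner assigns keys by hash code modulo the number of reducers.

```python
from collections import Counter

# Hypothetical layout: split i is mapped on node i, and reducer i also runs on node i.
SPLITS = ["a a a b c", "b b b b c", "c c c a b"]


def map_plain(split):
    """Word-count mapper without aggregation: one (word, 1) record per occurrence."""
    return [(word, 1) for word in split.split()]


def map_aggregated(split):
    """Mapper with combiner-style local aggregation: one (word, count) record per word."""
    return list(Counter(split.split()).items())


def toy_hash_partitioner(key, num_reducers):
    """Deterministic stand-in for hash partitioning: byte sum modulo reducer count."""
    return sum(key.encode()) % num_reducers


def locality_partitioner(per_node_records):
    """Hypothetical custom partitioner: route each key to the node holding most of its
    values (lowest node index on ties), so fewer records cross the network."""
    volume = {}
    for node, records in enumerate(per_node_records):
        for key, value in records:
            volume.setdefault(key, Counter())[node] += value
    return {key: max(sorted(c), key=lambda n: c[n]) for key, c in volume.items()}


def remote_records(per_node_records, route):
    """Number of shuffle records sent from a mapper to a reducer on another node."""
    return sum(
        1
        for node, records in enumerate(per_node_records)
        for key, _ in records
        if route(key) != node
    )


n = len(SPLITS)
plain = [map_plain(s) for s in SPLITS]
aggregated = [map_aggregated(s) for s in SPLITS]
by_locality = locality_partitioner(aggregated)

print(sum(map(len, plain)), sum(map(len, aggregated)))  # 15 8
print(remote_records(plain, lambda k: toy_hash_partitioner(k, n)))       # 13
print(remote_records(aggregated, lambda k: toy_hash_partitioner(k, n)))  # 6
print(remote_records(aggregated, by_locality.get))                       # 5
```

The toy run counts records that leave their node. Plain mapping with the byte-sum partitioner sends 13 records off-node. Local aggregation reduces this to 6, and aggregation plus the locality-aware partitioner reduces it to 5. These counts reflect only this hand-built input and are not comparable to the thesis's reported results.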
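According to the record, MTETP predicts total job execution time by applying linear regression to the factors that affect computation time, but the record does not list those factors. The sketch below fits an ordinary least-squares model in plain Python. It then flags a job as a likely straggler when the job's observed time exceeds the prediction by a chosen slack factor. The two features (input size in GB and number of map tasks), the five historical jobs, and the 1.5 slack factor are all hypothetical. They were chosen only so that the fit is easy to check by hand.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept term.

    Solves the normal equations (X^T X) w = X^T y by Gaussian elimination with
    partial pivoting; returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0, *map(float, x)] for x in X]  # prepend a constant column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (A[i][k] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w


def predict(w, features):
    """Predicted total execution time for one job."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))


def is_straggler(w, features, observed, slack=1.5):
    """Flag a job whose observed time exceeds the prediction by the slack factor."""
    return observed > slack * predict(w, features)


# Hypothetical history: (input size in GB, number of map tasks) -> total time in seconds.
HISTORY_X = [(1, 4), (2, 4), (2, 8), (4, 8), (3, 6)]
HISTORY_Y = [23, 28, 36, 46, 37]

w = fit_linear_regression(HISTORY_X, HISTORY_Y)
print([round(v, 6) for v in w])       # [10.0, 5.0, 2.0]
print(round(predict(w, (5, 10)), 6))  # 55.0
print(is_straggler(w, (5, 10), 90))   # True
print(is_straggler(w, (5, 10), 60))   # False
```

The toy history lies exactly on the plane t = 10 + 5 * size + 2 * maps, so the fit recovers those coefficients. For a new job with 5 GB of input and 10 map tasks, the model predicts 55 s. An observed time of 90 s is therefore flagged as a straggler, while 60 s is not. Real job traces would leave nonzero residuals, and the slack factor would need tuning.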
