A generic parallel processing model for facilitating data mining and integration

Bibliographic Details
Main Authors: Han, L.X., Liew, C.S., van Hemert, J., Atkinson, M.
Format: Article
Published: Elsevier 2011
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.um.edu.my/2071/
id my.um.eprints.2071
record_format eprints
citation Han, L.X. and Liew, C.S. and van Hemert, J. and Atkinson, M. (2011) A generic parallel processing model for facilitating data mining and integration. Parallel Computing, 37 (3). pp. 157-171. ISSN 0167-8191. Elsevier, 2011-03. Article, peer reviewed. Record last modified 2014-12-26T02:22:49Z.
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Research Repository
url_provider http://eprints.um.edu.my/
topic QA75 Electronic computers. Computer science
description To facilitate data mining and integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be in memory, buffered via disks or inter-computer data-flows. This makes it possible to build arbitrary DAGs with pipelining and both data and task parallelisms, which provide room for performance enhancement. We have applied this approach to a real DMI case in the life sciences and implemented a prototype. To demonstrate feasibility of the modelled DMI task and assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental evaluation results show that a linear speedup has been achieved with the increase of the number of distributed computing nodes in this case study.
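The composition idea in the description (PEs linked by data streams and running concurrently) can be illustrated with a small sketch. This is not the authors' prototype: it assumes in-memory streams only, modelled here as bounded `queue.Queue` objects, and a simple chain of PEs, which is the most basic DAG. The names `run_pipeline`, `source`, `transform`, `collect` and `_END` are hypothetical and chosen for illustration.

```python
import queue
import threading

_END = object()  # hypothetical sentinel marking the end of a stream


def source(items, out):
    """Source PE: push every input item onto its output stream."""
    for item in items:
        out.put(item)
    out.put(_END)


def transform(fn, inp, out):
    """Transform PE: apply fn to each item read from inp, write to out."""
    while True:
        item = inp.get()
        if item is _END:
            out.put(_END)  # propagate end-of-stream downstream
            break
        out.put(fn(item))


def collect(inp):
    """Sink PE: drain a stream into a list."""
    results = []
    while True:
        item = inp.get()
        if item is _END:
            return results
        results.append(item)


def run_pipeline(items, stages):
    """Link one PE per stage with bounded in-memory streams and run them."""
    # One stream before the first stage and one after each stage.
    streams = [queue.Queue(maxsize=8) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=source, args=(items, streams[0]))]
    for i, fn in enumerate(stages):
        threads.append(
            threading.Thread(target=transform, args=(fn, streams[i], streams[i + 1]))
        )
    for t in threads:
        t.start()
    results = collect(streams[-1])  # the sink runs on the calling thread
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))
```

Each PE starts work as soon as its upstream neighbour emits an item, which is the pipelining the description refers to. Threads in standard CPython do not run CPU-bound code in parallel, so the sketch shows only the composition mechanism; disk-buffered or inter-computer streams, as mentioned in the description, would need a different transport.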
format Article
author Han, L.X.
Liew, C.S.
van Hemert, J.
Atkinson, M.
publisher Elsevier
publishDate 2011
url http://eprints.um.edu.my/2071/