A generic parallel processing model for facilitating data mining and integration


Bibliographic Details
Main Authors: Han, L.X., Liew, C.S., van Hemert, J., Atkinson, M.
Format: Article
Published: Elsevier 2011
Online Access:http://eprints.um.edu.my/2071/
Description
Summary: To facilitate data mining and integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be held in memory, buffered via disks, or carried by inter-computer data flows. This makes it possible to build arbitrary DAGs that combine pipelining with both data and task parallelism, providing room for performance enhancement. We have applied this approach to a real DMI case in the life sciences and implemented a prototype. To demonstrate the feasibility of the modelled DMI task and to assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental evaluation shows that, in this case study, a linear speedup is achieved as the number of distributed computing nodes increases.
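
The data-flow model described in the summary can be sketched in a few lines of code. The following Python fragment is a minimal illustration only, not the authors' implementation: the names ProcessingElement, Stream, and SENTINEL are hypothetical, and the in-memory queues stand in for the memory-, disk-, or network-backed streams mentioned above. Each PE runs in its own thread, so downstream stages consume items while upstream stages are still producing them, which is the pipelining the model relies on.

import threading
import queue

SENTINEL = object()  # end-of-stream marker passed down the pipeline

class Stream:
    """A data stream linking two PEs; here a bounded in-memory queue."""
    def __init__(self, maxsize=16):
        self.q = queue.Queue(maxsize)
    def put(self, item):
        self.q.put(item)
    def get(self):
        return self.q.get()

class ProcessingElement(threading.Thread):
    """A PE reads items from its input stream, applies a function,
    and writes the results to its output stream (if any)."""
    def __init__(self, func, inp, out=None):
        super().__init__()
        self.func, self.inp, self.out = func, inp, out
    def run(self):
        while True:
            item = self.inp.get()
            if item is SENTINEL:          # propagate end-of-stream and stop
                if self.out:
                    self.out.put(SENTINEL)
                break
            result = self.func(item)
            if self.out:
                self.out.put(result)

if __name__ == "__main__":
    # Compose a small linear DAG: source -> transform PE -> sink PE
    s1, s2 = Stream(), Stream()
    results = []
    pes = [
        ProcessingElement(lambda x: x * 2, s1, s2),            # transform stage
        ProcessingElement(lambda x: results.append(x), s2),    # sink stage
    ]
    for pe in pes:
        pe.start()
    for x in range(5):        # the source feeds the first stream
        s1.put(x)
    s1.put(SENTINEL)
    for pe in pes:
        pe.join()
    print(results)            # [0, 2, 4, 6, 8], produced in pipelined fashion

Because the streams are bounded queues, a slow PE naturally applies back-pressure to its upstream producer; swapping the in-memory Stream for a disk-buffered or socket-backed variant would, under the same interface, correspond to the buffered and inter-computer data flows described in the summary.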