Staff View: Direct approach for mining association rules from structured XML data

Direct approach for mining association rules from structured XML data

XML has become the standard for data representation on the internet. This growth in popularity has prompted the need for a technique to access XML documents for particular information and to manipulate repositories of documents represented in XML to find specific documents. Having the ability to ex...

Bibliographic Details
Main Author:	Abazeed, Ashraf Riad
Format:	Thesis
Language:	English
Published:	2012
Online Access:	http://psasir.upm.edu.my/id/eprint/27118/1/FSKTM%202012%2021R.pdf http://psasir.upm.edu.my/id/eprint/27118/

id	my.upm.eprints.27118
record_format	eprints
spelling	my.upm.eprints.27118 2017-05-12T04:42:22Z http://psasir.upm.edu.my/id/eprint/27118/ Direct approach for mining association rules from structured XML data Abazeed, Ashraf Riad 2012-01 Thesis NonPeerReviewed application/pdf en http://psasir.upm.edu.my/id/eprint/27118/1/FSKTM%202012%2021R.pdf Abazeed, Ashraf Riad (2012) Direct approach for mining association rules from structured XML data. PhD thesis, Universiti Putra Malaysia.
institution	Universiti Putra Malaysia
building	UPM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Putra Malaysia
content_source	UPM Institutional Repository
url_provider	http://psasir.upm.edu.my/
language	English
description	XML has become the standard for data representation on the internet. This growth in popularity has prompted the need for techniques to search XML documents for particular information and to query repositories of XML documents for specific documents. The ability to extract information from XML data would address the problem of mining web content, a capability that is increasingly useful and in demand. Efforts have been made to develop a new tool or method for extracting information from XML data directly, without any preprocessing or postprocessing of the XML documents. An association rule expresses the probability that one set of items occurs when another set of items occurs; association rule mining searches for such co-occurrence patterns in large databases. “Web mining” refers to applying traditional mining techniques, which work on relational data, to new input represented as XML data, which may be semi-structured or unstructured. There are several techniques for mining association rules from XML data. The basic approach is to map the XML documents to a relational data model and store them in a relational database, which allows the standard tools for rule mining on relational databases to be applied. Although it makes use of existing technology, this approach is often time-consuming and involves manual intervention because of the mapping process. The focus of this study is to reduce the memory consumption of the existing FLEX algorithm by reducing the number of candidates it generates. Another aim is to restructure the FLEX algorithm so as to eliminate the candidate generation step. The thesis also provides two different implementations of the modified FLEX algorithm, one using Java-based parsers and one using XQuery.
The thesis outlines the two implementation techniques for the existing FLEX algorithm, using Java-based parsers and using a query language for XML. The implementation details show the differences between accessing and manipulating XML documents with Java-based parsers and with XML query languages, and the steps required from accessing an XML document to producing a list of association rules. The proposed method, XiFLEX, has been implemented using both techniques (Java-based and XQuery) and compared with the original FLEX algorithm in its basic implementation and with the Apriori algorithm for frequent pattern generation. The experiments were conducted on seven self-generated data sets and on well-known data sets (the Mushroom and Cars data sets). The results show that XiFLEX performs better in terms of the time taken to generate frequent patterns and the number of candidates generated (memory consumption).
format	Thesis
author	Abazeed, Ashraf Riad
spellingShingle	Abazeed, Ashraf Riad Direct approach for mining association rules from structured XML data
author_facet	Abazeed, Ashraf Riad
author_sort	Abazeed, Ashraf Riad
title	Direct approach for mining association rules from structured XML data
title_short	Direct approach for mining association rules from structured XML data
title_full	Direct approach for mining association rules from structured XML data
title_fullStr	Direct approach for mining association rules from structured XML data
title_full_unstemmed	Direct approach for mining association rules from structured XML data
title_sort	direct approach for mining association rules from structured xml data
publishDate	2012
url	http://psasir.upm.edu.my/id/eprint/27118/1/FSKTM%202012%2021R.pdf http://psasir.upm.edu.my/id/eprint/27118/
_version_	1643829091172876288
score	13.214268
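Illustrative note: the abstract describes an association rule as the probability that one set of items occurs when another set occurs, and stresses reading XML directly rather than mapping it to relational tables first. The sketch below shows only those two ideas, computing support and confidence straight from a parsed XML document. It is not the FLEX or XiFLEX algorithm, whose details are not given in this record. The element names (transactions, transaction, item) and the sample data are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element names are assumptions for illustration.
SAMPLE_XML = """
<transactions>
  <transaction><item>bread</item><item>milk</item></transaction>
  <transaction><item>bread</item><item>butter</item></transaction>
  <transaction><item>bread</item><item>milk</item><item>butter</item></transaction>
  <transaction><item>milk</item></transaction>
</transactions>
"""


def load_transactions(xml_text):
    """Read each <transaction> element directly into a set of item names."""
    root = ET.fromstring(xml_text)
    return [
        {item.text.strip() for item in tx.findall("item")}
        for tx in root.findall("transaction")
    ]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for tx in transactions if itemset <= tx)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


transactions = load_transactions(SAMPLE_XML)
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

With the sample data, the rule "bread implies milk" is supported by two of the four transactions, and two of the three transactions containing bread also contain milk. A full miner would additionally enumerate frequent itemsets; that step, where candidate generation arises, is deliberately left out here.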

Similar Items