Evaluation and optimization of frequent association rule based classification

Bibliographic Details
Main Authors: Izwan Nizal Mohd Shaharanee, Jastini Jamil
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2014
Online Access: http://journalarticle.ukm.my/6804/1/4801-11319-1-PB.pdf
http://journalarticle.ukm.my/6804/
http://ejournal.ukm.my/apjitm
Description
Summary: Deriving useful and interesting rules from a data mining system is an essential task. Common problems include the discovery of random and coincidental patterns, or patterns of no significant value, and the generation of a large volume of rules from a database. Work on sustaining the interestingness of rules generated by data mining algorithms is actively being pursued. This paper presents a systematic way to evaluate the association rules discovered by frequent itemset mining algorithms, combining common data mining and statistical interestingness measures, and outlines an appropriate sequence for their use. The experiments are performed on a number of real-world datasets that represent diverse characteristics of data/items, and a detailed evaluation of the rule sets is provided. Empirical results show that, with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant and contradictory rules while preserving rules of relatively high accuracy and coverage when used for classification. Moreover, the results reveal important characteristics of mining frequent itemsets and the impact of the confidence measure on the classification task.
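
The idea of combining data mining measures with a statistical test can be illustrated with a minimal Python sketch. It is not the authors' framework: the choice of support, confidence and a 2x2 chi-square test, the thresholds, the function names and the toy records are all assumptions made for illustration. The value 3.841 is the standard chi-square critical value for one degree of freedom at the 0.05 level.

```python
from typing import FrozenSet, List, Tuple

Record = Tuple[FrozenSet[str], str]  # (items in the record, class label)

# Hypothetical toy data, invented for this sketch only.
TOY_RECORDS: List[Record] = [
    (frozenset({"a", "b"}), "yes"), (frozenset({"a", "b"}), "yes"),
    (frozenset({"a"}), "yes"), (frozenset({"a"}), "yes"),
    (frozenset({"c"}), "yes"),
    (frozenset({"b"}), "no"), (frozenset({"b"}), "no"),
    (frozenset({"c"}), "no"), (frozenset({"c"}), "no"),
    (frozenset({"c"}), "no"),
]


def rule_measures(records: List[Record], antecedent: FrozenSet[str],
                  label: str) -> Tuple[float, float, float]:
    """Return (support, confidence, chi-square) for the rule antecedent -> label."""
    n = len(records)
    # Cells of the 2x2 contingency table: antecedent present/absent vs. label match.
    a = sum(1 for items, cls in records if antecedent <= items and cls == label)
    b = sum(1 for items, cls in records if antecedent <= items and cls != label)
    c = sum(1 for items, cls in records if not (antecedent <= items) and cls == label)
    d = n - a - b - c
    support = a / n if n else 0.0
    confidence = a / (a + b) if (a + b) else 0.0
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return support, confidence, chi2


def keep_rule(records: List[Record], antecedent: FrozenSet[str], label: str,
              min_sup: float = 0.1, min_conf: float = 0.6,
              chi2_crit: float = 3.841) -> bool:
    """Keep a rule only if it passes all three (assumed) thresholds."""
    support, confidence, chi2 = rule_measures(records, antecedent, label)
    return support >= min_sup and confidence >= min_conf and chi2 >= chi2_crit
```

On the toy records, the rule {a} -> yes passes all three filters, while {b} -> yes is discarded because item b is statistically independent of the class label in this data.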