Evaluation and optimization of frequent, closed and maximal association rule based classification

Real-world applications of association rule mining have the well-known problem of discovering a large number of rules, many of which are not interesting or useful for the application at hand. The algorithms for closed and maximal itemset mining significantly reduce the volume of rules discovered and c...

Bibliographic Details
Main Authors: Mohd Shaharanee, Izwan Nizal; Hadzic, Fedja
Format: Article
Published: Springer US 2014
Subjects: QA Mathematics
Online Access: http://repo.uum.edu.my/16645/
http://doi.org/10.1007/s11222-013-9404-6
id my.uum.repo.16645
record_format eprints
spelling my.uum.repo.16645 2016-12-05T08:51:35Z PeerReviewed Mohd Shaharanee, Izwan Nizal and Hadzic, Fedja (2014) Evaluation and optimization of frequent, closed and maximal association rule based classification. Statistics and Computing, 24 (5). pp. 821-843. ISSN 0960-3174 http://doi.org/10.1007/s11222-013-9404-6 doi:10.1007/s11222-013-9404-6
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
topic QA Mathematics
description Real-world applications of association rule mining have the well-known problem of discovering a large number of rules, many of which are not interesting or useful for the application at hand. Algorithms for closed and maximal itemset mining significantly reduce the volume of rules discovered and the complexity associated with the task, but the implications of their use, and the important differences with respect to generalization power, precision and recall when they are used for classification, have not been examined. In this paper, we present a systematic evaluation of the association rules discovered by frequent, closed and maximal itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriate sequence of usage. The experiments are performed on a number of real-world datasets that represent diverse characteristics of data/items, and a detailed evaluation of the rule sets is provided both as a whole and with respect to individual classes. Empirical results confirm that, with a proper combination of data mining and statistical analysis, a large number of non-significant, redundant and contradictive rules can be eliminated while preserving relatively high precision and recall. More importantly, the results reveal the important characteristics of, and differences between, frequent, closed and maximal itemsets when used for the classification task, and the effect of incorporating statistical/heuristic measures for optimizing such rule sets. With closed itemset mining already a preferred choice for reducing complexity and redundancy during rule generation, this study further confirms that closed itemset based association rules are also of better quality in terms of overall classification precision and recall, and of precision and recall on individual class examples.
On the other hand, maximal itemset based association rules, which are a subset of closed itemset based rules, prove insufficient in this regard and typically have worse recall and generalization power. Empirical results also show the downside of applying the confidence measure at the start of association rule generation, as is typically done within the association rule framework. Removing rules that fall below a certain confidence threshold also removes the knowledge of any contradictions in the data to the relatively higher-confidence rules; precision can therefore be increased by discarding contradictive rules before applying the confidence constraint.
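
The relationship between frequent, closed and maximal itemsets that the abstract relies on can be illustrated with a small sketch. This is not the authors' code or experimental setup: the toy transactions, the `MIN_SUPPORT` value and all function names are assumptions made for illustration, and the brute-force enumeration is only practical for tiny examples. It uses the standard definitions: an itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent.

```python
from itertools import combinations

# Toy transactions (hypothetical data, not taken from the paper).
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
]
MIN_SUPPORT = 2  # absolute support threshold (assumed for this example)


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets meeting the support threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                result[candidate] = count
    return result


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        i: s
        for i, s in frequent.items()
        if not any(i < j and s == sj for j, sj in frequent.items())
    }


def maximal_itemsets(frequent):
    """Keep itemsets that have no frequent proper superset."""
    return {i: s for i, s in frequent.items() if not any(i < j for j in frequent)}


frequent = frequent_itemsets(transactions, MIN_SUPPORT)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)

print(len(frequent), len(closed), len(maximal))
```

On this toy data every maximal itemset is also closed and every closed itemset is frequent, which mirrors the subset relationship between the rule sets described in the abstract. Because a maximal itemset does not record the supports of its subsets, rules derived only from maximal itemsets carry less information than those derived from closed itemsets.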