A new ant based rule extraction algorithm for web classification

Attribute reduction and discretization are two important data pre-processing steps before data can be used for classification. Web documents contain an enormous number of attributes compared with other types of data. The Ant-Miner algorithm is also still lacking in effic...

Bibliographic Details
Main Authors: Ku-Mahamud, Ku Ruhana; Saian, Rizauddin
Format: Monograph
Language: English
Published: Universiti Utara Malaysia 2011
Subjects: QA76 Computer software
Online Access: http://repo.uum.edu.my/8136/1/Ku.pdf
http://repo.uum.edu.my/8136/3/1.KU%20RUHANA%20KU%20MAHAMUD.pdf
http://repo.uum.edu.my/8136/
http://lintas.uum.edu.my:8080/elmu/index.jsp?module=webopac-l&action=fullDisplayRetriever.jsp&szMaterialNo=0000780133
id my.uum.repo.8136
record_format eprints
spelling my.uum.repo.8136 2014-07-06T04:34:23Z Ku-Mahamud, Ku Ruhana and Saian, Rizauddin (2011) A new ant based rule extraction algorithm for web classification. Project Report. Universiti Utara Malaysia. (Unpublished) Monograph NonPeerReviewed application/pdf en
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic QA76 Computer software
description Attribute reduction and discretization are two important data pre-processing steps before data can be used for classification. Web documents contain an enormous number of attributes compared with other types of data. The Ant-Miner algorithm also still lacks efficiency, accuracy and rule simplicity because of the local minima problem. Therefore, the Ant-Miner algorithm needs to be improved with respect to the accuracy and rule simplicity criteria so that it can be used to classify Web document data sets or any large data sets. The best attribute selection method for Web text categorization is correlation-based evaluation combined with random search as the search method. However, this method does not give the best performance in attribute reduction. Classifier-based attribute subset selection removes more attributes, but at the cost of classifier performance. A hybrid algorithm combining ant colony optimization with simulated annealing is proposed to discover rules from data. The simulated annealing technique mitigates the problem of low-quality rules discovered by an ant in a colony. The best rule for a colony is then chosen, and later the best rule among the colonies is included in the rule set. The rule set is arranged in decreasing order of generation. Thirteen data sets consisting of discrete and continuous data were used to evaluate the performance of the proposed algorithm in terms of accuracy, number of rules and number of terms in the rules. Experimental results obtained from the proposed algorithm are comparable to those of the Ant-Miner algorithm in terms of rule accuracy but are better in terms of rule simplicity.
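The description names two mechanisms without giving details: simulated annealing applied to a rule found by an ant, and selection of the best rule first within each colony and then among colonies. The Python sketch below illustrates only those two ideas under stated assumptions and is not the report's implementation. The function names, the geometric cooling schedule, the Metropolis acceptance rule, and the `quality` and `neighbor` callables are all hypothetical choices made for illustration.

```python
import math
import random


def acceptance_probability(current_quality, candidate_quality, temperature):
    """Metropolis criterion for maximization (assumed, not taken from the report)."""
    if candidate_quality >= current_quality:
        return 1.0
    # A worse candidate is accepted with a probability that shrinks as the
    # quality gap grows or the temperature falls.
    return math.exp((candidate_quality - current_quality) / temperature)


def anneal_rule(rule, quality, neighbor, t_start=1.0, t_end=0.01, cooling=0.9, rng=None):
    """Refine one rule by simulated annealing and return the best rule seen.

    `quality(rule)` scores a rule (higher is better) and `neighbor(rule, rng)`
    returns a slightly modified rule; both are hypothetical callables.
    """
    rng = rng or random.Random(0)
    current = best = rule
    temperature = t_start
    while temperature > t_end:
        candidate = neighbor(current, rng)
        p = acceptance_probability(quality(current), quality(candidate), temperature)
        if rng.random() < p:
            current = candidate
        if quality(current) > quality(best):
            best = current
        temperature *= cooling  # geometric cooling, an assumed schedule
    return best


def best_rule_among_colonies(colonies, quality):
    """Pick the best rule of each colony, then the best of those."""
    colony_bests = [max(colony, key=quality) for colony in colonies]
    return max(colony_bests, key=quality)
```

In this sketch a rule can be any object, as long as `quality` and `neighbor` understand it. For example, with integers standing in for rules, `best_rule_among_colonies([[1, 5], [3, 2]], quality=lambda r: r)` picks 5 from the first colony and 3 from the second, then keeps 5.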
format Monograph
author Ku-Mahamud, Ku Ruhana
Saian, Rizauddin
title A new ant based rule extraction algorithm for web classification
publisher Universiti Utara Malaysia
publishDate 2011
url http://repo.uum.edu.my/8136/1/Ku.pdf
http://repo.uum.edu.my/8136/3/1.KU%20RUHANA%20KU%20MAHAMUD.pdf
http://repo.uum.edu.my/8136/
http://lintas.uum.edu.my:8080/elmu/index.jsp?module=webopac-l&action=fullDisplayRetriever.jsp&szMaterialNo=0000780133