Staff View: A new ant based rule extraction algorithm for web classification

A new ant based rule extraction algorithm for web classification

Attribute reduction and discretization are two important data pre-processing steps that must be carried out before data can be used for classification. Web documents contain an enormous number of attributes compared with other types of data. The Ant-Miner algorithm is also still lacking in effic...


Bibliographic Details
Main Authors:	Ku-Mahamud, Ku Ruhana; Saian, Rizauddin
Format:	Monograph
Language:	English
Published:	Universiti Utara Malaysia 2011
Subjects:	QA76 Computer software
Online Access:	http://repo.uum.edu.my/8136/1/Ku.pdf http://repo.uum.edu.my/8136/3/1.KU%20RUHANA%20KU%20MAHAMUD.pdf http://repo.uum.edu.my/8136/ http://lintas.uum.edu.my:8080/elmu/index.jsp?module=webopac-l&action=fullDisplayRetriever.jsp&szMaterialNo=0000780133

id	my.uum.repo.8136
record_format	eprints
spelling	my.uum.repo.8136 2014-07-06T04:34:23Z http://repo.uum.edu.my/8136/ A new ant based rule extraction algorithm for web classification Ku-Mahamud, Ku Ruhana; Saian, Rizauddin QA76 Computer software Attribute reduction and discretization are two important data pre-processing steps that must be carried out before data can be used for classification. Web documents contain an enormous number of attributes compared with other types of data. The Ant-Miner algorithm also still lacks efficiency, accuracy and rule simplicity because of the local minima problem. The Ant-Miner algorithm therefore needs to be improved with respect to the accuracy and rule simplicity criteria so that it can be used to classify Web document data sets or any large data set. The best attribute selection method for Web text categorization is correlation-based evaluation combined with random search as the search method. However, this method does not give the best performance in attribute reduction. Classifier-based attribute subset selection removes more attributes, but at the cost of classifier performance. A hybrid of ant colony optimization and simulated annealing is proposed to discover rules from data. The simulated annealing technique reduces the problem of low-quality rules discovered by an ant in a colony. The best rule for each colony is then chosen, and the best rule among the colonies is included in the rule set. The rule set is arranged in decreasing order of generation. Thirteen data sets consisting of discrete and continuous data were used to evaluate the performance of the proposed algorithm in terms of accuracy, number of rules and number of terms in the rules. The experimental results of the proposed algorithm are comparable to those of the Ant-Miner algorithm in terms of rule accuracy and better in terms of rule simplicity. Universiti Utara Malaysia 2011 Monograph NonPeerReviewed application/pdf en http://repo.uum.edu.my/8136/1/Ku.pdf application/pdf en http://repo.uum.edu.my/8136/3/1.KU%20RUHANA%20KU%20MAHAMUD.pdf Ku-Mahamud, Ku Ruhana and Saian, Rizauddin (2011) A new ant based rule extraction algorithm for web classification. Project Report. Universiti Utara Malaysia. (Unpublished) http://lintas.uum.edu.my:8080/elmu/index.jsp?module=webopac-l&action=fullDisplayRetriever.jsp&szMaterialNo=0000780133
institution	Universiti Utara Malaysia
building	UUM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Utara Malaysia
content_source	UUM Institutional Repository
url_provider	http://repo.uum.edu.my/
language	English
topic	QA76 Computer software
spellingShingle	QA76 Computer software Ku-Mahamud, Ku Ruhana Saian, Rizauddin A new ant based rule extraction algorithm for web classification
description	Attribute reduction and discretization are two important data pre-processing steps that must be carried out before data can be used for classification. Web documents contain an enormous number of attributes compared with other types of data. The Ant-Miner algorithm also still lacks efficiency, accuracy and rule simplicity because of the local minima problem. The Ant-Miner algorithm therefore needs to be improved with respect to the accuracy and rule simplicity criteria so that it can be used to classify Web document data sets or any large data set. The best attribute selection method for Web text categorization is correlation-based evaluation combined with random search as the search method. However, this method does not give the best performance in attribute reduction. Classifier-based attribute subset selection removes more attributes, but at the cost of classifier performance. A hybrid of ant colony optimization and simulated annealing is proposed to discover rules from data. The simulated annealing technique reduces the problem of low-quality rules discovered by an ant in a colony. The best rule for each colony is then chosen, and the best rule among the colonies is included in the rule set. The rule set is arranged in decreasing order of generation. Thirteen data sets consisting of discrete and continuous data were used to evaluate the performance of the proposed algorithm in terms of accuracy, number of rules and number of terms in the rules. The experimental results of the proposed algorithm are comparable to those of the Ant-Miner algorithm in terms of rule accuracy and better in terms of rule simplicity.
format	Monograph
author	Ku-Mahamud, Ku Ruhana; Saian, Rizauddin
author_facet	Ku-Mahamud, Ku Ruhana; Saian, Rizauddin
author_sort	Ku-Mahamud, Ku Ruhana
title	A new ant based rule extraction algorithm for web classification
title_short	A new ant based rule extraction algorithm for web classification
title_full	A new ant based rule extraction algorithm for web classification
title_fullStr	A new ant based rule extraction algorithm for web classification
title_full_unstemmed	A new ant based rule extraction algorithm for web classification
title_sort	new ant based rule extraction algorithm for web classification
publisher	Universiti Utara Malaysia
publishDate	2011
url	http://repo.uum.edu.my/8136/1/Ku.pdf http://repo.uum.edu.my/8136/3/1.KU%20RUHANA%20KU%20MAHAMUD.pdf http://repo.uum.edu.my/8136/ http://lintas.uum.edu.my:8080/elmu/index.jsp?module=webopac-l&action=fullDisplayRetriever.jsp&szMaterialNo=0000780133
_version_	1644279744689078272
score	13.18916
