Improved Reinforcement-Based Profile Learning For Document Filtering

Today the amount of accessible information is overwhelming. A personalized information filtering system must be able to tailor to current interests of the user and to adapt as they change over time. This system has to monitor a stream of incoming documents to learn the user’s information requirement...

Full description

Saved in:
Bibliographic Details
Main Author: Mohammed Almurtadha, Yahya
Format: Thesis
Language:English
English
Published: 2007
Online Access:http://psasir.upm.edu.my/id/eprint/5211/1/FSKTM_2007_13.pdf
http://psasir.upm.edu.my/id/eprint/5211/
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Today the amount of accessible information is overwhelming. A personalized information filtering system must be able to tailor to current interests of the user and to adapt as they change over time. This system has to monitor a stream of incoming documents to learn the user’s information requirements, which is the user profile. The research has proposed a content-based personal information system learns the user’s preferences by analyzing the document contents and building a user profile. This system is called RePLS; an agent-based Reinforcement Profile Learning System with adaptive information filtering. The research focuses on an improved terms weighting to measure the importance of the terms represent each profile called “purity term weighting”. The top selected terms are then used to filter the incoming documents to the learned user profiles. The agent approach is used because of its autonomous and adaptive capabilities to perform the filtering. The proposed method was evaluated and compared with three Information Filtering methods, namely Rocchio, Okapi/BSS Basic Search System and Reinf, the incremental profile learning method. Based on the proposed method, a profile learning system is developed using Microsoft VC++ connected to Microsoft Access database through an ODBC. AFC kit is used to implement the proposed agents under RETSINA architecture. The experiments are carried out on the TREC 2002 Filtering Track dataset provided by the National Institute of Standards and Technology (NIST). This research has proven that RePLS is able to filter the stream of incoming documents according to the user interests (profiles) learned by the proposed Purity term weighting method. Based on the experiments results, Purity weighting shows better terms weighting and profile learning than the other methods. The outcome of a considerably good accuracy is mainly due to the right weighting of the profile’s terms during the learning phase. This research opens a wide range of future works to be considered, including the investigation of the dependency between the selected terms for each profile, investigating the quality of the method on different datasets, and finally, the possibility to apply the proposed method in other area like the recommendation systems.