Topic identification using filtering and rule generation algorithm for textual document