A comparative study and performance evaluation of similarity measures for data clustering