Development of compound clustering techniques using hybrid soft-computing algorithms
Databases of molecular structures available to the pharmaceutical industry comprise millions of molecules. With the advent of combinatorial chemistry, a vast number of compounds can be available either physically or virtually, which can make screening all of them infeasible in terms of time and cost...
Main Authors: | Salim, Naomie; Shamsuddin, Siti Mariyam; Salleh @ Sallehuddin, Roselina; Alwee, Razana |
---|---|
Format: | Monograph |
Language: | English |
Published: | Faculty of Computer Science and Information System 2006 |
Subjects: | T Technology (General) |
Online Access: | http://eprints.utm.my/id/eprint/4139/1/74252.pdf http://eprints.utm.my/id/eprint/4139/ |
id |
my.utm.4139 |
---|---|
record_format |
eprints |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
T Technology (General) |
description |
Databases of molecular structures available to the pharmaceutical industry comprise millions of molecules. With the advent of combinatorial chemistry, a vast number of compounds can be available either physically or virtually, which can make screening all of them infeasible in terms of time and cost. Therefore, only a subset of the database that covers the full range of structural types in the underlying dataset needs to be selected for screening, to maximise the likelihood of finding as many biologically distinct active compounds as possible in a screening experiment. One of the most widely used compound selection methods is cluster-based compound selection, which involves subdividing a set of compounds into clusters and choosing one compound or a small number of compounds from each cluster. Selecting only representative compounds from each cluster rests on the assumption that structurally similar molecules have similar properties. A good clustering method groups similar compounds together, to ensure all activity classes are represented, whilst separating active and inactive compounds into different sets of clusters, to avoid an inactive compound being selected as a cluster representative. Hierarchical clustering methods such as Ward’s and Group Average are considered the industry standard for compound selection. There has previously been limited work on clustering and classifying biologically active compounds into their activity-based classes using fuzzy and neural network methods. Furthermore, many biologically active molecular structures have been found to exhibit more than one activity, in which case they can be used as drugs for the treatment of more than one disease. However, previous clustering methods for chemical compounds are mostly limited to hard partitioning, which allows a compound to belong to only one cluster. 
In this work, neural, fuzzy and hybrid methods are used to cluster biologically active molecular structures into their corresponding activity classes. The methods have been evaluated on MDL’s MDDR, NCI’s AIDS and IDDB drug databases, which contain various biologically active classes of molecular structures. The neural network methods use a number of heuristics to find appropriate parameter values. Initially, the heuristics need user intervention to select optimal values, which gives poor results. To overcome this problem, fuzzy memberships have been employed to find optimal parameters. Since fuzzy clustering methods such as fuzzy c-means and fuzzy G-K are computationally expensive in terms of time and memory requirements, a hierarchical approach has also been used in this work for their implementation. The hierarchical fuzzy clustering algorithm developed in this work assigns overlapping structures (structures having more than one activity) to more than one cluster if their fuzzy membership values are significantly high for those clusters. When compared with the industry-standard methods, the neural networks show very poor performance when 2-D bit-string descriptors are used. However, their relative performance improves when topological indices are used as descriptors. The fuzzy and fuzzy neural methods show slightly better results than the industry-standard methods. The hierarchical fuzzy clustering method developed here is far better than a similar implementation of the hard k-means method. When used for overlapping structures, its performance improves significantly. Although the neural network methods are not very effective in clustering biologically active structures, their performance is remarkable when used as classifiers. 
The feed-forward and radial basis function networks show higher learning capabilities than support vector machines and rough set classifiers in the classification of datasets comprising more than two classes. However, their performance is slightly inferior to that of support vector machines for binary classification of chemical structures into drug and non-drug compounds. |
format |
Monograph |
author |
Salim, Naomie Shamsuddin, Siti Mariyam Salleh @ Sallehuddin, Roselina Alwee, Razana |
title |
Development of compound clustering techniques using hybrid soft-computing algorithms |
publisher |
Faculty of Computer Science and Information System |
publishDate |
2006 |
url |
http://eprints.utm.my/id/eprint/4139/1/74252.pdf http://eprints.utm.my/id/eprint/4139/ |
spelling |
my.utm.4139 2010-06-01T03:15:04Z http://eprints.utm.my/id/eprint/4139/ Development of compound clustering techniques using hybrid soft-computing algorithms Salim, Naomie Shamsuddin, Siti Mariyam Salleh @ Sallehuddin, Roselina Alwee, Razana T Technology (General) Faculty of Computer Science and Information System 2006-10-31 Monograph NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/4139/1/74252.pdf Salim, Naomie and Shamsuddin, Siti Mariyam and Salleh @ Sallehuddin, Roselina and Alwee, Razana (2006) Development of compound clustering techniques using hybrid soft-computing algorithms. Project Report. Faculty of Computer Science and Information System, Skudai, Johor. (Unpublished) |
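The description above mentions fuzzy c-means and the assignment of overlapping structures to more than one cluster when their fuzzy memberships are high enough. The following is a minimal sketch of that idea only, not the report's hierarchical algorithm: it runs plain (non-hierarchical) fuzzy c-means on a hypothetical 2-D toy dataset rather than real molecular descriptors. The function names, the fuzzifier `m=2.0`, and the membership threshold of `0.3` are all assumptions chosen for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (illustrative). Returns (centers, memberships)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership**m.
        centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])
        # Memberships are proportional to distance**(-2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on one or more centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_u.append([v / total for v in inv])
        shift = max(abs(a - b)
                    for old, new in zip(u, new_u)
                    for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u


def assign_overlapping(memberships, threshold=0.3):
    """Put each item in every cluster whose membership reaches the threshold.

    The threshold is an assumed value; items below it everywhere fall back
    to their single best cluster.
    """
    assignments = []
    for row in memberships:
        chosen = [k for k, v in enumerate(row) if v >= threshold]
        if not chosen:
            chosen = [max(range(len(row)), key=row.__getitem__)]
        assignments.append(chosen)
    return assignments


# Hypothetical toy data: two tight groups plus one point midway between them,
# standing in for a structure with more than one activity.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10), (5, 5)]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
overlap = assign_overlapping(memberships, threshold=0.3)
for p, clusters in zip(points, overlap):
    print(p, "-> clusters", clusters)
```

In this toy setup the midway point is expected to have comparable membership in both clusters and so be assigned to both, whereas a hard partitioning such as k-means would force it into one. How high a membership counts as "significantly high" is a tuning decision that the record does not specify.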