Development of compound clustering techniques using hybrid soft-computing algorithms

Bibliographic Details
Main Authors:	Salim, Naomie, Shamsuddin, Siti Mariyam, Salleh @ Sallehuddin, Roselina, Alwee, Razana
Format:	Monograph
Language:	English
Published:	Faculty of Computer Science and Information System 2006
Subjects:	T Technology (General)
Online Access:	http://eprints.utm.my/id/eprint/4139/1/74252.pdf http://eprints.utm.my/id/eprint/4139/

id	my.utm.4139
record_format	eprints
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
language	English
topic	T Technology (General)
description	Databases of molecular structures available to the pharmaceutical industry comprise millions of molecules. With the advent of combinatorial chemistry, a vast number of compounds can be available either physically or virtually, which can make screening all of them infeasible in terms of time and cost. Therefore, only a subset of the database that spans the full range of structural types in the underlying dataset needs to be selected for screening, so as to maximise the likelihood of finding as many biologically distinct active compounds as possible in a screening experiment. One of the most widely used compound selection methods is cluster-based compound selection, which involves subdividing a set of compounds into clusters and choosing one compound, or a small number of compounds, from each cluster. Selecting only representative compounds from each cluster rests on the assumption that structurally similar molecules have similar properties. A good clustering method groups similar compounds together, to ensure that all activity classes are represented, whilst separating active and inactive compounds into different sets of clusters, to avoid an inactive compound being selected as a cluster representative. Hierarchical clustering methods such as Ward's and Group Average are considered the industry standard for compound selection. Previously, there has been limited work on clustering and classifying biologically active compounds into their activity-based classes using fuzzy and neural network methods. Furthermore, many biologically active molecular structures have been found to exhibit more than one activity, in which case they can be used as drugs for the treatment of more than one disease. However, previous clustering methods for chemical compounds are mostly limited to hard partitioning, which allows a compound to belong to only one cluster.
In this work, neural, fuzzy and hybrid methods are used to cluster biologically active molecular structures into their corresponding activity classes. The methods have been evaluated on MDL's MDDR, NCI's AIDS and IDDB drug databases, which contain molecular structures from various biological activity classes. The neural network methods use a number of heuristics to find appropriate parameter values. Initially, the heuristics require user intervention to select optimal values, which gives poor results. To overcome this problem, fuzzy memberships have been employed to find optimal parameters. Since fuzzy clustering methods such as fuzzy c-means and fuzzy Gustafson-Kessel (G-K) are computationally expensive in terms of time and memory, a hierarchical approach has also been used in this work to implement them. The hierarchical fuzzy clustering algorithm developed in this work assigns overlapping structures (structures having more than one activity) to more than one cluster if their fuzzy membership values are significantly high for those clusters. Compared with the industry standard methods, the neural networks perform very poorly when 2-D bit-string descriptors are used; however, their relative performance improves when topological indices are used as descriptors. The fuzzy and fuzzy neural methods give slightly better results than the industry standard methods. The hierarchical fuzzy clustering method developed here performs far better than a similar implementation of the hard k-means method, and its performance improves significantly when it is applied to overlapping structures. Although the neural network methods are not very effective at clustering biologically active structures, they perform remarkably well when used as classifiers.
The feed-forward and radial basis function networks show higher learning capabilities than support vector machines and rough set classifiers in the classification of datasets comprising more than two classes. However, their performance is slightly inferior to that of support vector machines for the binary classification of chemical structures into drug and non-drug compounds.
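The abstract combines two ideas that can be shown in miniature: cluster-based compound selection (one representative per cluster) and fuzzy membership that lets a compound with more than one activity sit in more than one cluster. The sketch below is plain, non-hierarchical fuzzy c-means, not the authors' hierarchical fuzzy algorithm. The farthest-first initialisation, the membership threshold of 0.4 used to decide overlapping assignment, the rule of taking the highest-membership compound as the cluster representative, and the toy 2-D descriptor vectors are all illustrative assumptions, not details taken from the report.

```python
import math

# Toy 2-D descriptor vectors (hypothetical, e.g. two topological indices per compound).
# Indices 0-2 form one group, 3-5 another, and 6 sits halfway between them.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 9.0), (9.0, 10.0),
          (5.0, 5.0)]


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6):
    """Plain (non-hierarchical) fuzzy c-means. Returns (centres, memberships),
    where memberships[i][j] is the degree to which point i belongs to cluster j."""
    dim = len(points[0])
    # Deterministic farthest-first initialisation (an illustrative choice).
    centres = [list(points[0])]
    while len(centres) < c:
        far = max(points, key=lambda p: min(math.dist(p, q) for q in centres))
        centres.append(list(far))
    u = [[0.0] * c for _ in points]
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dist = [math.dist(p, q) for q in centres]
            if min(dist) == 0.0:
                # Point lies exactly on a centre: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                new_u.append([h / sum(hits) for h in hits])
            else:
                e = 2.0 / (m - 1.0)
                new_u.append([1.0 / sum((dj / dk) ** e for dk in dist) for dj in dist])
        change = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        # Centre update: mean of all points weighted by u_ij ** m.
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centres[j] = [sum(wi * p[k] for wi, p in zip(w, points)) / total
                          for k in range(dim)]
        if change < tol:
            break
    return centres, u


def assign_overlapping(u, threshold=0.4):
    """Hypothetical rule: a compound joins every cluster whose membership is at least
    `threshold`, falling back to its best cluster so no compound is left unassigned."""
    groups = []
    for row in u:
        members = [j for j, v in enumerate(row) if v >= threshold]
        groups.append(members or [max(range(len(row)), key=row.__getitem__)])
    return groups


def pick_representatives(u):
    """Cluster-based selection: for each cluster, the compound with the highest membership."""
    return [max(range(len(u)), key=lambda i: u[i][j]) for j in range(len(u[0]))]


if __name__ == "__main__":
    centres, u = fuzzy_c_means(POINTS, c=2)
    print("clusters per compound:", assign_overlapping(u))
    print("representatives:", pick_representatives(u))
```

With this toy data the midpoint compound (index 6) ends up with roughly equal membership in both clusters and is therefore listed under both, which mirrors the overlap described above for structures with more than one activity; every other compound falls into a single cluster. A hard method such as k-means would instead force the midpoint into exactly one cluster.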
format	Monograph
author	Salim, Naomie Shamsuddin, Siti Mariyam Salleh @ Sallehuddin, Roselina Alwee, Razana
author_sort	Salim, Naomie
title	Development of compound clustering techniques using hybrid soft-computing algorithms
publisher	Faculty of Computer Science and Information System
publishDate	2006
_version_	1643643977229926400
spelling	my.utm.4139 2010-06-01T03:15:04Z http://eprints.utm.my/id/eprint/4139/ Development of compound clustering techniques using hybrid soft-computing algorithms Salim, Naomie Shamsuddin, Siti Mariyam Salleh @ Sallehuddin, Roselina Alwee, Razana T Technology (General) Faculty of Computer Science and Information System 2006-10-31 Monograph NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/4139/1/74252.pdf Salim, Naomie and Shamsuddin, Siti Mariyam and Salleh @ Sallehuddin, Roselina and Alwee, Razana (2006) Development of compound clustering techniques using hybrid soft-computing algorithms. Project Report. Faculty of Computer Science and Information System, Skudai, Johor. (Unpublished)

Similar Items