On the significance of topological-indices based non-binary molecular similarity measures

This paper describes experiments to study how well the whole range of topological-indices-based non-binary similarity values represents the physicochemical similarities between compounds. Measured log P values have been compared with the log P values predicted from compounds at different range of...

Bibliographic Details
Main Authors: Salim, Naomie, Holliday, John, Willett, Peter
Format: Article
Published: Universiti Kebangsaan Malaysia 2004
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/28187/
http://www.ukm.my/jsm/english_journals/vol33num2_2004/vol33num2_04page157-172.html
id my.utm.28187
record_format eprints
spelling my.utm.28187 2018-11-30T07:07:17Z http://eprints.utm.my/id/eprint/28187/ QA75 Electronic computers. Computer science
Universiti Kebangsaan Malaysia 2004-12 Article PeerReviewed Salim, Naomie and Holliday, John and Willett, Peter (2004) On the significance of topological-indices based non-binary molecular similarity measures. Sains Malaysiana, 33 (2). pp. 157-172. ISSN 0126-6039 http://www.ukm.my/jsm/english_journals/vol33num2_2004/vol33num2_04page157-172.html
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA75 Electronic computers. Computer science
description This paper describes experiments to study how well the whole range of topological-indices-based non-binary similarity values represents the physicochemical similarities between compounds. Measured log P values were compared with the log P values predicted from compounds at different ranges of similarity, calculated from various topological indices of the compounds. The analysis shows that the non-binary Cosine, Simpson and Pearson coefficients might give misleading results when certain compounds are compared. Similarity values for the 1% most similar compounds, based on the non-binary Tanimoto or Euclidean coefficients, were found to represent the physicochemical similarities between the molecules compared. Therefore, for searches requiring around the 1% most similar compounds, rational selection methods based on the non-binary Tanimoto or Euclidean coefficients are likely to produce better results than random selection. Similarity values for the 5% most dissimilar compounds, based on the non-binary Tanimoto coefficient, were also found to represent the physicochemical dissimilarities between the molecules compared. Therefore, for diverse selection requiring less than the 5% most dissimilar compounds, rational selection methods based on the non-binary Tanimoto coefficient are likely to produce better results than random selection. However, in both focused and diverse selection using these coefficients, as more compounds are selected, the selection increasingly resembles random selection in terms of physicochemical similarity and dissimilarity.
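The coefficients named in the description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the formulas are the commonly used continuous (non-binary) forms of the Tanimoto, Cosine and Euclidean measures, the helper `predict_from_most_similar` and its use of a plain mean are hypothetical, and the vectors are made-up stand-ins for topological index values.

```python
import math

def tanimoto(x, y):
    """Continuous (non-binary) Tanimoto coefficient of two real-valued vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Assumption: two all-zero vectors are treated as identical.
    return dot / denom if denom else 1.0

def cosine(x, y):
    """Cosine coefficient; depends only on the angle between the vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def euclidean_distance(x, y):
    """Euclidean distance; smaller values mean more similar vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def predict_from_most_similar(target, library, fraction=0.01):
    """Hypothetical helper: mean property value (e.g. log P) of the top
    `fraction` of library compounds ranked by Tanimoto similarity to target.
    `library` is a list of (index_vector, property_value) pairs."""
    ranked = sorted(library, key=lambda item: tanimoto(target, item[0]), reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(value for _, value in ranked[:k]) / k

# Made-up index vectors: b is exactly twice a.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
print(cosine(a, b), tanimoto(a, b), euclidean_distance(a, b))
```

For these toy vectors the Cosine coefficient is 1 (up to rounding) even though the two vectors differ in magnitude, while the Tanimoto coefficient is 2/3 and the Euclidean distance is non-zero. This is one way a magnitude-insensitive coefficient could rank differing compounds as identical; whether it matches the cases analysed in the paper is not stated in this record.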
format Article
author Salim, Naomie
Holliday, John
Willett, Peter
title On the significance of topological-indices based non-binary molecular similarity measures
publisher Universiti Kebangsaan Malaysia
publishDate 2004
url http://eprints.utm.my/id/eprint/28187/
http://www.ukm.my/jsm/english_journals/vol33num2_2004/vol33num2_04page157-172.html