Benchmarking Study of Parameter Variation When Using Signature Fingerprints Together with Support Vector Machines
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
AstraZeneca R&D.
2014 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 11, 3211-3217 p. Article in journal (Refereed), Published
Abstract [en]

QSAR modeling using molecular signatures and support vector machines with a radial basis function is increasingly used for virtual screening in the drug discovery field. This method has three free parameters: C, γ, and signature height. C is a penalty parameter that limits overfitting, γ controls the width of the radial basis function kernel, and the signature height determines how much of the molecule is described by each atom signature. Determination of optimal values for these parameters is time-consuming. Good default values could therefore save considerable computational cost. The goal of this project was to investigate whether such default values could be found by using seven public QSAR data sets spanning a wide range of end points and using both a bit version and a count version of the molecular signatures. On the basis of the experiments performed, we recommend a parameter set of heights 0 to 2 for the count version of the signature fingerprints and heights 0 to 3 for the bit version. These are in combination with a support vector machine using C in the range of 1 to 100 and γ in the range of 0.001 to 0.1. When data sets are small or longer run times are not a problem, then there is reason to consider the addition of height 3 to the count fingerprint and a wider grid search. However, marked improvements should not be expected.
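
The recommended defaults above can be written down as a small search grid. The following is a minimal sketch, not the authors' code: it assumes the standard RBF kernel form K(x, y) = exp(-γ‖x − y‖²), assumes powers-of-ten spacing inside the stated C and γ ranges (the abstract gives only the end points), and reads "heights 0 to 2" as one fingerprint built from signatures of heights 0, 1 and 2. The names rbf_kernel and default_grid are hypothetical.

```python
import itertools
import math

# Ranges taken from the abstract; the powers-of-ten spacing is an assumption.
C_VALUES = [1, 10, 100]
GAMMA_VALUES = [0.001, 0.01, 0.1]

# Recommended signature height ranges per fingerprint version (from the abstract).
HEIGHT_RANGE = {"count": (0, 2), "bit": (0, 3)}


def rbf_kernel(x, y, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def default_grid(fingerprint_version):
    """Enumerate every (C, gamma) pair for one fingerprint version.

    The height range is fixed per version and attached to each grid point.
    """
    lowest, highest = HEIGHT_RANGE[fingerprint_version]
    heights = list(range(lowest, highest + 1))
    return [
        {"C": c, "gamma": gamma, "heights": heights}
        for c, gamma in itertools.product(C_VALUES, GAMMA_VALUES)
    ]


if __name__ == "__main__":
    for point in default_grid("count"):
        print(point)
    # A larger gamma makes the kernel fall off faster with distance.
    print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=0.001))
    print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=0.1))
```

With this spacing each fingerprint version has nine (C, γ) combinations to evaluate, which illustrates why a narrow default grid is much cheaper than a wide search. A finer or wider grid can be substituted by editing the two value lists.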

Place, publisher, year, edition, pages
2014. Vol. 54, no 11, 3211-3217 p.
National Category
Medical Biotechnology; Pharmaceutical Sciences
URN: urn:nbn:se:uu:diva-240239
DOI: 10.1021/ci500344v
ISI: 000345551000017
PubMedID: 25318024
OAI: oai:DiVA.org:uu-240239
DiVA: diva2:776257
eSSENCE - An eScience Collaboration; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Available from: 2015-01-07 Created: 2015-01-06 Last updated: 2015-05-12 (Bibliographically approved)
In thesis
1. Ligand-based Methods for Data Management and Modelling
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Drug discovery is a complicated and expensive process in the billion-dollar range. One way of making the drug development process more efficient is better information handling, modelling and visualisation. The majority of today's drugs are small molecules, which interact with drug targets to cause an effect. Since the 1980s, large numbers of compounds have been systematically tested by robots in so-called high-throughput screening. Ligand-based drug discovery is based on modelling drug molecules. In the field known as Quantitative Structure–Activity Relationship (QSAR), molecules are described by molecular descriptors, which are used for building mathematical models. Based on these models, molecular properties can be predicted, and using the molecular descriptors, molecules can be compared for, e.g., similarity. Bioclipse is a workbench for the life sciences which provides ligand-based tools through a point-and-click interface.

The aims of this thesis were to research and develop new or improved ligand-based methods and open source software, and to work towards making these tools available to users through the Bioclipse workbench. To this end, a series of molecular signature studies was carried out and various Bioclipse plugins were developed.

An introduction to the field is provided in the thesis summary, which is followed by five research papers. Paper I describes the Bioclipse 2 software and the Bioclipse scripting language. Paper II describes Brunn, a laboratory information system for supporting work with dose-response studies on microtiter plates. Paper III presents the creation of a molecular fingerprint based on the molecular signature descriptor; the new fingerprints are evaluated for target prediction and found to perform on par with industry-standard commercial molecular fingerprints. Paper IV explores the effect of different parameter choices when using the signature fingerprint together with support vector machines (SVM) with the radial basis function (RBF) kernel, and finds reasonable default values. Paper V studies the performance of SVM-based QSAR on large datasets with the molecular signature descriptor; a QSAR model based on 1.2 million substances is created and made available from the Bioclipse workbench.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2015. 73 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy, ISSN 1651-6192 ; 200
QSAR, ligand-based drug discovery, bioclipse, information system, cheminformatics, bioinformatics
National Category
Pharmaceutical Sciences; Bioinformatics and Systems Biology
Research subject
Pharmaceutical Pharmacology; Bioinformatics
urn:nbn:se:uu:diva-248964 (URN)
978-91-554-9237-3 (ISBN)
Public defence
2015-06-05, B22 BMC, Husargatan 3, Uppsala, 09:15 (English)
Available from: 2015-05-12 Created: 2015-04-09 Last updated: 2015-07-07

By author/editor
Alvarsson, Jonathan; Eklund, Martin; Andersson, Claes; Spjuth, Ola; Wikberg, Jarl E. S.