Uppsala University Publications
Eklund, Martin
Publications (10 of 21)
Torabi Moghadam, B., Alvarsson, J., Holm, M., Eklund, M., Carlsson, L. & Spjuth, O. (2015). Scaling predictive modeling in drug development with cloud computing. Journal of Chemical Information and Modeling, 55, 19-25
2015 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 55, p. 19-25. Article in journal (Refereed) Published
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:uu:diva-242914 (URN); 10.1021/ci500580y (DOI); 000348619400002 (); 25493610 (PubMedID)
Projects
eSSENCE
Funder
Swedish Research Council, VR-2011-6129
Available from: 2014-12-10 Created: 2015-02-02 Last updated: 2018-01-11 Bibliographically approved
Eklund, M., Norinder, U., Boyer, S. & Carlsson, L. (2015). The application of conformal prediction to the drug discovery process. Annals of Mathematics and Artificial Intelligence, 74(1-2), 117-132
2015 (English). In: Annals of Mathematics and Artificial Intelligence, ISSN 1012-2443, E-ISSN 1573-7470, Vol. 74, no 1-2, p. 117-132. Article in journal (Refereed) Published
Abstract [en]

QSAR modeling is a method for predicting properties, e.g. the solubility or toxicity, of chemical compounds using machine learning techniques. QSAR is in widespread use within the pharmaceutical industry to prioritize compounds for experimental testing or to alert for potential toxicity during the drug discovery process. However, the confidence or reliability of predictions from a QSAR model is difficult to assess accurately. We frame the application of QSAR to preclinical drug development in an off-line inductive conformal prediction framework and apply it prospectively to historical data collected from four different assays within AstraZeneca over a time course of about five years. The results indicate weakened validity of the conformal predictor due to violations of the randomness assumption. The validity can be strengthened by adopting semi-off-line conformal prediction. The non-randomness of the data prevents exactly valid predictions, but comparisons to the results of a traditional QSAR procedure applied to the same data indicate that conformal predictions are highly useful in the drug discovery process.

Keyword
QSAR, Conformal prediction, Drug discovery, Temporal model updating
National Category
Pharmaceutical Sciences; Pharmacology and Toxicology
Identifiers
urn:nbn:se:uu:diva-257004 (URN); 10.1007/s10472-013-9378-2 (DOI); 000355747600007 ()
Available from: 2015-06-30 Created: 2015-06-29 Last updated: 2018-01-11 Bibliographically approved
Alvarsson, J., Eklund, M., Andersson, C., Carlsson, L., Spjuth, O. & Wikberg, J. E. S. (2014). Benchmarking Study of Parameter Variation When Using Signature Fingerprints Together with Support Vector Machines. Journal of Chemical Information and Modeling, 54(11), 3211-3217
2014 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 11, p. 3211-3217. Article in journal (Refereed) Published
Abstract [en]

QSAR modeling using molecular signatures and support vector machines with a radial basis function is increasingly used for virtual screening in the drug discovery field. This method has three free parameters: C, γ, and signature height. C is a penalty parameter that limits overfitting, γ controls the width of the radial basis function kernel, and the signature height determines how much of the molecule is described by each atom signature. Determination of optimal values for these parameters is time-consuming. Good default values could therefore save considerable computational cost. The goal of this project was to investigate whether such default values could be found by using seven public QSAR data sets spanning a wide range of end points and using both a bit version and a count version of the molecular signatures. On the basis of the experiments performed, we recommend a parameter set of heights 0 to 2 for the count version of the signature fingerprints and heights 0 to 3 for the bit version. These are in combination with a support vector machine using C in the range of 1 to 100 and γ in the range of 0.001 to 0.1. When data sets are small or longer run times are not a problem, then there is reason to consider the addition of height 3 to the count fingerprint and a wider grid search. However, marked improvements should not be expected.
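
As a rough illustration of the recommended search space, the sketch below enumerates candidate parameter combinations inside the ranges quoted in the abstract. The abstract gives only the ranges; the powers-of-ten spacing and the function name `default_grid` are assumptions made for this example, not the authors' published grid.

```python
from itertools import product


def default_grid(count_version=True):
    """Enumerate (C, gamma, heights) candidates inside the recommended ranges.

    The grid spacing is an assumption; only the range limits come from the abstract.
    """
    c_values = [1, 10, 100]            # assumed powers of ten within 1..100
    gamma_values = [0.001, 0.01, 0.1]  # assumed powers of ten within 0.001..0.1
    # Heights 0-2 for the count fingerprint, 0-3 for the bit fingerprint.
    max_height = 2 if count_version else 3
    heights = tuple(range(max_height + 1))
    return [(c, gamma, heights) for c, gamma in product(c_values, gamma_values)]


grid = default_grid(count_version=True)
```

Each tuple could then be passed to whichever SVM implementation is in use; the model fitting itself is deliberately left out of the sketch.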

National Category
Medical Biotechnology; Pharmaceutical Sciences
Identifiers
urn:nbn:se:uu:diva-240239 (URN); 10.1021/ci500344v (DOI); 000345551000017 (); 25318024 (PubMedID)
Funder
eSSENCE - An eScience Collaboration; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Available from: 2015-01-07 Created: 2015-01-06 Last updated: 2018-01-11 Bibliographically approved
Eklund, M., Norinder, U., Boyer, S. & Carlsson, L. (2014). Choosing Feature Selection and Learning Algorithms in QSAR. J CHEM INF MODEL, 54(3), 837-843
2014 (English). In: J CHEM INF MODEL, ISSN 1549-9596, Vol. 54, no 3, p. 837-843. Article in journal (Refereed) Published
Abstract [en]

Feature selection is an important part of contemporary QSAR analysis. In a recently published paper, we investigated the performance of different feature selection methods in a large number of in silico experiments conducted using real QSAR datasets. However, an interesting question that we did not address is whether certain feature selection methods are better than others in combination with certain learning methods, in terms of producing models with high prediction accuracy. In this report we extend our work from the previous investigation by using four different feature selection methods (wrapper, ReliefF, MARS, and elastic nets), together with eight learners (MARS, elastic net, random forest, SVM, neural networks, multiple linear regression, PLS, kNN) in an empirical investigation to address this question. The results indicate that state-of-the-art learners (random forest, SVM, and neural networks) do not gain prediction accuracy from feature selection, and we found no evidence that a certain feature selection method is particularly well suited for use in combination with a certain learner.

National Category
Medical and Health Sciences
Identifiers
urn:nbn:se:uu:diva-224477 (URN); 10.1021/ci400573c (DOI); 000333478800015 ()
Available from: 2014-05-15 Created: 2014-05-13 Last updated: 2014-05-15 Bibliographically approved
Alvarsson, J., Eklund, M., Engkvist, O., Spjuth, O., Carlsson, L., Wikberg, J. E. S. & Noeske, T. (2014). Ligand-Based Target Prediction with Signature Fingerprints. Journal of Chemical Information and Modeling, 54(10), 2647-2653
2014 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 10, p. 2647-2653. Article in journal (Refereed) Published
Abstract [en]

When evaluating a potential drug candidate it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct and evaluate the performance of chemical fingerprints based on the molecular signature descriptor for performing target binding predictions. For the comparison we used the area under the receiver operating characteristic curve (AUC) complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit and a count version, and evaluated their performance compared to a set of established fingerprints with regard to predictions of binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version outperformed the bit version slightly; however, the count version is more complex and takes more computing time and memory to run, so its usage should probably be evaluated on a case-by-case basis. The NRI-based tests complemented the AUC-based ones and showed signs of higher power.
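
Tanimoto-based similarity searching, as mentioned above, can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the bit version uses the standard intersection-over-union form, while the min/max form shown for count fingerprints is one common generalisation and is an assumption here, since the abstract does not say which variant was used. All function names and the toy feature labels are hypothetical.

```python
def tanimoto_bits(a, b):
    """Tanimoto coefficient for two collections of 'on' feature identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / len(union)


def tanimoto_counts(a, b):
    """Min/max generalisation for feature -> count dictionaries (assumed variant)."""
    keys = set(a) | set(b)
    numerator = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    denominator = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return numerator / denominator if denominator else 0.0


def rank_by_similarity(query, library, similarity=tanimoto_bits):
    """Return (name, score) pairs for a dict of named fingerprints, best first."""
    scored = [(name, similarity(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints; the labels stand in for signature features.
bit_score = tanimoto_bits({"s1", "s2", "s3"}, {"s2", "s3", "s4"})
count_score = tanimoto_counts({"s1": 2, "s2": 1}, {"s1": 1, "s2": 1, "s3": 1})
ranking = rank_by_similarity({"s1", "s2"}, {"cmpd_a": {"s1", "s2"}, "cmpd_b": {"s3"}})
```

In a target prediction setting, the known binding profiles of the top-ranked library compounds would then be used to suggest targets for the query compound.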

National Category
Pharmaceutical Sciences; Bioinformatics (Computational Biology)
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-237934 (URN); 10.1021/ci500361u (DOI); 000343849600004 (); 25230336 (PubMedID)
Funder
Swedish Research Council, VR-2011-6129; eSSENCE - An eScience Collaboration; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; Swedish National Infrastructure for Computing (SNIC)
Available from: 2014-12-08 Created: 2014-12-08 Last updated: 2018-01-11 Bibliographically approved
Spjuth, O., Georgiev, V., Carlsson, L., Alvarsson, J., Berg, A., Willighagen, E., . . . Eklund, M. (2013). Bioclipse-R: Integrating management and visualization of life science data with statistical analysis. Bioinformatics, 29(2), 286-289
2013 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no 2, p. 286-289. Article in journal (Refereed) Published
Abstract [en]

Bioclipse, a graphical workbench for the life sciences, provides functionality for managing and visualizing life science data. We introduce Bioclipse-R, which integrates Bioclipse and the statistical programming language R. The synergy between Bioclipse and R is demonstrated by the construction of a decision support system for anticancer drug screening and mutagenicity prediction, which shows how Bioclipse-R can be used to perform complex tasks from within a single software system.

Place, publisher, year, edition, pages
Oxford University Press, 2013
Keyword
bioclipse, bioinformatics, data analysis
National Category
Bioinformatics and Systems Biology; Bioinformatics (Computational Biology)
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-187432 (URN); 10.1093/bioinformatics/bts681 (DOI); 000313722800026 (); 23178637 (PubMedID)
Funder
eSSENCE - An eScience Collaboration; Swedish Research Council, 2011-6129
Available from: 2012-12-06 Created: 2012-12-06 Last updated: 2018-01-12 Bibliographically approved
Eklund, M., Norinder, U., Boyer, S. & Carlsson, L. (2012). Application of conformal prediction in QSAR. In: Lazaros Iliadis, Ilias Maglogiannis, Harris Papadopoulos, Kostas Karatzas, Spyros Sioutas (Ed.), Artificial Intelligence Applications and Innovations: AIAI 2012 International Workshops: AIAB, AIeIA, CISE, COPA, IIVC, ISQL, MHDW, and WADTMB, Halkidiki, Greece, September 27-30, 2012, Proceedings, Part II. Paper presented at 8th International Workshop on Artificial Intelligence Applications and Innovations, AIAI 2012: AIAB, AIeIA, CISE, COPA, IIVC, ISQL, MHDW, and WADTMB, 27 September 2012 through 30 September 2012, Halkidiki (pp. 166-175). (PART 2)
2012 (English). In: Artificial Intelligence Applications and Innovations: AIAI 2012 International Workshops: AIAB, AIeIA, CISE, COPA, IIVC, ISQL, MHDW, and WADTMB, Halkidiki, Greece, September 27-30, 2012, Proceedings, Part II / [ed] Lazaros Iliadis, Ilias Maglogiannis, Harris Papadopoulos, Kostas Karatzas, Spyros Sioutas, 2012, no PART 2, p. 166-175. Conference paper, Published paper (Refereed)
Abstract [en]

QSAR modeling is a method for predicting properties, e.g. the solubility or toxicity, of chemical compounds using statistical learning techniques. QSAR is in widespread use within the pharmaceutical industry to prioritize compounds for experimental testing or to alert for potential toxicity. However, predictions from a QSAR model are difficult to assess if their prediction intervals are unknown. In this paper we introduce conformal prediction into the QSAR field to address this issue. We apply support vector machine regression in combination with two nonconformity measures to five datasets of different sizes to demonstrate the usefulness of conformal prediction in QSAR modeling. One of the nonconformity measures provides prediction intervals with almost the same width as the size of the QSAR models' prediction errors, showing that the prediction intervals obtained by conformal prediction are efficient and useful.
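
How conformal prediction turns point predictions into prediction intervals can be sketched as follows. This is a minimal inductive (split) conformal example that assumes the simplest nonconformity measure, the absolute residual on a held-out calibration set; the abstract does not specify the two measures actually used, and the function names and toy residuals are hypothetical. Any regression model, such as the support vector regression mentioned above, could supply the point predictions.

```python
import math


def icp_interval_halfwidth(calibration_residuals, significance):
    """Half-width of a split-conformal interval from calibration residuals.

    Uses absolute residuals as nonconformity scores (an assumption of this sketch).
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    # 1-based rank of the score that bounds a (1 - significance) fraction.
    k = math.ceil((n + 1) * (1 - significance))
    if k > n:
        return math.inf  # too few calibration points for this confidence level
    return scores[k - 1]


def predict_interval(point_prediction, halfwidth):
    """Symmetric interval around a point prediction."""
    return (point_prediction - halfwidth, point_prediction + halfwidth)


# Toy calibration residuals (observed minus predicted) from a held-out set.
residuals = [0.1, -0.4, 0.2, 0.3, -0.5, 0.6, -0.7, 0.8, 0.9]
halfwidth = icp_interval_halfwidth(residuals, significance=0.2)
low, high = predict_interval(2.0, halfwidth)
```

Under the randomness (exchangeability) assumption, intervals built this way contain the true value with probability of at least 1 minus the chosen significance level; narrower intervals at the same level indicate a more efficient predictor.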

Series
IFIP Advances in Information and Communication Technology, ISSN 1868-4238 ; 382 AICT
National Category
Medical and Health Sciences
Identifiers
urn:nbn:se:uu:diva-193891 (URN); 10.1007/978-3-642-33412-2_17 (DOI); 9783642334115 (ISBN)
Conference
8th International Workshop on Artificial Intelligence Applications and Innovations, AIAI 2012: AIAB, AIeIA, CISE, COPA, IIVC, ISQL, MHDW, and WADTMB, 27 September 2012 through 30 September 2012, Halkidiki
Available from: 2013-02-12 Created: 2013-02-06 Last updated: 2013-02-12 Bibliographically approved
Eklund, M., Norinder, U., Boyer, S. & Carlsson, L. (2012). Benchmarking Variable Selection in QSAR. Molecular Informatics, 31(2), 173-179
2012 (English). In: Molecular Informatics, ISSN 1868-1743, Vol. 31, no 2, p. 173-179. Article in journal (Refereed) Published
Abstract [en]

Variable selection is important in QSAR modeling since it can improve model performance and transparency, as well as reduce the computational cost of model fitting and predictions. Which variable selection methods perform well in QSAR settings is largely unknown. To address this question, we rigorously investigated, in a total of 1728 benchmarking experiments, how eight variable selection methods affect the predictive performance and transparency of random forest models fitted to seven QSAR datasets covering different endpoints, descriptor sets, types of response variables, and numbers of chemical compounds. The results show that univariate variable selection methods are suboptimal and that the number of variables in the benchmarked datasets can be reduced by about 60% without significant loss in model performance when using multivariate adaptive regression splines (MARS) and forward selection.

Keyword
Variable selection, Benchmarking, Optimization, Model performance
National Category
Pharmaceutical Sciences
Identifiers
urn:nbn:se:uu:diva-171437 (URN); 10.1002/minf.201100142 (DOI); 000300675200007 ()
Available from: 2012-03-19 Created: 2012-03-19 Last updated: 2018-01-12 Bibliographically approved
Carlsson, L., Spjuth, O., Eklund, M. & Boyer, S. (2012). Model building in Bioclipse Decision Support applied to open datasets. Paper presented at The 48th Congress of the European Societies of Toxicology, EUROTOX, 17-20 June, 2012, Stockholm, Sweden. Toxicology Letters, 211(Suppl.), S62
2012 (English). In: Toxicology Letters, ISSN 0378-4274, E-ISSN 1879-3169, Vol. 211, no Suppl., p. S62. Article in journal, Meeting abstract (Refereed) Published
Abstract [en]

Bioclipse Decision Support (DS) is a system capable of building predictive models of any collection of SAR data, and making them available in a simple user interface based on Bioclipse (www.bioclipse.net).

The method is fast and uses Faulon Signatures as chemical descriptors together with a Support Vector Machine algorithm for QSAR model building. A key feature is the capability to visualize and interpret results by highlighting the substructures which contributed most to the prediction. This, together with very fast predictions, allows for editing chemical structures with instantly updated results.

We here present the results from applying Bioclipse Decision Support to several open QSAR data sets, including endpoints from OpenTox and PubChem. The results show how to extract data from the sources and to build models which can be integrated with user specific models.

National Category
Bioinformatics and Systems Biology
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-197845 (URN); 10.1016/j.toxlet.2012.03.243 (DOI)
Conference
The 48th Congress of the European Societies of Toxicology, EUROTOX, 17-20 June, 2012, Stockholm, Sweden
Available from: 2013-04-04 Created: 2013-04-04 Last updated: 2017-12-06 Bibliographically approved
Spjuth, O., Carlsson, L., Alvarsson, J., Georgiev, V., Willighagen, E. & Eklund, M. (2012). Open source drug discovery with Bioclipse. Current Topics in Medicinal Chemistry, 12(18), 1980-1986
2012 (English). In: Current Topics in Medicinal Chemistry, ISSN 1568-0266, E-ISSN 1873-4294, Vol. 12, no 18, p. 1980-1986. Article, review/survey (Refereed) Published
Abstract [en]

We present the open source components for drug discovery that have been developed and integrated into the graphical workbench Bioclipse. Building on a solid open source cheminformatics core, Bioclipse has advanced functionality for managing and visualizing chemical structures and related information. The features presented here include QSAR/QSPR modeling, various predictive solutions such as decision support for chemical liability assessment, site-of-metabolism prediction, virtual screening, and knowledge discovery and integration. We demonstrate the utility of the described tools with examples from computational pharmacology, toxicology, and ADME. Bioclipse is used in both academia and industry, and is a good example of open source leading to new solutions for drug discovery.

National Category
Bioinformatics and Systems Biology
Research subject
Bioinformatics; Pharmaceutical Science
Identifiers
urn:nbn:se:uu:diva-183846 (URN); 10.2174/1568026611212180005 (DOI); 000313430900005 (); 23110533 (PubMedID)
Available from: 2012-11-04 Created: 2012-11-04 Last updated: 2017-12-07 Bibliographically approved