Exploring classical machine learning algorithms and molecular fingerprint parameters for ligand-based modelling
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
2018 (English) Independent thesis Advanced level (degree of Master (Two Years)), 80 credits / 120 HE credits. Student thesis
Abstract [en]

For many years, molecular fingerprints and machine learning algorithms have been widely adapted and applied in quantitative structure-activity relationship (QSAR) modelling. This study focused on ligand-based modelling, in which classical machine learning techniques such as Random Forest, C-Support Vector Classifier (C-SVC) and Neural Network were applied to datasets obtained from biological assays, with the compounds represented as Morgan fingerprints. More specifically, the predictive performance and computational cost of the machine learning algorithms, as well as of the Morgan algorithm, were evaluated. The results showed no clear differences in predictive performance between these classical algorithms. However, increasing the hash size of the Morgan algorithm had a strong positive effect on predictive performance in every case, with the unhashed version outperforming the hashed versions. Increased fingerprint radius showed slight negative trends in ROC-AUC scores with Random Forest and slight positive trends with C-SVC and Neural Network. The FEST implementation of Random Forest and C-SVC were highly efficient with regard to memory usage and runtime, respectively. Scikit-learn's Random Forest Classifier (RFC) showed great robustness in predictive performance, whereas Neural Network, FEST and particularly C-SVC were more sensitive to hyperparameter settings. Considering feasibility, Random Forest would be a valid initial baseline model to implement, and the superior predictive performance of unhashed Morgan fingerprints suggests that hashing (compression) of the fingerprints should be avoided.
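The effect of hash size described in the abstract can be illustrated with a minimal sketch. It assumes that an unhashed fingerprint is a set of integer feature identifiers and that hashing folds each identifier into a fixed number of bits by taking it modulo the bit-vector length. The function names `fold_fingerprint` and `count_collisions` and the randomly generated identifiers are hypothetical. This is not the thesis's code or any library's implementation, only an illustration of why smaller hash sizes discard information.

```python
import random


def fold_fingerprint(feature_ids, n_bits):
    """Fold integer feature identifiers into bit positions 0..n_bits-1."""
    return {fid % n_bits for fid in feature_ids}


def count_collisions(feature_ids, n_bits):
    """Count distinct features lost because they share a bit position."""
    distinct = set(feature_ids)
    return len(distinct) - len(fold_fingerprint(distinct, n_bits))


# Hypothetical "unhashed" fingerprint: 200 distinct 32-bit identifiers.
# Real identifiers would come from a fingerprinting toolkit; these are random.
rng = random.Random(0)
features = set()
while len(features) < 200:
    features.add(rng.getrandbits(32))

# Smaller bit vectors force more features onto the same bit.
for n_bits in (128, 1024, 4096):
    print(n_bits, count_collisions(features, n_bits))
```

With 200 features and only 128 bits, at least 72 features must share a bit with another one, so the folded vector cannot distinguish them. An unhashed representation keeps every identifier separate, which fits the abstract's observation that predictive performance improved as the hash size increased.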

Place, publisher, year, edition, pages
2018, p. 23
National Category
Other Medical Sciences
Identifiers
URN: urn:nbn:se:uu:diva-439541; OAI: oai:DiVA.org:uu-439541; DiVA, id: diva2:1542313
Available from: 2023-02-13 Created: 2021-04-07 Last updated: 2023-02-13 Bibliographically approved

Open Access in DiVA

No full text in DiVA
