Uppsala University Publications (uu.se)
Efficient iterative virtual screening with Apache Spark and conformal prediction.
Royal Institute of Technology, KTH.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. (Spjuth) ORCID iD: 0000-0002-6064-2684
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. (Spjuth) ORCID iD: 0000-0002-4851-759X
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
2018 (English). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 10, article id 8. Article in journal (Refereed), Published
Abstract [en]

BACKGROUND: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or large workstations in a brute-force manner, by docking and scoring all available ligands.

CONTRIBUTION: In this study we propose a strategy based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring'. Another set of ligands is then docked, the model is retrained, and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling.
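
The iterative strategy described above can be sketched in a few lines of Python. This is a minimal single-machine illustration under stated assumptions, not the authors' Spark implementation: `dock` is a mock scoring function, a nearest-centroid model stands in for the SVM, the top 30% of docked ligands are labelled 'high-scoring', and all names and parameters (`cpvs`, `fit_icp`, `eps`, `target_eff`, `top_frac`) are hypothetical. Efficiency is taken here as the fraction of single-label predictions, and only ligands predicted as exactly {1} are docked in the final step.

```python
import random

def dock(ligand):
    # Mock docking score (assumption): higher is better in this sketch.
    return sum(ligand)

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def p_value(calib_alphas, alpha):
    # Conformal p-value: share of calibration scores at least as nonconforming.
    ge = sum(1 for a in calib_alphas if a >= alpha)
    return (ge + 1) / (len(calib_alphas) + 1)

def fit_icp(X, y, rng, eps):
    # Class-conditional inductive conformal predictor over a nearest-centroid
    # model (a stand-in for the SVM named in the abstract).
    cents, held_out = {}, {}
    for c in (0, 1):
        members = [i for i in range(len(X)) if y[i] == c]
        rng.shuffle(members)
        k = max(1, int(0.7 * len(members)))  # proper training part
        cents[c] = centroid([X[i] for i in members[:k]])
        held_out[c] = members[k:]            # calibration part

    def alpha(x, c):
        # Nonconformity: far from own class centroid, close to the other one.
        return dist(x, cents[c]) - dist(x, cents[1 - c])

    calib = {c: [alpha(X[i], c) for i in held_out[c]] for c in (0, 1)}

    def predict(x):
        # Prediction set: every label whose p-value exceeds eps.
        return {c for c in (0, 1) if p_value(calib[c], alpha(x, c)) > eps}

    return predict

def cpvs(library, batch_size=50, eps=0.2, target_eff=0.8, top_frac=0.3, seed=0):
    rng = random.Random(seed)
    remaining = set(range(len(library)))
    docked, excluded, preds = {}, set(), {}
    while remaining:
        # 1. Dock another batch and add it to the training set.
        batch = rng.sample(sorted(remaining), min(batch_size, len(remaining)))
        for i in batch:
            docked[i] = dock(library[i])
        remaining -= set(batch)
        if not remaining or len(docked) < 2:
            continue
        # 2. Label docked ligands: top fraction = 1 (high), rest = 0 (low).
        ids = sorted(docked, key=docked.get, reverse=True)
        n_high = min(len(ids) - 1, max(1, int(top_frac * len(ids))))
        high = set(ids[:n_high])
        predict = fit_icp([library[i] for i in ids],
                          [1 if i in high else 0 for i in ids], rng, eps)
        # 3. Predict the rest and exclude ligands predicted 'low-scoring'.
        preds = {i: predict(library[i]) for i in remaining}
        low = {i for i, s in preds.items() if s == {0}}
        excluded |= low
        remaining -= low
        # 4. Stop once enough predictions are single-label sets.
        if sum(len(s) == 1 for s in preds.values()) / len(preds) >= target_eff:
            break
    # 5. Dock what the final model predicts as high-scoring, exclude the rest.
    for i in sorted(remaining):
        if preds.get(i) == {1}:
            docked[i] = dock(library[i])
        else:
            excluded.add(i)
    return docked, excluded

gen = random.Random(1)
library = [[gen.random() for _ in range(5)] for _ in range(500)]
docked, excluded = cpvs(library)
print(f"docked {len(docked)} of {len(library)}, excluded {len(excluded)}")
```

Every ligand ends up either docked or excluded, so the saving is the size of `excluded` relative to the library. In a distributed setting, the docking of each batch and the prediction over `remaining` are the steps that would be parallelized.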

RESULTS: We show on 4 different targets that conformal-prediction-based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average and achieving a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
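
The abstract does not define how these figures are computed. As a hedged illustration, one plausible reading is that top-30 accuracy is the fraction of the best 30 ligands from exhaustive docking that also appear among the best 30 found by CPVS, and that the reduction is the share of the library that was never docked. The function names and the toy scores below are hypothetical, and the sketch assumes higher scores are better.

```python
def top_k_accuracy(full_scores, cpvs_scores, k=30):
    # Overlap between the top-k of exhaustive docking and the top-k of CPVS.
    top_full = set(sorted(full_scores, key=full_scores.get, reverse=True)[:k])
    top_cpvs = set(sorted(cpvs_scores, key=cpvs_scores.get, reverse=True)[:k])
    return len(top_full & top_cpvs) / k

def docking_reduction(n_total, n_docked):
    # Fraction of the library that did not need to be docked.
    return 1 - n_docked / n_total

# Toy data (made up): ligand "b" was excluded by the screening procedure.
full_scores = {"a": 9, "b": 8, "c": 7, "d": 1}
cpvs_scores = {"a": 9, "c": 7, "d": 1}

print(top_k_accuracy(full_scores, cpvs_scores, k=2))          # 0.5
print(docking_reduction(len(full_scores), len(cpvs_scores)))  # 0.25
```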

Place, publisher, year, edition, pages
2018. Vol. 10, article id 8
Keywords [en]
Apache Spark, Cloud computing, Conformal prediction, Docking, Virtual screening
National Category
Bioinformatics (Computational Biology)
Research subject
Bioinformatics
Identifiers
URN: urn:nbn:se:uu:diva-343980
DOI: 10.1186/s13321-018-0265-z
ISI: 000426699400001
PubMedID: 29492726
OAI: oai:DiVA.org:uu-343980
DiVA, id: diva2:1187365
Funder
eSSENCE - An eScience Collaboration
Swedish e‐Science Research Center
Swedish National Infrastructure for Computing (SNIC), b2015245
Swedish National Infrastructure for Computing (SNIC), SNIC 2017/13-6
Available from: 2018-03-03. Created: 2018-03-03. Last updated: 2018-05-14. Bibliographically approved.

Open Access in DiVA

fulltext (1104 kB), 6 downloads
File information
File name: FULLTEXT01.pdf
File size: 1104 kB
Checksum (SHA-512): 17ed0d0e0e28ab6bfe64d64973eb7b9acff58c775634fd0e7a68ae82793928aadf03cd458a868e3ca82fdfd8e2273234fdecf36bf611861c3c2ddf2a776efd9b
Type: fulltext
Mimetype: application/pdf

Authority records

Georgiev, Valentin; Capuccini, Marco; Toor, Salman; Schaal, Wesley; Spjuth, Ola
