Using Iterative MapReduce for Parallel Virtual Screening
(English) In: Journal of medical and bioengineering, ISSN 2301-3796
Article in journal (Refereed) Accepted
MapReduce and its various implementations have been used successfully on commodity clusters to analyze data in problems where the datasets become very large. Virtual screening is a chemoinformatics technique used in drug discovery that searches large libraries of molecular structures, which makes it a strong candidate for MapReduce. In this study, however, we use SVM-based virtual screening, which is resource demanding: it not only involves huge datasets but is also computationally expensive, with complexity that can grow to at least n². Most SVM-based applications use MPI, but MPI has its own limitations, such as a lack of fault tolerance and low productivity. This study shows that MapReduce can be used effectively to implement SVM-based virtual screening. The results illustrate that MapReduce performs well as the number of nodes in the cluster increases. For the experiments we used Spark, an iterative MapReduce programming model. We also present the program flow and results that demonstrate the efficiency of iterative MapReduce.
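The iterative MapReduce pattern named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Spark: it assumes a linear SVM trained by hinge-loss subgradient descent, simulates cluster partitions with plain Python lists, and runs on a tiny made-up dataset. All function names and parameters (`partition`, `map_gradient`, `reduce_gradients`, `train`, `lr`, `reg`) are hypothetical.

```python
from functools import reduce

# Toy, made-up data: (feature vector, label in {-1, +1}). Not from the study.
DATA = [
    ((2.0, 1.0), 1), ((3.0, 2.0), 1), ((2.5, 3.0), 1),
    ((-2.0, -1.0), -1), ((-3.0, -2.5), -1), ((-1.5, -3.0), -1),
]


def partition(records, n_parts):
    """Split records into n_parts chunks, standing in for cluster partitions."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_gradient(chunk, w, b):
    """Map step: hinge-loss subgradient summed over one partition."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in chunk:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if margin < 1.0:  # only margin violators contribute
            gw = [g - y * xi for g, xi in zip(gw, x)]
            gb -= y
    return gw, gb


def reduce_gradients(left, right):
    """Reduce step: add two partial gradients."""
    return [p + q for p, q in zip(left[0], right[0])], left[1] + right[1]


def train(records, n_parts=3, iterations=50, lr=0.1, reg=0.01):
    """Iterate: map over partitions, reduce to one gradient, update the model."""
    parts = partition(records, n_parts)  # partitions are reused every iteration
    w, b = [0.0] * len(records[0][0]), 0.0
    n = len(records)
    for _ in range(iterations):
        partials = (map_gradient(p, w, b) for p in parts)
        gw, gb = reduce(reduce_gradients, partials)
        w = [wi - lr * (reg * wi + g / n) for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Classify x by the sign of the linear decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

Calling `train(DATA)` returns a weight vector and bias that can be passed to `predict`. In an iterative framework such as Spark, the partitioned data would typically stay cached in memory across iterations, so each pass only redistributes the current model; the sketch mirrors that by building `parts` once outside the loop.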
Place, publisher, year, edition, pages
Engineering and Technology Publishing.
Keywords
spark, mapreduce, virtual screening
Bioinformatics (Computational Biology); Computer Science
Research subject
Bioinformatics; Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-213044
OAI: oai:DiVA.org:uu-213044
DiVA: diva2:680364