Uppsala University Publications (uu.se)
MaRe: a MapReduce-Oriented Framework for Processing Big Data with Application Containers
2018 (English). Manuscript (preprint) (Other academic)
Place, publisher, year, edition, pages
2018.
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:uu:diva-390664; OAI: oai:DiVA.org:uu-390664; DiVA, id: diva2:1342439
Available from: 2019-08-13 Created: 2019-08-13 Last updated: 2019-08-22
In thesis
1. Enabling Scalable Data Analysis on Cloud Resources with Applications in Life Science
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the past 20 years, the rise of high-throughput methods in life science has enabled research laboratories to produce massive datasets of biological interest. When dealing with this "data deluge" of modern biology, researchers encounter two major challenges: first, substantial technical skills are needed to deal with Big Data; second, infrastructure procurement becomes difficult. In connection with this second challenge, the computing model and business trend that was originally popularized by Amazon under the name of cloud computing represents an interesting opportunity. Instead of requiring users to buy computing infrastructure upfront, cloud providers enable the allocation and release of virtual resources on demand. These resources are then billed with a pay-per-use pricing model, and physical infrastructure management is delegated to the provider. In this thesis, we introduce a number of methods for running Big Data analyses of biological interest using cloud computing. Considerable effort was made to enable the application of trusted bioinformatics software to Big Data scenarios, as opposed to reimplementing the existing codebase. Further, we improve the accessibility of the technology with the aim of reducing the entry barrier for biologists. The thesis includes 5 papers. In Papers I and II, we explore the applicability of Apache Spark, one of the leading Big Data analytics platforms in cloud environments, to two drug-discovery use cases. In Paper III, we present a general method for running bioinformatics analyses on the cloud using a microservices-oriented architecture. In Paper IV, we introduce a method that combines microservices and Apache Spark with the aim of providing the best of both technologies. In Paper V, we discuss how to reduce the entry barrier for the allocation of cloud research environments. We show that all of the developed methods scale well, and we provide high-level programming interfaces for improving accessibility. We have also made the developed software publicly available.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019. p. 71
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1846
Keywords
cloud computing, bioinformatics, Big Data, microservices, containers, MapReduce
National Category
Computational Mathematics
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-390666 (URN); 978-91-513-0730-5 (ISBN)
Public defence
2019-10-10, B42, Uppsala Biomedicinska Centrum, Husargatan 3, Uppsala, 13:15 (English)
Available from: 2019-09-17 Created: 2019-08-22 Last updated: 2019-10-15
