Uppsala University Publications (uu.se)
Conformal prediction in Spark: Large-scale machine learning with confidence
Capuccini, Marco. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing; Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. ORCID iD: 0000-0002-4851-759X
Spjuth, Ola. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences; Uppsala University, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-8083-2864
2015 (English). In: Proc. 2nd International Symposium on Big Data Computing, Los Alamitos, CA: IEEE Computer Society, 2015, p. 61-67. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Los Alamitos, CA: IEEE Computer Society, 2015. p. 61-67
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-283636
DOI: 10.1109/BDC.2015.35
ISI: 000380459200007
ISBN: 978-0-7695-5696-3 (print)
OAI: oai:DiVA.org:uu-283636
DiVA, id: diva2:919450
Conference
BDC 2015, December 7–10, Limassol, Cyprus
Projects
eSSENCE
Available from: 2015-12-10 Created: 2016-04-13 Last updated: 2019-08-22 Bibliographically approved
In thesis
1. Enabling Scalable Data Analysis on Cloud Resources with Applications in Life Science
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the past 20 years, the rise of high-throughput methods in life science has enabled research laboratories to produce massive datasets of biological interest. When dealing with this "data deluge" of modern biology, researchers encounter two major challenges: first, dealing with Big Data requires substantial technical skills; second, infrastructure procurement becomes difficult. In connection with this second challenge, the computing model and business trend originally popularized by Amazon under the name of cloud computing represents an interesting opportunity. Instead of requiring users to buy computing infrastructure upfront, cloud providers enable the allocation and release of virtual resources on demand. These resources are then billed with a pay-per-use pricing model, and physical infrastructure management is delegated to the provider. In this thesis, we introduce a number of methods for running Big Data analyses of biological interest using cloud computing. Considerable effort was devoted to enabling the application of trusted bioinformatics software to Big Data scenarios, as opposed to reimplementing the existing codebase. Further, we improve the accessibility of the technology with the aim of reducing the entry barrier for biologists. The thesis includes 5 papers. In Papers I and II, we explore the applicability of Apache Spark, one of the leading Big Data analytics platforms in cloud environments, to two drug-discovery use cases. In Paper III, we present a general method for running bioinformatics analyses on the cloud using a microservices-oriented architecture. In Paper IV, we introduce a method that combines microservices and Apache Spark with the aim of providing the best of both technologies. In Paper V, we discuss how to reduce the entry barrier for the allocation of cloud research environments. We show that all of the developed methods scale well, and we provide high-level programming interfaces for improving accessibility. We have also made the developed software publicly available.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019. p. 71
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1846
Keywords
cloud computing, bioinformatics, Big Data, microservices, containers, MapReduce
National Category
Computational Mathematics
Research subject
Scientific Computing
Identifiers
URN: urn:nbn:se:uu:diva-390666
ISBN: 978-91-513-0730-5
Public defence
2019-10-10, B42, Uppsala Biomedicinska Centrum, Husargatan 3, Uppsala, 13:15 (English)
Opponent
Supervisors
Available from: 2019-09-17 Created: 2019-08-22 Last updated: 2019-10-15

Authors
Capuccini, Marco; Spjuth, Ola
