BiobankCloud: A Platform for the Secure Storage, Sharing, and Processing of Large Biomedical Data Sets
Univ Lisbon, Fac Ciencias, LaSIGE, Lisbon, Portugal.
Humboldt Univ, Berlin, Germany.
Humboldt Univ, Berlin, Germany.
Univ Lisbon, Fac Ciencias, LaSIGE, Lisbon, Portugal.
2016 (English). In: BIOMEDICAL DATA MANAGEMENT AND GRAPH ONLINE QUERYING, 2016, p. 89-105. Conference paper, Published paper (Refereed)
Abstract [en]

Biobanks store and catalog human biological material that is increasingly being digitized using next-generation sequencing (NGS). There is, however, a computational bottleneck, as existing software systems are not scalable and secure enough to store and process the incoming wave of genomic data from NGS machines. In the BiobankCloud project, we are building a Hadoop-based platform for the secure storage, sharing, and parallel processing of genomic data. We extended Hadoop to include support for multi-tenant studies, reduced storage requirements with erasure coding, and added support for extensible and consistent metadata. On top of Hadoop, we built a scalable scientific workflow engine featuring a proper workflow definition language focusing on simple integration and chaining of existing tools, adaptive scheduling on Apache YARN, and support for iterative dataflows. Our platform also supports the secure sharing of data across different, distributed Hadoop clusters. The software is easily installed and comes with a user-friendly web interface for running, managing, and accessing data sets behind secure two-factor authentication. Initial tests have shown that the engine scales well to dozens of nodes. The entire system is open-source and includes pre-defined workflows for popular tasks in biomedical data analysis, such as variant identification, differential transcriptome analysis using RNA-Seq, and analysis of miRNA-Seq and ChIP-Seq data.

Place, publisher, year, edition, pages
2016. p. 89-105
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9579
National Category
Medical Ethics
Identifiers
URN: urn:nbn:se:uu:diva-311409
DOI: 10.1007/978-3-319-41576-5_7
ISI: 000387957300007
ISBN: 9783319415765; 9783319415758 (print)
OAI: oai:DiVA.org:uu-311409
DiVA, id: diva2:1060025
Conference
1st International Workshop on Data Management and Analytics for Medicine and Healthcare (DMAH) / Workshop on Big-Graphs Online Querying (Big-O (Q)), AUG 31-SEP 04, 2015, Waikoloa, HI
Available from: 2016-12-27. Created: 2016-12-27. Last updated: 2018-05-16. Bibliographically approved.

Author/editor
Reichel, Jane
Organisation
Centre for Research Ethics and Bioethics
