HTSeq-Hadoop: Extending HTSeq for Massively Parallel Sequencing Data Analysis using Hadoop
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-8083-2864
2014 (English)
In: Proc. 10th International Conference on e-Science, IEEE Computer Society, 2014, 317-323 p.
Conference paper, Published paper (Refereed)
Abstract [en]

Hadoop is a convenient framework in e-Science that enables scalable distributed data analysis. In molecular biology, next-generation sequencing produces vast amounts of data and requires flexible frameworks for constructing analysis pipelines. We extend the popular HTSeq package into the Hadoop realm by introducing massively parallel versions of short-read quality assessment, as well as functionality to count genes mapped by the short reads. We use the Hadoop-streaming library, which allows the components to run on both Hadoop and regular Linux systems, and evaluate their performance in two execution environments: a single node on a computational cluster and a Hadoop cluster in a private cloud. We compare the implementations with Apache Pig, showing improved runtime performance of our developed methods. We also integrate the components into the graphical platform Cloudgene to simplify user interaction.
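
The streaming approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use HTSeq; it only shows the general pattern in which a mapper and a reducer exchange tab-separated lines over standard input and output, so the same script can run under Hadoop streaming or in an ordinary Linux pipe. The input format (`read_id<TAB>gene_id`), the function names, and the script name `count_genes.py` are assumptions made for illustration.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'gene_id<TAB>1' for every record that names a gene.

    Assumed (hypothetical) input format: 'read_id<TAB>gene_id' per line.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1]:
            yield f"{fields[1]}\t1"


def reducer(lines):
    """Sum the counts for each gene.

    Relies on the input being sorted by key, which the shuffle phase
    of Hadoop streaming (or a plain `sort` in a shell pipe) provides.
    """
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for gene, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{gene}\t{sum(int(count) for _, count in group)}"


# Run as a filter only when a step name ('map' or 'reduce') is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

On a regular Linux system the sketch could be run as `cat reads.tsv | python count_genes.py map | sort | python count_genes.py reduce`, where `reads.tsv` is a hypothetical input file; under Hadoop streaming the same two commands would be supplied as the mapper and the reducer.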

Place, publisher, year, edition, pages
IEEE Computer Society, 2014. 317-323 p.
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:uu:diva-242917
DOI: 10.1109/eScience.2014.27
ISBN: 978-1-4799-4288-6 (print)
OAI: oai:DiVA.org:uu-242917
DiVA: diva2:785398
Conference
e-Science 2014
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Swedish National Infrastructure for Computing (SNIC), p2013023
eSSENCE - An eScience Collaboration
EU, European Research Council, BM1006
Available from: 2014-10-24
Created: 2015-02-02
Last updated: 2015-09-11
Bibliographically approved


Authors
Siretskiy, Alexey
Spjuth, Ola
