HTSeq-Hadoop: Extending HTSeq for Massively Parallel Sequencing Data Analysis using Hadoop
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-8083-2864
2014 (English). In: Proc. 10th International Conference on e-Science, IEEE Computer Society, 2014, 317-323 p. Conference paper (Refereed)
Abstract [en]

Hadoop is a convenient framework in e-Science enabling scalable distributed data analysis. In molecular biology, next-generation sequencing produces vast amounts of data and requires flexible frameworks for constructing analysis pipelines. We extend the popular HTSeq package into the Hadoop realm by introducing massively parallel versions of short-read quality assessment and of the functionality to count the genes to which short reads map. We use the Hadoop-streaming library, which allows the components to run on both Hadoop and regular Linux systems, and evaluate their performance in two execution environments: a single node on a computational cluster and a Hadoop cluster in a private cloud. We compare the implementations with Apache Pig and show improved runtime performance for our methods. We also integrate the components into the graphical platform Cloudgene to simplify user interaction.
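
The streaming approach described in the abstract can be illustrated with a minimal sketch. This is not the HTSeq-Hadoop code and does not use the HTSeq API. It assumes a hypothetical input layout with one read per line (read id, sequence, and quality string separated by tabs) and Phred+33 quality encoding. The mapper emits one position/quality pair per base, and the reducer averages the qualities per read position, relying on its input being sorted by key.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'position<TAB>phred' for every base of every read.

    Assumed (hypothetical) input: read_id<TAB>sequence<TAB>quality_string.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed records
        for pos, ch in enumerate(fields[2]):
            # Phred+33 encoding is an assumption of this sketch.
            yield f"{pos}\t{ord(ch) - 33}"


def reducer(lines):
    """Average the quality per position; equal keys must be adjacent."""
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for pos, group in groupby(pairs, key=lambda kv: kv[0]):
        scores = [int(score) for _, score in group]
        yield f"{pos}\t{sum(scores) / len(scores):.2f}"


# Run as a streaming step only when called with "map" or "reduce".
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = mapper if sys.argv[1] == "map" else reducer
    sys.stdout.writelines(out + "\n" for out in step(sys.stdin))
```

Because both steps only read standard input and write standard output, the same script (here given the hypothetical name `qa_stream.py`) could be tested on a regular Linux system with a pipeline such as `cat reads.tsv | python qa_stream.py map | LC_ALL=C sort | python qa_stream.py reduce`, with the sort standing in for Hadoop's shuffle phase.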

Place, publisher, year, edition, pages
IEEE Computer Society, 2014. 317-323 p.
National Category
Bioinformatics (Computational Biology)
URN: urn:nbn:se:uu:diva-242917
DOI: 10.1109/eScience.2014.27
ISBN: 978-1-4799-4288-6
OAI: oai:DiVA.org:uu-242917
DiVA: diva2:785398
e-Science 2014
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Swedish National Infrastructure for Computing (SNIC), p2013023
eSSENCE - An eScience Collaboration
EU, European Research Council, BM1006
Available from: 2014-10-24. Created: 2015-02-02. Last updated: 2015-09-11. Bibliographically approved.

By author/editor
Siretskiy, Alexey; Spjuth, Ola