Uppsala University Publications
Tracking the NGS revolution: managing life science research on shared high-performance computing clusters.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. (Spjuth). ORCID iD: 0000-0001-5447-9465
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. (Spjuth)
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab. (Spjuth). ORCID iD: 0000-0002-8083-2864
2018 (English). In: GigaScience, ISSN 2047-217X, E-ISSN 2047-217X, Vol. 7, no 5, article id giy028. Article in journal (Refereed), Published
Abstract [en]

Background: Next-Generation Sequencing (NGS) has transformed the life sciences, and many research groups have newly become dependent on computer clusters to store and analyse large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at the UPPMAX computing centre at Uppsala University, Sweden, where ∼800 NGS projects and ∼200 non-NGS projects now use a similar number of core hours, we compare and contrast the growth, administrative burden and cluster usage of NGS projects with those of projects from other sciences.

Results: The number of NGS projects has grown rapidly since 2010, driven by the entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user as non-NGS users, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We develop usage and efficiency metrics and show that compute jobs in NGS projects use more RAM than those in non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat.

Conclusions: Hosting NGS projects imposes a large administrative burden at UPPMAX, due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We give a set of recommendations for e-infrastructures hosting NGS research projects. We provide anonymised versions of our storage, job and efficiency databases.
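The Results paragraph reports that NGS jobs use booked resources less efficiently, but the abstract does not spell out how efficiency is measured. As a minimal sketch, assuming efficiency is defined as CPU time actually consumed divided by core-hours booked (booked cores × wall-clock hours), a per-job and an aggregate calculation might look like this. The `Job` record, its fields and the function names are hypothetical illustrations, not the metrics or database schema published with the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A finished batch job (hypothetical record layout, for illustration only)."""
    booked_cores: int        # cores reserved for the job
    wall_hours: float        # elapsed wall-clock time in hours
    cpu_hours_used: float    # CPU time summed over all cores, in hours


def core_efficiency(job: Job) -> float:
    """Fraction of the job's booked core-hours that were actually used."""
    booked = job.booked_cores * job.wall_hours
    if booked <= 0:
        raise ValueError("job must book a positive number of core-hours")
    return job.cpu_hours_used / booked


def weighted_efficiency(jobs: list[Job]) -> float:
    """Efficiency over many jobs: total used CPU hours / total booked core-hours."""
    booked = sum(j.booked_cores * j.wall_hours for j in jobs)
    if booked <= 0:
        raise ValueError("jobs must book a positive number of core-hours")
    used = sum(j.cpu_hours_used for j in jobs)
    return used / booked


if __name__ == "__main__":
    # A 16-core booking that keeps roughly 4 cores busy, and a well-used 1-core job.
    wide = Job(booked_cores=16, wall_hours=10.0, cpu_hours_used=40.0)
    narrow = Job(booked_cores=1, wall_hours=10.0, cpu_hours_used=9.5)
    print(f"{core_efficiency(wide):.2f}")                    # 0.25
    print(f"{core_efficiency(narrow):.2f}")                  # 0.95
    print(f"{weighted_efficiency([wide, narrow]):.2f}")      # 0.29
```

Weighting by booked core-hours means a single large, mostly idle booking pulls the aggregate down in proportion to the capacity it blocks; an unweighted mean of per-job ratios (here 0.60) would instead answer how efficient the typical job is.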

Place, publisher, year, edition, pages
2018. Vol. 7, no 5, article id giy028
National Category
Bioinformatics and Systems Biology
Research subject
Bioinformatics
Identifiers
URN: urn:nbn:se:uu:diva-350009
DOI: 10.1093/gigascience/giy028
PubMedID: 29659792
OAI: oai:DiVA.org:uu-350009
DiVA, id: diva2:1203241
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Swedish National Infrastructure for Computing (SNIC)
Available from: 2018-05-02 Created: 2018-05-02 Last updated: 2018-05-02

Authors: Dahlö, Martin; Schaal, Wesley; Spjuth, Ola