Publications from Uppsala University (uu.se)
Arteria: An automation system for a sequencing core facility
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Molecular Medicine. Uppsala University, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0001-6962-1460
Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Molecular Medicine.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Molecular Medicine. Uppsala University, Science for Life Laboratory, SciLifeLab.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Molecular Medicine. Uppsala University, Science for Life Laboratory, SciLifeLab.
2019 (English). In: GigaScience, E-ISSN 2047-217X, Vol. 8, no 12, article id giz135. Article in journal (Refereed). Published.
Abstract [en]

Background: In recent years, nucleotide sequencing has become increasingly instrumental in both research and clinical settings. This has led to an explosive growth in sequencing data produced worldwide. As the amount of data increases, so does the need for automated solutions for data processing and analysis. The concept of workflows has gained favour in the bioinformatics community, but there is little in the scientific literature describing end-to-end automation systems. Arteria is an automation system that aims at providing a solution to the data-related operational challenges that face sequencing core facilities.

Findings: Arteria is built on existing open source technologies, with a modular design allowing for a community-driven effort to create plug-and-play micro-services. In this article we describe the system, elaborate on the underlying conceptual framework, and present an example implementation. Arteria can be reduced to 3 conceptual levels: orchestration (using an event-based model of automation), process (the steps involved in processing sequencing data, modelled as workflows), and execution (using a series of RESTful micro-services). This creates a system that is both flexible and scalable. Arteria-based systems have been successfully deployed at 3 sequencing core facilities. The Arteria Project code, written largely in Python, is available as open source software, and more information can be found at https://arteria-project.github.io/.
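The three conceptual levels named in the findings can be illustrated with a minimal Python sketch. This is not Arteria's code or API: the event name, the service functions, and the workflow registry below are all hypothetical, and the micro-services are replaced by plain functions so the example runs without a network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """Orchestration level: something happened that may trigger automation."""
    name: str
    payload: dict


# Execution level: hypothetical stand-ins for RESTful micro-services.
# A real deployment would issue HTTP requests; plain functions keep
# this sketch self-contained.
def demultiplex_service(runfolder: str) -> str:
    return f"demultiplexed {runfolder}"


def quality_control_service(runfolder: str) -> str:
    return f"quality-checked {runfolder}"


# Process level: a workflow is modelled as an ordered list of steps.
Workflow = List[Callable[[str], str]]

# Hypothetical registry mapping event names to workflows.
WORKFLOWS: Dict[str, Workflow] = {
    "runfolder_ready": [demultiplex_service, quality_control_service],
}


def handle_event(event: Event) -> List[str]:
    """Look up the workflow registered for an event and run each step in order."""
    workflow = WORKFLOWS.get(event.name, [])
    return [step(event.payload["runfolder"]) for step in workflow]


results = handle_event(Event("runfolder_ready", {"runfolder": "run_001"}))
print(results)
```

Keeping the event handler, the workflow definition, and the service calls separate is what lets each level change independently in a design of this kind, for example swapping one service for another without touching the event logic.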

Conclusions: We describe the Arteria system and the underlying conceptual framework, demonstrating how this model can be used to automate data handling and analysis in the context of a sequencing core facility.

Place, publisher, year, edition, pages
2019. Vol. 8, no 12, article id giz135
National Category
Bioinformatics (Computational Biology); Computer Systems
Identifiers
URN: urn:nbn:se:uu:diva-357972
DOI: 10.1093/gigascience/giz135
ISI: 000506804600004
PubMedID: 31825479
OAI: oai:DiVA.org:uu-357972
DiVA, id: diva2:1241281
Funder
Swedish Research Council; Knut and Alice Wallenberg Foundation
Available from: 2018-08-23 Created: 2018-08-23 Last updated: 2023-02-06 Bibliographically approved
In thesis
1. Genetic Cartography at Massively Parallel Scale
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Massively parallel sequencing (MPS) is revolutionizing genomics. In this work we use, refine, and develop new tools for the discipline.

MPS has led to the discovery of multiple novel subtypes in Acute Lymphoblastic Leukemia (ALL). In Study I we screen for fusion genes in 134 pediatric ALL patients, including patients without an assigned subtype. In approximately 80% of these patients we detect novel or known fusion gene families, most of which display distinct methylation and expression patterns. This shows the potential for improvements in the clinical stratification of ALL. Large sample sizes are important to detect recurrent somatic variation. In Study II we investigate whether a non-indexed overlapping pooling schema can be used to increase sample size and detect somatic variation. We designed a schema for 172 ALL samples and show that it is possible to use this method to call somatic variants.

Around the globe there are many ongoing and completed genome projects. In Study III we sequenced the genomes of 1000 Swedes to create a reference data set for the Swedish population. We identified more than 10 million variants that were not present in publicly available databases, highlighting the need for population-specific resources. The data and the tools developed during this study have been made publicly available as a resource for genomics in Sweden and abroad.

The increased amount of sequencing data has created a greater need for automation. In Study IV we present Arteria, a computational automation system for sequencing core facilities. This system has been adopted by multiple facilities and has been used to analyze thousands of samples. In Study V we developed CheckQC, a program that provides automated quality control of Illumina sequencing runs. These tools make scaling up MPS less labour intensive, a key to unlocking the full future potential of genomics.

The tools and data presented here are a valuable contribution to the scientific community. Collectively they showcase the power of MPS and genomics to bring about new knowledge of human health and disease.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2018. p. 68
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine, ISSN 1651-6206 ; 1492
Keywords
Acute Lymphoblastic Leukemia (ALL), RNA-Sequencing, Bioinformatics, Pooling, Whole Genome Sequencing
National Category
Medical Genetics; Cancer and Oncology; Hematology; Computer Systems; Bioinformatics (Computational Biology)
Research subject
Medical Genetics; Bioinformatics
Identifiers
urn:nbn:se:uu:diva-358289 (URN); 978-91-513-0428-1 (ISBN)
Public defence
2018-10-19, E10:1307-1309 (Trippelrummet), Navet, Biomedicinskt centrum, Husargatan 3, Uppsala, 09:00 (English)
Available from: 2018-09-20 Created: 2018-08-27 Last updated: 2018-10-02

Open Access in DiVA

fulltext (933 kB), 320 downloads
File information
File name: FULLTEXT01.pdf
File size: 933 kB
Checksum (SHA-512): b6c3d29494a5fc41690e6180fa4c16ce903b61507a4119e179778f86b443b4e9b753a8bce989f02ba880b6394cc58d3b5bfa43c363e21536559f3a52997e1189
Type: fulltext
Mimetype: application/pdf
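A downloaded copy of the full text can be checked against the SHA-512 checksum listed above. The following is a minimal sketch using Python's standard hashlib module; the function names and the assumption that the file has been saved locally as FULLTEXT01.pdf are illustrative and not part of the record.

```python
import hashlib
from pathlib import Path

# Checksum copied from the file information in this record.
EXPECTED_SHA512 = (
    "b6c3d29494a5fc41690e6180fa4c16ce903b61507a4119e179778f86b443b4e9"
    "b753a8bce989f02ba880b6394cc58d3b5bfa43c363e21536559f3a52997e1189"
)


def sha512_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: Path) -> bool:
    """Compare a local file's digest with the checksum from the record."""
    return sha512_of_file(path) == EXPECTED_SHA512
```

Calling matches_record(Path("FULLTEXT01.pdf")) on a local copy would return True only if the file is byte-for-byte identical to the one the checksum was computed from.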

Authority records

Dahlberg, Johan; Sturlaugsson, Steinar; Lysenkova, Mariya; Smeds, Patrik; Ladenvall, Claes
