uu.se Uppsala University Publications
Interoperable and scalable data analysis with microservices: Applications in metabolomics
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry. ORCID iD: 0000-0002-4137-5517
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Landtblom: Neurology. ORCID iD: 0000-0002-7045-1806
2019 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, no 19, p. 3752-3760. Article in journal (Refereed). Published
Abstract [en]

Motivation

Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.

Results

We developed a Virtual Research Environment (VRE) which facilitates the rapid integration of new tools and the development of scalable and interoperable workflows for metabolomics data analysis. The environment can be launched on demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validated our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability by integrating the major software suites, resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics, including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens the way for new types of large-scale integrative science.

Place, publisher, year, edition, pages
2019. Vol. 35, no 19, p. 3752-3760
Keywords [en]
Bioinformatics, e-infrastructure, microservices, metabolomics, kubernetes, Docker, container
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:uu:diva-390670
DOI: 10.1093/bioinformatics/btz160
ISI: 000499322300026
PubMedID: 30851093
OAI: oai:DiVA.org:uu-390670
DiVA, id: diva2:1342450
Funder
EU, Horizon 2020, 654241; Swedish Research Council Formas; Åke Wiberg Foundation; Swedish National Infrastructure for Computing (SNIC)
Note

Title in thesis list of papers: Interoperable and scalable metabolomics data analysis with microservices

Available from: 2019-03-09 Created: 2019-08-13 Last updated: 2020-01-07 Bibliographically approved
In thesis
1. Enabling Scalable Data Analysis on Cloud Resources with Applications in Life Science
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the past 20 years, the rise of high-throughput methods in life science has enabled research laboratories to produce massive datasets of biological interest. When dealing with this "data deluge" of modern biology, researchers encounter two major challenges: first, substantial technical skills are needed for dealing with Big Data; and second, infrastructure procurement becomes difficult. In connection with this second challenge, the computing model and business trend that was originally popularized by Amazon under the name of cloud computing represents an interesting opportunity. Instead of buying computing infrastructure upfront, cloud providers enable the allocation and release of virtual resources on demand. These resources are then billed with a pay-per-use pricing model, and physical infrastructure management is delegated to the provider. In this thesis, we introduce a number of methods for running Big Data analyses of biological interest using cloud computing. Considerable efforts were made to enable the application of trusted bioinformatics software to Big Data scenarios, as opposed to reimplementing the existing codebase. Further, we improve the accessibility of the technology with the aim of reducing the entry barrier for biologists. The thesis includes five papers. In Papers I and II, we explore the applicability of Apache Spark, one of the leading Big Data analytics platforms in cloud environments, to two drug-discovery use cases. In Paper III, we present a general method for running bioinformatics analyses on the cloud using the microservices-oriented architecture. In Paper IV, we introduce a method that combines microservices and Apache Spark with the aim of providing the best of both technologies. In Paper V, we discuss how to reduce the entry barrier for the allocation of cloud research environments. We show that all of the developed methods scale well and we provide high-level programming interfaces for improving accessibility. We have also made the developed software publicly available.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019. p. 71
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1846
Keywords
cloud computing, bioinformatics, Big Data, microservices, containers, MapReduce
National Category
Computational Mathematics
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-390666 (URN); 978-91-513-0730-5 (ISBN)
Public defence
2019-10-10, B42, Uppsala Biomedicinska Centrum, Husargatan 3, Uppsala, 13:15 (English)
Available from: 2019-09-17 Created: 2019-08-22 Last updated: 2019-10-15
2. Proteomics Studies of Subjects with Alzheimer’s Disease and Chronic Pain
2017 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Alzheimer’s disease (AD) is a neurodegenerative disease and the major cause of dementia, affecting more than 50 million people worldwide. Chronic pain is long-lasting, persistent pain that affects more than 1.5 billion people worldwide. Overlapping and heterogeneous symptoms of AD and chronic pain conditions complicate their diagnosis, emphasizing the need for more specific biomarkers to improve diagnosis and to understand the disease mechanisms.

To characterize disease pathology of AD, we measured the protein changes in the temporal neocortex region of the brain of AD subjects using mass spectrometry (MS). We found proteins involved in exo-endocytic and extracellular vesicle functions displaying altered levels in the AD brain, potentially resulting in neuronal dysfunction and cell death in AD.

To detect novel biomarkers for AD, we used MS to analyze cerebrospinal fluid (CSF) of AD patients and found decreased levels of eight proteins compared to controls, potentially indicating abnormal activity of the complement system in AD.

By integrating new proteomics markers with absolute levels of Aβ42, total tau (t-tau) and p-tau in CSF, we improved the prediction accuracy of early diagnosis of AD from 83% to 92%. We found increased levels of chitinase-3-like protein 1 (CH3L1) and decreased levels of neurosecretory protein VGF (VGF) in AD compared to controls.

By exploring the CSF proteome of neuropathic pain patients before and after successful spinal cord stimulation (SCS) treatment, we found altered levels of twelve proteins, involved in neuroprotection, synaptic plasticity, nociceptive signaling and immune regulation.

To detect biomarkers for diagnosing a chronic pain state known as fibromyalgia (FM), we analyzed the CSF of FM patients using MS. We found altered levels of four proteins, representing novel biomarkers for diagnosing FM. These proteins are involved in inflammatory mechanisms, energy metabolism and neuropeptide signaling.

Finally, to facilitate fast and robust large-scale omics data handling, we developed an e-infrastructure. We demonstrated that the e-infrastructure provides high scalability and flexibility and that it can be applied in virtually any field, including proteomics. This thesis demonstrates that proteomics is a promising approach for gaining deeper insight into mechanisms of nervous system disorders and for finding biomarkers for the diagnosis of such diseases.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2017. p. 82
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine, ISSN 1651-6206 ; 1385
Keywords
Bioinformatics, microservices, biomarkers, Alzheimer's disease, chronic pain, fibromyalgia, neuropathic pain, spinal cord stimulation, cloud computing, proteomics, metabolomics, software, workflows, data analysis, mass spectrometry
National Category
Geriatrics; Neurology; Neurosciences
Research subject
Bioinformatics; Neurology; Geriatrics
Identifiers
urn:nbn:se:uu:diva-331748 (URN); 978-91-513-0111-2 (ISBN)
Public defence
2017-12-05, Rosénsalen, Akademiska sjukhuset, Ing 95/96, nbv, Uppsala, 09:00 (English)
Available from: 2017-11-14 Created: 2017-10-17 Last updated: 2020-01-07

Open Access in DiVA

fulltext (4324 kB), 31 downloads
File information
File name: FULLTEXT01.pdf; File size: 4324 kB; Checksum (SHA-512):
e97458d11297f07b9de22f888c6d3d596d2de96780ab6547cb5c796822dd46a217cd447af52d35ff409211b2068a70d65c2e0aafe5b6fcda360fa2bac3a57d1c
Type: fulltext; Mimetype: application/pdf


Authority records

Emami Khoonsari, Payam; Burman, Joachim; Capuccini, Marco; Herman, Stephanie; Larsson, Anders; Kultima, Kim; Spjuth, Ola

By organisation
Clinical Chemistry; Landtblom: Neurology; Department of Pharmaceutical Biosciences; Department of Cell and Molecular Biology