Protecting Privacy: Automatic Compression and Encryption of Next-Generation Sequencing Alignment Data
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Experimental and Clinical Oncology. (Tobias Sjöblom)
2019 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
Abstract [en]

As the field of next-generation sequencing (NGS) matures and the technology grows more advanced, it is becoming an increasingly strong tool for solving various biological problems. Harvesting and analysing the full genomic sequence of an individual and comparing it to a reference genome can unravel information about detrimental mutations, in particular ones that give rise to diseases such as cancer.

At the Rudbeck Laboratory, Uppsala University, a fully automatic software pipeline for somatic mutational analysis of cancer patient sequence data is in development. It will increase the efficiency and accuracy of a process that today consists of several discrete computation steps. In turn, this will reduce the time to result and make it easier to reach a diagnosis and select the optimal treatment for the patient. However, an individual's genomic data are highly sensitive and private, which demands strict security precautions. Moreover, as more and more data are produced, storage space is becoming increasingly valuable, so data must be handled and stored as efficiently as possible.

In this project, I developed a Python pipeline for automatic compression and encryption of NGS alignment data, which aims to ensure full privacy protection of patient data while maintaining high computational and storage efficiency. The pipeline combines a state-of-the-art real-time compression algorithm with an Advanced Encryption Standard (AES) cipher. It offers security that meets rigorous modern standards, and performance that at least matches that of existing solutions. The system is designed to be easily integrated into the somatic mutation analysis pipeline. This way, the data generated during the analysis, which are too large to be kept in main memory, can safely be stored to disk.
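
The abstract does not say in which order the two steps run. A common reason to compress before encrypting is that well-encrypted output is close to random and therefore barely compresses. The following is a minimal sketch of that effect, not the thesis implementation: it uses only the Python standard library, with `zlib` standing in for Zstandard and `os.urandom` standing in for AES ciphertext, and the SAM-like records are invented for illustration.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 6)) / len(data)


# Hypothetical SAM-like alignment records: tab-separated and highly repetitive.
record = "read{0}\t0\tchr1\t{1}\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF\n"
plaintext = "".join(record.format(i, 1000 + i) for i in range(2000)).encode()

# Assumption: random bytes stand in for ciphertext; no real cipher is applied.
ciphertext_like = os.urandom(len(plaintext))

plain_ratio = compressed_ratio(plaintext)
cipher_ratio = compressed_ratio(ciphertext_like)

print(f"alignment-like text:   {plain_ratio:.2f}")
print(f"ciphertext-like bytes: {cipher_ratio:.2f}")
```

The repetitive text shrinks substantially while the random bytes do not, which is why a pipeline of this kind would normally compress first and encrypt the compressed output.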

Place, publisher, year, edition, pages
2019. p. 44
Series
UPTEC X ; 19010
Keywords [en]
data protection, compression, encryption, genomic data, privacy, cancer, ngs, next-generation sequencing, AES, Zstandard, sequencing
National Category
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:uu:diva-386413; OAI: oai:DiVA.org:uu-386413; DiVA, id: diva2:1327535
Educational program
Molecular Biotechnology Engineering Programme
Supervisors
Examiners
Available from: 2019-06-20; Created: 2019-06-19; Last updated: 2019-06-20; Bibliographically approved

Open Access in DiVA

The full text will be freely available from 2022-06-19 16:09

Author
Gustafsson, Wiktor