uu.se: Publications from Uppsala University
BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. ORCID iD: 0000-0002-6212-539X
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. ORCID iD: 0000-0003-0302-6276
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. ORCID iD: 0000-0001-7273-7923
2018 (English). In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 19, p. 240:1-11, article id 240. Article in journal (Refereed), Published
Place, publisher, year, edition, pages
2018. Vol. 19, p. 240:1-11, article id 240
National Category
Software Engineering; Genetics
Identifiers
URN: urn:nbn:se:uu:diva-360033
DOI: 10.1186/s12859-018-2241-z
ISI: 000436517200001
PubMedID: 29940842
OAI: oai:DiVA.org:uu-360033
DiVA, id: diva2:1246661
Projects
eSSENCE
Available from: 2018-06-26 Created: 2018-09-09 Last updated: 2024-01-17 Bibliographically approved
In thesis
1. Efficient computational methods for applications in genomics
2019 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

During the last two decades, advances in molecular technology have facilitated the sequencing and analysis of ancient DNA recovered from archaeological finds, contributing to novel insights into human evolutionary history. As more ancient genetic information has become available, the need for specialized methods of analysis has also increased. In this thesis, we investigate statistical and computational models for analysis of genetic data, with a particular focus on the context of ancient DNA.

The main focus is on imputation, or the inference of missing genotypes based on observed sequence data. We present results from a systematic evaluation of a common imputation pipeline on empirical ancient samples, and show that imputed data can constitute a realistic option for population-genetic analyses. We also discuss preliminary results from a simulation study comparing two methods of phasing and imputation, which suggest that the parametric Li and Stephens framework may be more robust to extremely low levels of sparsity than the parsimonious Browning and Browning model.

An evaluation of methods to handle missing data in the application of PCA for dimensionality reduction of genotype data is also presented. We illustrate that non-overlapping sequence data can lead to artifacts in projected scores, and evaluate different methods for handling unobserved genotypes.

In genomics, as in other fields of research, increasing sizes of data sets are placing larger demands on efficient data management and compute infrastructures. The last part of this thesis addresses the use of cloud resources for facilitating such analysis. We present two different cloud-based solutions, and exemplify them on applications from genomics.

Place, publisher, year, edition, pages
Uppsala University, 2019
Series
Information technology licentiate theses: Licentiate theses from the Department of Information Technology, ISSN 1404-5117 ; 2019-006
National Category
Computational Mathematics; Genetics
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-396409 (URN)
Supervisors
Projects
eSSENCE
Available from: 2019-11-04 Created: 2019-11-04 Last updated: 2019-11-11 Bibliographically approved
2. Methodology and Infrastructure for Statistical Computing in Genomics: Applications for Ancient DNA
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis concerns the development and evaluation of computational methods for analysis of genetic data. A particular focus is on ancient DNA recovered from archaeological finds, the analysis of which has contributed to novel insights into human evolutionary and demographic history, while also introducing new challenges and the demand for specialized methods.

A main topic is that of imputation, or the inference of missing genotypes based on observed sequence data. We present results from a systematic evaluation of a common imputation pipeline on empirical ancient samples, and show that imputed data can constitute a realistic option for population-genetic analyses. We also develop a tool for genotype imputation that is based on the full probabilistic Li and Stephens model for haplotype frequencies and show that it can yield improved accuracy on particularly challenging data.  

Another central subject in genomics and population genetics is that of data characterization methods that allow for visualization and exploratory analysis of complex information. We discuss challenges associated with performing dimensionality reduction of genetic data, demonstrating how the use of principal component analysis is sensitive to incomplete information and performing an evaluation of methods to handle unobserved genotypes. We also discuss the use of deep learning models as an alternative to traditional methods of data characterization in genomics and propose a framework based on convolutional autoencoders that we exemplify on the applications of dimensionality reduction and genetic clustering.

In genomics, as in other fields of research, increasing sizes of data sets are placing larger demands on efficient data management and compute infrastructures. The final part of this thesis addresses the use of cloud resources for facilitating data analysis in scientific applications. We present two different cloud-based solutions, and exemplify them on applications from genomics.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2022. p. 53
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2129
Keywords
statistical computing, genotype imputation, ancient DNA, deep learning, dimensionality reduction, genetic clustering, distributed computing
National Category
Bioinformatics (Computational Biology); Computational Mathematics; Genetics; Software Engineering
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-470703 (URN)
978-91-513-1457-0 (ISBN)
Public defence
2022-06-08, 101121, Lägerhyddsvägen 1, Uppsala, 10:15 (English)
Opponent
Supervisors
Projects
eSSENCE
Available from: 2022-05-17 Created: 2022-03-28 Last updated: 2022-06-14

Authority records

Ausmees, Kristiina; Toor, Salman Z.; Hellander, Andreas; Nettelblad, Carl
