Publications from Uppsala University
Evaluation of methods handling missing data in PCA on genotype data: Applications for ancient DNA
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Applied Computational Science. ORCID iD: 0000-0002-6212-539x
2019 (English). Report (Other academic)
Place, publisher, year, edition, pages
2019.
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2019-009
National Category
Computational Mathematics; Genetics and Genomics
Identifiers
URN: urn:nbn:se:uu:diva-396346; OAI: oai:DiVA.org:uu-396346; DiVA id: diva2:1367445
Project
eSSENCE
Available from: 2019-11-04 Created: 2019-11-04 Last updated: 2025-02-01 Bibliographically approved
Part of thesis
1. Efficient computational methods for applications in genomics
2019 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

During the last two decades, advances in molecular technology have facilitated the sequencing and analysis of ancient DNA recovered from archaeological finds, contributing to novel insights into human evolutionary history. As more ancient genetic information has become available, the need for specialized methods of analysis has also increased. In this thesis, we investigate statistical and computational models for analysis of genetic data, with a particular focus on the context of ancient DNA.

The main focus is on imputation, or the inference of missing genotypes based on observed sequence data. We present results from a systematic evaluation of a common imputation pipeline on empirical ancient samples, and show that imputed data can constitute a realistic option for population-genetic analyses. We also discuss preliminary results from a simulation study comparing two methods of phasing and imputation, which suggest that the parametric Li and Stephens framework may be more robust to extremely sparse data than the parsimonious Browning and Browning model.

An evaluation of methods to handle missing data in the application of PCA for dimensionality reduction of genotype data is also presented. We illustrate that non-overlapping sequence data can lead to artifacts in projected scores, and evaluate different methods for handling unobserved genotypes.
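The artifact mentioned above can be illustrated with a small sketch. It assumes one simple strategy: principal components are computed from a fully observed reference panel, and each further sample is projected onto them with unobserved genotypes filled in with the reference mean (mean imputation). This is not necessarily one of the methods evaluated in the report. The 0/1/2 dosage coding, `None` as the missing-value marker, the toy panel, and the helper names `allele_means`, `center`, `top_component` and `project` are all hypothetical. Because a mean-imputed genotype contributes zero after centering, a sparsely observed sample is pulled toward the origin of the PC plot.

```python
import math
import random
from typing import List, Optional

Genotypes = List[Optional[int]]  # hypothetical coding: 0/1/2 allele dosages, None = unobserved


def allele_means(panel: List[Genotypes]) -> List[float]:
    """Mean dosage per SNP, using only the observed entries of the reference panel."""
    means = []
    for j in range(len(panel[0])):
        obs = [row[j] for row in panel if row[j] is not None]
        means.append(sum(obs) / len(obs) if obs else 0.0)
    return means


def center(row: Genotypes, means: List[float]) -> List[float]:
    """Subtract SNP means; a missing genotype becomes 0.0, i.e. mean imputation."""
    return [0.0 if g is None else g - m for g, m in zip(row, means)]


def top_component(X: List[List[float]], iters: int = 200, seed: int = 0) -> List[float]:
    """PC1 loadings: leading eigenvector of X^T X, found by power iteration."""
    rng = random.Random(seed)
    p = len(X[0])
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iters):
        u = [sum(x * vj for x, vj in zip(row, v)) for row in X]              # u = X v
        w = [sum(X[i][j] * u[i] for i in range(len(X))) for j in range(p)]  # w = X^T u
        norm = math.sqrt(sum(wj * wj for wj in w))
        v = [wj / norm for wj in w]
    return v


def project(row: Genotypes, means: List[float], loadings: List[float]) -> float:
    """PC1 score of a (possibly incomplete) sample projected onto the reference PC."""
    return sum(c * ld for c, ld in zip(center(row, means), loadings))
```

With the toy panel `[[0, 0, 1, 2], [0, 1, 0, 2], [2, 2, 1, 0], [2, 1, 2, 0]]`, the fully observed sample `[2, 2, 1, 0]` scores about ±1.58 on PC1, the same sample with its two middle genotypes missing about ±1.26, and a sample with no observed genotypes exactly 0 (the sign of a principal component is arbitrary). Only the amount of missing data differs, yet the projected position shifts toward the origin.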

In genomics, as in other fields of research, increasing sizes of data sets are placing larger demands on efficient data management and compute infrastructures. The last part of this thesis addresses the use of cloud resources for facilitating such analysis. We present two different cloud-based solutions, and exemplify them on applications from genomics.

Place, publisher, year, edition, pages
Uppsala University, 2019
Series
IT licentiate theses / Uppsala University, Department of Information Technology, ISSN 1404-5117 ; 2019-006
National Category
Computational Mathematics; Genetics and Genomics
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-396409 (URN)
Supervisors
Project
eSSENCE
Available from: 2019-11-04 Created: 2019-11-04 Last updated: 2025-02-01 Bibliographically approved
2. Methodology and Infrastructure for Statistical Computing in Genomics: Applications for Ancient DNA
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis concerns the development and evaluation of computational methods for analysis of genetic data. A particular focus is on ancient DNA recovered from archaeological finds, the analysis of which has contributed to novel insights into human evolutionary and demographic history, while also introducing new challenges and the demand for specialized methods.

A main topic is that of imputation, or the inference of missing genotypes based on observed sequence data. We present results from a systematic evaluation of a common imputation pipeline on empirical ancient samples, and show that imputed data can constitute a realistic option for population-genetic analyses. We also develop a tool for genotype imputation that is based on the full probabilistic Li and Stephens model for haplotype frequencies and show that it can yield improved accuracy on particularly challenging data.  
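The haplotype-copying idea behind the Li and Stephens model can be sketched in a few lines. In this simplified, hypothetical version, a single haploid target sequence (alleles 0/1, `None` where unobserved) is modelled as a mosaic of reference haplotypes. Between adjacent sites the copied haplotype is re-drawn uniformly from all K references with a constant probability `rho`, and each observed allele differs from the copied one with probability `eps`. A normalised forward-backward pass gives the posterior copying probabilities, from which the probability of allele 1 at each missing site follows. The function name, the constant `rho` and `eps`, and the haploid setting are assumptions; this is not the tool developed in the thesis, and a practical tool would typically also handle diploid genotype likelihoods and position-dependent recombination rates.

```python
from typing import List, Optional


def li_stephens_impute(target: List[Optional[int]], refs: List[List[int]],
                       rho: float = 0.01, eps: float = 0.01) -> List[float]:
    """Return P(allele 1) per site: observed alleles kept as-is, missing ones imputed."""
    K, M = len(refs), len(target)
    if M == 0:
        return []

    def emit(k: int, m: int) -> float:
        # An unobserved site carries no information about which haplotype is copied.
        if target[m] is None:
            return 1.0
        return 1.0 - eps if refs[k][m] == target[m] else eps

    def transition(probs: List[float]) -> List[float]:
        # Keep the same haplotype w.p. 1 - rho; otherwise re-draw uniformly among all K.
        total = sum(probs)
        return [(1.0 - rho) * p + rho * total / K for p in probs]

    def normalise(xs: List[float]) -> List[float]:
        s = sum(xs)
        return [x / s for x in xs]

    # Forward pass, normalised at every site to avoid numerical underflow.
    fwd = [normalise([emit(k, 0) / K for k in range(K)])]
    for m in range(1, M):
        fwd.append(normalise([a * emit(k, m) for k, a in enumerate(transition(fwd[-1]))]))

    # Backward pass; the transition matrix is symmetric, so `transition` applies here too.
    bwd = [[1.0] * K for _ in range(M)]
    for m in range(M - 2, -1, -1):
        bwd[m] = normalise(transition([b * emit(k, m + 1) for k, b in enumerate(bwd[m + 1])]))

    result = []
    for m in range(M):
        if target[m] is not None:
            result.append(float(target[m]))
            continue
        # Posterior copying probabilities, then expected allele under the copied haplotype.
        post = normalise([a * b for a, b in zip(fwd[m], bwd[m])])
        result.append(sum(p * ((1.0 - eps) if refs[k][m] == 1 else eps)
                          for k, p in enumerate(post)))
    return result
```

For example, with reference haplotypes `[0, 0, 0, 0, 0]` and `[1, 1, 1, 1, 1]`, the target `[1, 1, None, 1, 1]` receives a probability of about 0.99 for allele 1 at the missing site, because the flanking observations make copying the second haplotype overwhelmingly likely.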

Another central subject in genomics and population genetics is data characterization: methods that allow for visualization and exploratory analysis of complex information. We discuss challenges associated with performing dimensionality reduction of genetic data, demonstrating that principal component analysis is sensitive to incomplete information and evaluating methods for handling unobserved genotypes. We also discuss the use of deep learning models as an alternative to traditional methods of data characterization in genomics, and propose a framework based on convolutional autoencoders that we exemplify on the applications of dimensionality reduction and genetic clustering.

In genomics, as in other fields of research, increasing sizes of data sets are placing larger demands on efficient data management and compute infrastructures. The final part of this thesis addresses the use of cloud resources for facilitating data analysis in scientific applications. We present two different cloud-based solutions, and exemplify them on applications from genomics.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2022. 53 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2129
Keywords
statistical computing, genotype imputation, ancient DNA, deep learning, dimensionality reduction, genetic clustering, distributed computing
National Category
Bioinformatics (Computational Biology); Computational Mathematics; Genetics and Genomics; Software Engineering
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-470703 (URN)
978-91-513-1457-0 (ISBN)
Public defence
2022-06-08, 101121, Lägerhyddsvägen 1, Uppsala, 10:15 (English)
Opponent
Supervisors
Project
eSSENCE
Available from: 2022-05-17 Created: 2022-03-28 Last updated: 2025-02-01

Open Access in DiVA

Full text (339 kB), 125 downloads
File information
File name: FULLTEXT02.pdf; File size: 339 kB; Checksum: SHA-512
d5954f4d7ad3bab4afc2fbec7d21c54f64ca072f9b613452d79184c666cd7baca0329ee44d1754dd14f5d0d052bc03bab5213920f1e09fb0250091d9ed55b111
Type: fulltext; Mimetype: application/pdf
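The SHA-512 checksum listed for FULLTEXT02.pdf can be used to confirm that a downloaded copy is intact. A minimal sketch using Python's standard `hashlib` module; the helper names `sha512_of_file` and `matches_expected` are illustrative, and the path is wherever the download was saved.

```python
import hashlib

# Checksum as listed in the record's file information.
EXPECTED = "d5954f4d7ad3bab4afc2fbec7d21c54f64ca072f9b613452d79184c666cd7baca0329ee44d1754dd14f5d0d052bc03bab5213920f1e09fb0250091d9ed55b111"


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-512 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED) -> bool:
    """True if the file's digest equals the expected hex string (case-insensitive)."""
    return sha512_of_file(path) == expected.strip().lower()
```

For an unmodified copy saved as `FULLTEXT02.pdf`, `matches_expected("FULLTEXT02.pdf")` should return `True`.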

Person

Ausmees, Kristiina

Total: 166 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 1142 hits