Two Optimization Problems in Genetics: Multi-dimensional QTL Analysis and Haplotype Inference
2012 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]
New technologies, implemented in efficient platforms and workflows, have made massive genotyping available to all fields of biology and medicine. Genetic analyses are no longer dominated by experimental work in laboratories, but rather by the interpretation of the resulting data. When billions of data points representing thousands of individuals are available, efficient computational tools are required. The focus of this thesis is on developing models, methods and implementations for such tools.
The first theme of the thesis is multi-dimensional scans for quantitative trait loci (QTL) in experimental crosses. By mating individuals from different lines, it is possible to gather data that can be used to trace the genetic variation influencing specific traits to specific genome loci. However, multiple genes influencing a single trait can naturally be expected to interact. The thesis discusses model structure and model selection, giving new insight into the conditions under which orthogonal models can be devised. The thesis also presents a new optimization method for efficiently and accurately locating QTL and for performing the permuted-data searches needed for significance testing. This method has been implemented in a software package that can seamlessly perform the searches on grid computing infrastructures.
The second theme of the thesis is the development of adapted optimization schemes for using hidden Markov models to trace allele inheritance pathways, and specifically to infer haplotypes. The advances presented form the basis for more accurate and unbiased line origin probabilities in experimental crosses, especially multi-generational ones. We show that the new tools are able to reconstruct haplotypes and even genotypes in founder individuals and offspring alike, based only on unordered offspring genotypes. The tools can also resolve inheritance pathways and phase in much larger and more complex populations than competing methods. Finally, the methods presented are also applicable to datasets where individual relationships are not known, which is frequently the case in human genetics studies. One immediate application would be improved accuracy in the imputation of SNP markers within genome-wide association studies (GWAS).
Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2012, p. 57
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 973
Keywords [en]
quantitative trait loci, genome-wide association studies, hidden Markov models, numerical optimization, linkage analysis, haplotype inference, genotype imputation, high performance computing
National Category
Computational Mathematics; Probability Theory and Statistics; Bioinformatics and Systems Biology; Genetics; Bioinformatics (Computational Biology); Software Engineering
Identifiers
URN: urn:nbn:se:uu:diva-180920
ISBN: 978-91-554-8473-6 (print)
OAI: oai:DiVA.org:uu-180920
DiVA, id: diva2:552121
Public defence
2012-10-26, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 13:15 (English)
Projects
eSSENCE
2012-10-04, 2012-09-13, 2018-01-12
Bibliographically approved
List of papers