Publications from Uppsala University
Two Optimization Problems in Genetics: Multi-dimensional QTL Analysis and Haplotype Inference
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
2012 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The existence of new technologies, implemented in efficient platforms and workflows, has made massive genotyping available to all fields of biology and medicine. Genetic analyses are no longer dominated by experimental work in laboratories, but rather by the interpretation of the resulting data. When billions of data points representing thousands of individuals are available, efficient computational tools are required. The focus of this thesis is on developing models, methods and implementations for such tools.

The first theme of the thesis is multi-dimensional scans for quantitative trait loci (QTL) in experimental crosses. By mating individuals from different lines, it is possible to gather data that can be used to map the genetic variation influencing specific traits to specific genomic loci. However, multiple genes influencing a single trait can naturally be expected to interact. The thesis discusses model structure and model selection, giving new insight into the conditions under which orthogonal models can be devised. The thesis also presents a new optimization method for efficiently and accurately locating QTL and for performing the searches on permuted data that are needed for significance testing. This method has been implemented in a software package that can seamlessly perform the searches on grid computing infrastructures.
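The permutation step mentioned above can be illustrated with a minimal Python sketch. It assumes a deliberately simplified setting: a one-dimensional scan in which each marker is coded 0/1/2, the LOD score comes from ordinary least-squares regression of the phenotype on that code, and the genome-wide threshold is the (1 - alpha) quantile of the maximum LOD over many phenotype shuffles. The function names (`lod_single_marker`, `genome_scan`, `permutation_threshold`) are hypothetical. This is not the thesis's multi-dimensional search or its optimization method, only the generic permutation idea that such searches have to repeat many times.

```python
import math
import random


def lod_single_marker(genotypes, phenotypes):
    """LOD score for regressing phenotype on a 0/1/2 genotype code (normal model)."""
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    rss0 = sum((y - mean_y) ** 2 for y in phenotypes)  # null model: overall mean only
    if sxx == 0 or rss0 == 0:
        return 0.0  # monomorphic marker or constant phenotype: no evidence
    # Residual sum of squares with the marker; floored to avoid log of zero.
    rss1 = max(rss0 - sxy ** 2 / sxx, 1e-12 * rss0)
    return (n / 2) * math.log10(rss0 / rss1)


def genome_scan(markers, phenotypes):
    """One LOD score per marker; markers is a list of per-individual genotype lists."""
    return [lod_single_marker(m, phenotypes) for m in markers]


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """Empirical genome-wide threshold: the (1 - alpha) quantile of the maximum
    LOD over all markers, with phenotypes shuffled to break any genotype link."""
    rng = random.Random(seed)
    y = list(phenotypes)  # copy so the caller's data keeps its original order
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(y)
        max_lods.append(max(genome_scan(markers, y)))
    max_lods.sort()
    k = min(n_perm - 1, max(0, math.ceil((1 - alpha) * n_perm) - 1))
    return max_lods[k]


if __name__ == "__main__":
    # Simulated toy data: 100 individuals, 10 markers, marker 3 affects the trait.
    rng = random.Random(42)
    markers = [[rng.choice([0, 1, 2]) for _ in range(100)] for _ in range(10)]
    phen = [markers[3][i] + rng.gauss(0, 1) for i in range(100)]
    lods = genome_scan(markers, phen)
    threshold = permutation_threshold(markers, phen, n_perm=200)
    best = max(range(len(lods)), key=lambda k: lods[k])
    print("best marker:", best, "LOD:", round(lods[best], 2),
          "threshold:", round(threshold, 2))
```

Because every permutation repeats the full scan, the total cost is the number of permutations times the cost of one scan. For two- or three-dimensional scans that product becomes large, which is why fast search methods and grid execution matter.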

The other theme of the thesis is the development of adapted optimization schemes for using hidden Markov models to trace allele inheritance pathways, and specifically to infer haplotypes. The advances presented form the basis for more accurate and unbiased line origin probabilities in experimental crosses, especially multi-generational ones. We show that the new tools are able to reconstruct haplotypes, and even genotypes, in founder individuals and offspring alike, based only on unordered offspring genotypes. The tools can also resolve inheritance pathways and phase in much larger and more complex populations than competing methods can handle. Finally, the methods presented are also applicable to datasets where the relationships between individuals are not known, which is frequently the case in human genetics studies. One immediate application would be improved accuracy when imputing SNP markers in genome-wide association studies (GWAS).
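To make the hidden Markov model idea concrete, the sketch below computes posterior line origin probabilities along a single gamete with the forward-backward algorithm. It rests on simplifying assumptions that are not taken from the thesis: two founder lines (states A and B), biallelic markers with known allele frequencies in each line (the lines need not be fixed, as in crosses between outbred lines), Haldane's map function for the recombination fraction between adjacent markers, and a uniform prior on origin at the first marker. The names `haldane_r` and `origin_posteriors` are hypothetical. The tools in the thesis (such as cnF2freq) work with full pedigrees and unordered genotypes, which this toy model does not attempt.

```python
import math


def haldane_r(d_morgan):
    """Recombination fraction from map distance in Morgans (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgan))


def origin_posteriors(alleles, positions, freq_a, freq_b):
    """Posterior probability that each marker on one gamete derives from line A.

    alleles:   observed allele per marker (0 or 1)
    positions: sorted marker positions in Morgans
    freq_a/b:  frequency of allele 1 at each marker in line A / line B,
               assumed strictly between 0 and 1
    """
    n = len(alleles)

    def emit(state, i):
        p = freq_a[i] if state == 0 else freq_b[i]
        return p if alleles[i] == 1 else 1.0 - p

    # Forward pass, rescaled at every marker to avoid numerical underflow.
    f = [0.5 * emit(0, 0), 0.5 * emit(1, 0)]
    s = sum(f)
    fwd = [[x / s for x in f]]
    for i in range(1, n):
        r = haldane_r(positions[i] - positions[i - 1])
        a, b = fwd[-1]
        f = [(a * (1 - r) + b * r) * emit(0, i),
             (a * r + b * (1 - r)) * emit(1, i)]
        s = sum(f)
        fwd.append([x / s for x in f])

    # Backward pass, rescaled the same way.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        r = haldane_r(positions[i + 1] - positions[i])
        na, nb = bwd[i + 1]
        e0, e1 = emit(0, i + 1), emit(1, i + 1)
        b = [(1 - r) * e0 * na + r * e1 * nb,
             r * e0 * na + (1 - r) * e1 * nb]
        s = sum(b)
        bwd[i] = [x / s for x in b]

    # Per-marker scaling constants cancel when the product is normalized.
    post = []
    for (fa, fb), (ba, bb) in zip(fwd, bwd):
        pa, pb = fa * ba, fb * bb
        post.append(pa / (pa + pb))
    return post


if __name__ == "__main__":
    probs = origin_posteriors([1, 1, 1, 0, 0], [0.0, 0.1, 0.2, 0.3, 0.4],
                              [0.9] * 5, [0.1] * 5)
    print([round(p, 3) for p in probs])
```

In a real cross the same recursion runs over a much larger state space (both gametes of each individual, and unknown phase in the parents), which is where the adapted optimization schemes described above become necessary.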

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2012, p. 57
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 973
Keywords [en]
quantitative trait loci, genome-wide association studies, hidden Markov models, numerical optimization, linkage analysis, haplotype inference, genotype imputation, high performance computing
National Category
Computational Mathematics; Probability Theory and Statistics; Bioinformatics and Systems Biology; Genetics; Bioinformatics (Computational Biology); Software Engineering
Identifiers
URN: urn:nbn:se:uu:diva-180920; ISBN: 978-91-554-8473-6 (print); OAI: oai:DiVA.org:uu-180920; DiVA, id: diva2:552121
Public defence
2012-10-26, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 13:15 (English)
Projects
eSSENCE
Available from: 2012-10-04 Created: 2012-09-13 Last updated: 2018-01-12 Bibliographically approved
List of papers
1. Coherent estimates of genetic effects with missing information
2012 (English). In: Open Journal of Genetics, ISSN 2162-4453, E-ISSN 2162-4461, Vol. 2, p. 31-38. Article in journal (Refereed), Published
Keywords
genetic effects, missing genotypes, orthogonal estimation, QTL analysis
National Category
Bioinformatics and Systems Biology; Genetics; Probability Theory and Statistics; Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-180915 (URN); 10.4236/ojgen.2012.21003 (DOI)
Projects
eSSENCE
Available from: 2012-03-02 Created: 2012-09-12 Last updated: 2017-12-07 Bibliographically approved
2. Fast and accurate detection of multiple quantitative trait loci
2013 (English). In: Journal of Computational Biology, ISSN 1066-5277, E-ISSN 1557-8666, Vol. 20, p. 687-702. Article in journal (Refereed), Published
National Category
Bioinformatics and Systems Biology; Genetics; Computational Mathematics; Probability Theory and Statistics
Identifiers
urn:nbn:se:uu:diva-180916 (URN); 10.1089/cmb.2012.0242 (DOI); 000323822000006 (ISI)
Projects
eSSENCE
Available from: 2013-08-06 Created: 2012-09-13 Last updated: 2017-12-07 Bibliographically approved
3. A Grid-Enabled Problem Solving Environment for QTL Analysis in R
2010 (English). In: Proc. 2nd International Conference on Bioinformatics and Computational Biology, Cary, NC: ISCA, 2010, p. 202-209. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Cary, NC: ISCA, 2010
National Category
Software Engineering; Genetics
Identifiers
urn:nbn:se:uu:diva-111594 (URN); 978-1-880843-76-5 (ISBN)
Projects
eSSENCE
Available from: 2010-01-12 Created: 2009-12-17 Last updated: 2018-01-12 Bibliographically approved
4. cnF2freq: Efficient determination of genotype and haplotype probabilities in outbred populations using Markov models
2009 (English). In: Bioinformatics and Computational Biology, Berlin: Springer-Verlag, 2009, p. 307-319. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Berlin: Springer-Verlag, 2009
Series
Lecture Notes in Computer Science ; 5462
National Category
Computational Mathematics; Genetics
Identifiers
urn:nbn:se:uu:diva-103916 (URN); 10.1007/978-3-642-00727-9_29 (DOI); 000265785800029 (ISI); 978-3-642-00726-2 (ISBN)
Available from: 2009-05-25 Created: 2009-05-25 Last updated: 2017-01-25 Bibliographically approved
5. An improved method for estimating chromosomal line origin in QTL analysis of crosses between outbred lines
2011 (English). In: G3: Genes, Genomes, Genetics, E-ISSN 2160-1836, Vol. 1, p. 57-64. Article in journal (Refereed), Published
National Category
Computational Mathematics; Genetics
Identifiers
urn:nbn:se:uu:diva-156197 (URN); 10.1534/g3.111.000109 (DOI); 000312405400007 (ISI)
Projects
eSSENCE
Available from: 2011-06-01 Created: 2011-07-15 Last updated: 2024-01-17 Bibliographically approved
6. MAPfastR: Quantitative trait loci mapping in outbred line crosses
2013 (English). In: G3: Genes, Genomes, Genetics, E-ISSN 2160-1836, Vol. 3, p. 2147-2149. Article in journal (Refereed), Published
National Category
Computational Mathematics; Genetics; Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:uu:diva-180917 (URN); 10.1534/g3.113.008623 (DOI); 000328334500005 (ISI)
Projects
eSSENCE
Available from: 2013-10-11 Created: 2012-09-13 Last updated: 2024-01-17 Bibliographically approved
7. Haplotype inference based on hidden Markov models in the QTL–MAS 2010 multigenerational dataset
2011 (English). In: Proc. 14th European Workshop on QTL Mapping and Marker Assisted Selection, London: BioMed Central, 2011, p. S10:1-7. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
London: BioMed Central, 2011
Series
BMC Proceedings, ISSN 1753-6561 ; 5:3
National Category
Computational Mathematics; Genetics
Identifiers
urn:nbn:se:uu:diva-153449 (URN); 10.1186/1753-6561-5-S3-S10 (DOI)
Projects
eSSENCE
Available from: 2010-05-17 Created: 2011-05-12 Last updated: 2017-01-25 Bibliographically approved
8. Inferring haplotypes and parental genotypes in larger full sib-ships and other pedigrees with missing or erroneous genotype data
2012 (English). In: BMC Genetics, E-ISSN 1471-2156, Vol. 13, p. 85:1-13. Article in journal (Refereed), Published
Keywords
haplotyping, phasing, genotype inference, nuclear family data, hidden Markov models
National Category
Probability Theory and Statistics; Computational Mathematics; Genetics; Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:uu:diva-182488 (URN); 10.1186/1471-2156-13-85 (DOI); 000314354600001 (ISI)
Projects
eSSENCE
Available from: 2012-10-10 Created: 2012-10-10 Last updated: 2024-01-17 Bibliographically approved
9. Breakdown of methods for phasing and imputation in the presence of double genotype sharing
2012 (English). Report (Other academic)
Abstract [en]

In genome-wide association studies, results have been improved through phasing of the genotype data and imputation of a denser marker set based on reference haplotypes. To better handle very large sets of reference haplotypes, pre-phasing using only the study individuals has been suggested. We present a possible problem that is aggravated when pre-phasing strategies are used, and suggest a modification to the MaCH tool that avoids these issues.

We evaluate the effectiveness of our remedy on a subset of HapMap data, comparing the original version of MaCH with our modified approach. Improvements are demonstrated on the original data (phase switch error rate decreasing by 10%), but the differences are more pronounced when the data are augmented to represent the presence of closely related individuals, especially siblings (30% reduction in switch error rate in the presence of children, 47% reduction in the presence of siblings). When siblings are introduced, the switch error rate of the unmodified version of MaCH increases significantly compared to the original data.
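The switch error rate reported above can be computed as sketched below, assuming the common definition: at heterozygous sites, record whether the inferred haplotype pair has the same or the opposite orientation as the true pair, count how often that orientation changes between consecutive heterozygous sites, and divide by the number of consecutive heterozygous pairs. This is an illustrative assumption; the report's exact evaluation protocol (for example, how rates are aggregated over individuals) may differ, and `switch_error_rate` is a hypothetical helper, not part of MaCH.

```python
def switch_error_rate(true_hap1, true_hap2, inf_hap1, inf_hap2):
    """Fraction of consecutive heterozygous-site pairs at which the inferred
    phase flips relative to the true phase (alleles coded as 0/1)."""
    orientation = []  # True: inferred hap1 carries true hap1's allele at this site
    for t1, t2, i1, i2 in zip(true_hap1, true_hap2, inf_hap1, inf_hap2):
        if t1 == t2:
            continue  # homozygous sites carry no phase information
        if {i1, i2} != {t1, t2}:
            raise ValueError("inferred genotype differs from the true genotype")
        orientation.append(i1 == t1)
    if len(orientation) < 2:
        return 0.0  # fewer than two heterozygous sites: no switch is possible
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(orientation) - 1)


if __name__ == "__main__":
    true_h1, true_h2 = [0, 1, 0, 1, 1], [1, 0, 0, 0, 1]
    inf_h1, inf_h2 = [0, 1, 0, 0, 1], [1, 0, 0, 1, 1]
    print(switch_error_rate(true_h1, true_h2, inf_h1, inf_h2))
```

Note that swapping both inferred haplotypes everywhere gives a rate of zero, since only changes in orientation between adjacent heterozygous sites count as switches.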

The main conclusion of this investigation is that existing statistical methods for phasing and imputation of unrelated individuals might give results of subpar quality if a subset of the study individuals are in fact related. As the populations collected for general genome-wide association studies grow in size, including relatives might become more common. If a general GWAS framework for unrelated individuals is applied to datasets that include sub-populations originally collected as familial case-control sets, caution should also be taken regarding the quality of the haplotypes.

Our modification to MaCH is available on request and is straightforward to implement. We hope that this mode, if found useful, can be integrated as an option in future standard distributions of MaCH.

Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2012-027
National Category
Probability Theory and Statistics; Computational Mathematics; Genetics; Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:uu:diva-181598 (URN)
Projects
eSSENCE
Available from: 2012-09-25 Created: 2012-09-26 Last updated: 2024-05-30 Bibliographically approved

Open Access in DiVA

Full text: FULLTEXT01.pdf (543 kB)
Checksum SHA-512:
73fe6b655910eb28f1ea5073c001a0d4db5459d856df6190f5226a82ce123dfc13c68d44681eb2248d7842bb5ba1f6ffba423b9597bdd4708228b74123174fb0
Type: fulltext; Mimetype: application/pdf

Authority records

Nettelblad, Carl
