A Grid-Enabled Problem Solving Environment for QTL Analysis in R
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
2010 (English). In: Proc. 2nd International Conference on Bioinformatics and Computational Biology, Cary, NC: ISCA, 2010, pp. 202-209. Conference paper, published paper (refereed)
Place, publisher, year, edition, pages
Cary, NC: ISCA, 2010. pp. 202-209
National subject category
Software Engineering; Genetics
Identifiers
URN: urn:nbn:se:uu:diva-111594; ISBN: 978-1-880843-76-5 (print); OAI: oai:DiVA.org:uu-111594; DiVA, id: diva2:285725
Project
eSSENCE
Available from: 2010-01-12. Created: 2009-12-17. Last updated: 2018-01-12. Bibliographically approved.
Part of thesis
1. An e-Science Approach to Genetic Analysis of Quantitative Traits
2010 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Many important traits in plants, animals and humans are quantitative, and most such traits are generally believed to be affected by multiple genetic loci. Standard computational tools for mapping of quantitative traits (i.e. for finding Quantitative Trait Loci, QTL, in the genome) use linear regression models for relating the observed phenotypes to the genetic composition of individuals in an experimental population. Using these tools to simultaneously search for multiple QTL is computationally demanding. The main reason for this is the complex optimization landscape for the multidimensional global optimization problems that must be solved. This thesis describes parallel algorithms, implementations and tools for simultaneous mapping of several QTL. These new computational tools enable genetic analysis exploiting new classes of multidimensional statistical models, potentially resulting in interesting results in genetics.

We first describe how the standard, brute-force algorithm for global optimization in QTL analysis is parallelized and implemented on a grid system. Then, we also present a parallelized version of the more elaborate global optimization algorithm DIRECT and show how this can be efficiently deployed and used on grid systems and other loosely-coupled architectures. The parallel DIRECT scheme is further developed to exploit both coarse-grained parallelism in grid systems or clusters and fine-grained, tightly-coupled parallelism in multi-core nodes. The results show that excellent speedup and performance can be achieved on grid systems and clusters, even when using a tightly-coupled algorithm such as DIRECT. Finally, we provide two distinctly different front-ends for our code. One is a grid portal providing a graphical front-end suitable for novice users and standard forms of QTL analysis. The other is a prototype of an R-based grid-enabled problem solving environment. Both of these front-ends can, after some further refinement, be utilized by geneticists for performing multidimensional genetic analysis of quantitative traits on a regular basis.
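
As an illustration of the brute-force search mentioned in the abstract, the following is a minimal sketch of an exhaustive two-locus scan using a linear regression model. It is not the thesis implementation: the function names, the additive model without interaction terms, the 0/1 genotype coding and the toy data are all assumptions made for this example, and the scan runs serially rather than on a grid.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residual_sum_of_squares(phenotypes, columns):
    """Fit y = b0 + sum_k b_k * column_k by least squares; return the RSS."""
    design = [[1.0] + [col[i] for col in columns]
              for i in range(len(phenotypes))]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * y for row, y in zip(design, phenotypes))
           for j in range(p)]
    beta = solve_linear_system(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, y in zip(design, phenotypes))


def exhaustive_two_locus_scan(phenotypes, genotypes):
    """Evaluate every pair of loci and return (best_rss, (i, j))."""
    best = None
    for i, j in combinations(range(len(genotypes)), 2):
        try:
            rss = residual_sum_of_squares(
                phenotypes, [genotypes[i], genotypes[j]])
        except ValueError:
            continue  # skip pairs whose genotype columns are collinear
        if best is None or rss < best[0]:
            best = (rss, (i, j))
    return best


# Hypothetical toy data: 4 loci coded 0/1 for 8 individuals.
genotypes = [
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],
]
# The phenotype is built from loci 1 and 2 only.
phenotypes = [1.0 + 2.0 * a + 3.0 * b
              for a, b in zip(genotypes[1], genotypes[2])]

best_rss, best_pair = exhaustive_two_locus_scan(phenotypes, genotypes)
```

Because every pair is fitted independently of the others, the loop over pairs is the part that could be split across workers. The cost grows combinatorially with the number of loci searched simultaneously, which is why the abstract calls the multi-QTL search computationally demanding.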

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2010. 40 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 708
Keywords
QTL Analysis, Grid Computing, Global Optimization, e-Science
National subject category
Software Engineering; Computational Mathematics
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-111597 (URN); 978-91-554-7706-6 (ISBN)
Public defence
2010-02-25, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 10:15 (English)
Project
eSSENCE
Available from: 2010-02-02. Created: 2009-12-17. Last updated: 2018-01-12. Bibliographically approved.
2. Managing Applications and Data in Distributed Computing Infrastructures
2012 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

During the last decades the demand for large-scale computational and storage resources in science has increased dramatically. New computational infrastructures enable scientists to enter a new mode of science, e-science, which complements traditional theory and experiments. E-science is inherently interdisciplinary, involving researchers from several disciplines, and also opens up for large-scale collaborative efforts where physically distributed groups of scientists share software tools and data to make scientific progress. Within the field of e-science, new challenges are emerging in managing large-scale distributed computing efforts and distributed data sets. Different models, e.g. grids and clouds, have been introduced over the years, but new solutions built on these models are needed to enable easy and flexible use of distributed computing infrastructures by application scientists.

In the first part of the thesis, application execution environments are studied. The goal is to hide technical details of the underlying distributed computing infrastructure and expose secure and user-friendly environments to the end users. First, a general-purpose solution using portal technology is described, enabling transparent and easy usage of a variety of grid systems. Then a problem-solving environment for genetic analysis is presented. Here the statistical software R is used as a workflow engine, enhanced with grid-enabled routines for performing the computationally demanding parts of the analysis. Finally, the issue of resource allocation in grid systems is briefly studied and certain modifications in the distributed resource-brokering model for the ARC middleware are proposed.

The second part of the thesis presents solutions for managing and analyzing scientific data using distributed storage resources. First, a new reliable and secure file-oriented distributed storage system, Chelonia, is presented. The architectural design of the system is described and implementation issues are considered. Also, the stability and scalable performance of Chelonia are verified using several test scenarios. Then, tools for providing an efficient and easy-to-use platform for data analysis built on Chelonia are presented. Here, a database-driven approach is explored. An extended architecture where Chelonia is combined with the Web-Service MEDiator (WSMED) system is implemented, providing web service tools to query data without any further programming. This approach is then developed further and Chelonia is combined with SciSPARQL, a query language that extends SPARQL to queries over numeric scientific data. This results in a system that is capable of interactive analysis of distributed data sets. Advanced application-specific analysis requirements can be met by writing customized modules in Java, Python or C. The viability of the approach is demonstrated by applying the system to data produced by URDME, a computational environment in systems biology, and results for sample queries expressed in SciSPARQL are presented.

Finally, the use of an open source storage cloud, OpenStack Swift, for analysis of data from CERN experiments is considered. Here, a pilot implementation for the ROOT data analysis framework is presented together with a performance evaluation.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2012. 62 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 940
Keywords
Distributed Computing Infrastructures, Grids, Clouds, Application Management, Distributed Storage, Resource Allocation
National subject category
Software Engineering
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-173467 (URN); 978-91-554-8381-4 (ISBN)
Public defence
2012-06-14, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 13:15 (English)
Project
eSSENCE
Available from: 2012-05-24. Created: 2012-04-25. Last updated: 2018-01-12. Bibliographically approved.
3. Two Optimization Problems in Genetics: Multi-dimensional QTL Analysis and Haplotype Inference
2012 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The existence of new technologies, implemented in efficient platforms and workflows, has made massive genotyping available to all fields of biology and medicine. Genetic analyses are no longer dominated by experimental work in laboratories, but rather by the interpretation of the resulting data. When billions of data points representing thousands of individuals are available, efficient computational tools are required. The focus of this thesis is on developing models, methods and implementations for such tools.

The first theme of the thesis is multi-dimensional scans for quantitative trait loci (QTL) in experimental crosses. By mating individuals from different lines, it is possible to gather data that can be used to pinpoint the genetic variation that influences specific traits to specific genome loci. However, it is natural to expect multiple genes influencing a single trait to interact. The thesis discusses model structure and model selection, giving new insight regarding under what conditions orthogonal models can be devised. The thesis also presents a new optimization method for efficiently and accurately locating QTL, and performing the permuted data searches needed for significance testing. This method has been implemented in a software package that can seamlessly perform the searches on grid computing infrastructures.

The other theme in the thesis is the development of adapted optimization schemes for using hidden Markov models in tracing allele inheritance pathways, and specifically inferring haplotypes. The advances presented form the basis for more accurate and non-biased line origin probabilities in experimental crosses, especially multi-generational ones. We show that the new tools are able to reconstruct haplotypes and even genotypes in founder individuals and offspring alike, based on only unordered offspring genotypes. The tools can also handle larger populations than competing methods, resolving inheritance pathways and phase in much larger and more complex populations. Finally, the methods presented are also applicable to datasets where individual relationships are not known, which is frequently the case in human genetics studies. One immediate application for this would be improved accuracy for imputation of SNP markers within genome-wide association studies (GWAS).
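
The permuted data searches mentioned in the abstract can be illustrated with a generic permutation test. This is a minimal sketch under stated assumptions, not the optimization method of the thesis: it uses a single-marker scan with squared correlation as the score, and the function names, the number of permutations, the fixed seed and the toy data are all hypothetical.

```python
import random


def marker_score(phenotypes, genotype):
    """Squared Pearson correlation between the phenotype and one marker."""
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    mean_g = sum(genotype) / n
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(genotype, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in genotype)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy * sxy / (sxx * syy)


def best_score(phenotypes, genotypes):
    """Genome-wide maximum of the single-marker score."""
    return max(marker_score(phenotypes, g) for g in genotypes)


def permutation_p_value(phenotypes, genotypes, n_permutations=200, seed=0):
    """Estimate how often a shuffled phenotype scores as well as the real one."""
    rng = random.Random(seed)
    observed = best_score(phenotypes, genotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling breaks any genotype-phenotype association while
        # keeping the marker structure intact.
        rng.shuffle(shuffled)
        if best_score(shuffled, genotypes) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (exceed + 1) / (n_permutations + 1)


# Hypothetical toy data: 3 markers coded 0/1 for 12 individuals.
genotypes = [
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
]
# Phenotype driven by marker 0 plus a small deterministic perturbation.
phenotypes = [5.0 + 2.0 * g + 0.1 * (i % 3)
              for i, g in enumerate(genotypes[0])]

observed, p_value = permutation_p_value(phenotypes, genotypes)
```

Each permutation repeats the full search on shuffled data, so the total cost is the cost of one scan multiplied by the number of permutations. The permutations do not depend on one another, so they can be run in parallel.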

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2012. 57 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 973
Keywords
quantitative trait loci, genome-wide association studies, hidden Markov models, numerical optimization, linkage analysis, haplotype inference, genotype imputation, high performance computing
National subject category
Computational Mathematics; Probability Theory and Statistics; Bioinformatics and Systems Biology; Genetics; Bioinformatics (Computational Biology); Software Engineering
Identifiers
urn:nbn:se:uu:diva-180920 (URN); 978-91-554-8473-6 (ISBN)
Public defence
2012-10-26, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 13:15 (English)
Project
eSSENCE
Available from: 2012-10-04. Created: 2012-09-13. Last updated: 2018-01-12. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authors

Jayawardena, Mahen; Nettelblad, Carl; Toor, Salman; Holmgren, Sverker
