Uppsala University Publications (uu.se)
A Grid-Enabled Problem Solving Environment for QTL Analysis in R
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
2010 (English). In: Proc. 2nd International Conference on Bioinformatics and Computational Biology, Cary, NC: ISCA, 2010, 202-209 p. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Cary, NC: ISCA, 2010. 202-209 p.
National Category
Software Engineering; Genetics
Identifiers
URN: urn:nbn:se:uu:diva-111594; ISBN: 978-1-880843-76-5 (print); OAI: oai:DiVA.org:uu-111594; DiVA: diva2:285725
Projects
eSSENCE
Available from: 2010-01-12. Created: 2009-12-17. Last updated: 2018-01-12. Bibliographically approved.
In thesis
1. An e-Science Approach to Genetic Analysis of Quantitative Traits
2010 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Many important traits in plants, animals and humans are quantitative, and most such traits are generally believed to be affected by multiple genetic loci. Standard computational tools for mapping of quantitative traits (i.e. for finding Quantitative Trait Loci, QTL, in the genome) use linear regression models for relating the observed phenotypes to the genetic composition of individuals in an experimental population. Using these tools to simultaneously search for multiple QTL is computationally demanding. The main reason for this is the complex optimization landscape for the multidimensional global optimization problems that must be solved. This thesis describes parallel algorithms, implementations and tools for simultaneous mapping of several QTL. These new computational tools enable genetic analysis exploiting new classes of multidimensional statistical models, potentially producing interesting results in genetics.
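
As a rough illustration of why a simultaneous search is demanding, the following minimal sketch fits an additive two-locus linear regression for every pair of loci and keeps the pair with the smallest residual sum of squares (RSS). It is not the thesis code: the function names, the additive model without interaction terms, and the 0/1 genotype coding are all assumptions made for this example.

```python
from itertools import combinations


def _center(values):
    """Subtract the mean, which removes the intercept from the regression."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _residualize(vector, orthogonal_basis):
    """Remove the projection of vector onto each mutually orthogonal basis vector."""
    for b in orthogonal_basis:
        bb = _dot(b, b)
        if bb > 1e-12:  # skip degenerate (constant or collinear) columns
            coef = _dot(vector, b) / bb
            vector = [x - coef * y for x, y in zip(vector, b)]
    return vector


def rss_two_locus(phenotypes, g1, g2):
    """RSS of the least-squares fit y ~ 1 + g1 + g2 (hypothetical additive model)."""
    q1 = _center(g1)
    q2 = _residualize(_center(g2), [q1])  # Gram-Schmidt step
    residual = _residualize(_center(phenotypes), [q1, q2])
    return _dot(residual, residual)


def exhaustive_two_locus_scan(phenotypes, genotypes):
    """Evaluate every pair of loci; return (best_rss, locus_i, locus_j)."""
    best = None
    for i, j in combinations(range(len(genotypes)), 2):
        rss = rss_two_locus(phenotypes, genotypes[i], genotypes[j])
        if best is None or rss < best[0]:
            best = (rss, i, j)
    return best
```

With n candidate positions the scan evaluates n(n-1)/2 pairs, and the count grows further with each additional simultaneous QTL, which is what motivates parallel and smarter global optimization schemes.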

We first describe how the standard, brute-force algorithm for global optimization in QTL analysis is parallelized and implemented on a grid system. We then present a parallelized version of the more elaborate global optimization algorithm DIRECT and show how it can be efficiently deployed and used on grid systems and other loosely-coupled architectures. The parallel DIRECT scheme is further developed to exploit both coarse-grained parallelism in grid systems or clusters and fine-grained, tightly-coupled parallelism in multi-core nodes. The results show that excellent speedup and performance can be achieved on grid systems and clusters, even when using a tightly-coupled algorithm such as DIRECT. Finally, we provide two distinctly different front-ends for our code. One is a grid portal providing a graphical front-end suitable for novice users and standard forms of QTL analysis. The other is a prototype of an R-based grid-enabled problem solving environment. Both of these front-ends can, after some further refinement, be utilized by geneticists for performing multidimensional genetic analysis of quantitative traits on a regular basis.
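
The coarse-grained parallelization of the brute-force search can be sketched as follows: the set of candidate locus pairs is split into independent chunks, each chunk is searched separately, and the partial minima are reduced to a global minimum. This is only an illustrative sketch. A local thread pool stands in for grid jobs, and the function names and the generic `objective` callable are assumptions, not the thesis implementation or any grid middleware API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def split_into_chunks(items, n_chunks):
    """Deal items round-robin into n_chunks lists of nearly equal size."""
    return [items[k::n_chunks] for k in range(n_chunks)]


def search_chunk(objective, chunk):
    """Independent work unit: best (value, pair) within one chunk."""
    return min((objective(pair), pair) for pair in chunk)


def parallel_exhaustive_search(objective, n_loci, n_workers=4):
    """Exhaustive search over all locus pairs, split across workers."""
    pairs = list(combinations(range(n_loci), 2))
    chunks = [c for c in split_into_chunks(pairs, n_workers) if c]
    # Each chunk needs no communication with the others, so it could equally
    # be submitted as a separate job on a loosely-coupled system.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_minima = list(pool.map(lambda c: search_chunk(objective, c), chunks))
    # Final reduction step: the global minimum is the best partial minimum.
    return min(partial_minima)
```

Because the chunks share no state, the only coordination required is the final reduction, which is why this kind of search maps well onto loosely-coupled resources. An adaptive method such as DIRECT needs more coordination between iterations and is not captured by this sketch.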

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2010. 40 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 708
Keyword
QTL Analysis, Grid Computing, Global Optimization, e-Science
National Category
Software Engineering; Computational Mathematics
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-111597 (URN); 978-91-554-7706-6 (ISBN)
Public defence
2010-02-25, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 10:15 (English)
Projects
eSSENCE
Available from: 2010-02-02. Created: 2009-12-17. Last updated: 2018-01-12. Bibliographically approved.
2. Managing Applications and Data in Distributed Computing Infrastructures
2012 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

During the last decades the demand for large-scale computational and storage resources in science has increased dramatically. New computational infrastructures enable scientists to enter a new mode of science, e-science, which complements traditional theory and experiments. E-science is inherently interdisciplinary, involving researchers from several disciplines, and also opens the way for large-scale collaborative efforts where physically distributed groups of scientists share software tools and data to make scientific progress. Within the field of e-science, new challenges are emerging in managing large-scale distributed computing efforts and distributed data sets. Different models, e.g. grids and clouds, have been introduced over the years, but new solutions built on these models are needed to enable easy and flexible use of distributed computing infrastructures by application scientists.

In the first part of the thesis, application execution environments are studied. The goal is to hide technical details of the underlying distributed computing infrastructure and expose secure and user-friendly environments to the end users. First, a general-purpose solution using portal technology is described, enabling transparent and easy usage of a variety of grid systems. Then a problem-solving environment for genetic analysis is presented. Here the statistical software R is used as a workflow engine, enhanced with grid-enabled routines for performing the computationally demanding parts of the analysis. Finally, the issue of resource allocation in grid systems is briefly studied and certain modifications in the distributed resource-brokering model for the ARC middleware are proposed.

The second part of the thesis presents solutions for managing and analyzing scientific data using distributed storage resources. First, a new reliable and secure file-oriented distributed storage system, Chelonia, is presented. The architectural design of the system is described and implementation issues are considered. Also, the stability and scalable performance of Chelonia are verified using several test scenarios. Then, tools for providing an efficient and easy-to-use platform for data analysis built on Chelonia are presented. Here, a database-driven approach is explored. An extended architecture where Chelonia is combined with the Web-Service MEDiator (WSMED) system is implemented, providing web service tools to query data without any further programming. This approach is then developed further and Chelonia is combined with SciSPARQL, a query language that extends SPARQL to queries over numeric scientific data. This results in a system that is capable of interactive analysis of distributed data sets. Advanced application-specific analysis requirements can be fulfilled by writing customized modules in Java, Python or C. The viability of the approach is demonstrated by applying the system to data produced by URDME, a computational environment in systems biology, and results for sample queries expressed in SciSPARQL are presented.

Finally, the use of an open source storage cloud, OpenStack Swift, for analysis of data from CERN experiments is considered. Here, a pilot implementation for the ROOT data analysis framework is presented together with a performance evaluation.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2012. 62 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 940
Keyword
Distributed Computing Infrastructures, Grids, Clouds, Application Management, Distributed Storage, Resource Allocation
National Category
Software Engineering
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-173467 (URN); 978-91-554-8381-4 (ISBN)
Public defence
2012-06-14, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 13:15 (English)
Projects
eSSENCE
Available from: 2012-05-24. Created: 2012-04-25. Last updated: 2018-01-12. Bibliographically approved.
3. Two Optimization Problems in Genetics: Multi-dimensional QTL Analysis and Haplotype Inference
2012 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The existence of new technologies, implemented in efficient platforms and workflows, has made massive genotyping available to all fields of biology and medicine. Genetic analyses are no longer dominated by experimental work in laboratories, but rather by the interpretation of the resulting data. When billions of data points representing thousands of individuals are available, efficient computational tools are required. The focus of this thesis is on developing models, methods and implementations for such tools.

The first theme of the thesis is multi-dimensional scans for quantitative trait loci (QTL) in experimental crosses. By mating individuals from different lines, it is possible to gather data that can be used to pinpoint the genetic variation that influences specific traits to specific genome loci. However, it is natural to expect multiple genes influencing a single trait to interact. The thesis discusses model structure and model selection, giving new insight into the conditions under which orthogonal models can be devised. The thesis also presents a new optimization method for efficiently and accurately locating QTL, and for performing the permuted data searches needed for significance testing. This method has been implemented in a software package that can seamlessly perform the searches on grid computing infrastructures.
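
The permuted data searches mentioned above can be illustrated with a generic permutation test: the phenotypes are shuffled relative to the genotypes, the scan is repeated on each shuffled data set, and an empirical significance threshold is read from the resulting null distribution of the best scan statistic. The sketch below is a simplified single-locus stand-in, not the optimization method of the thesis. The statistic, the function names and the default parameters are assumptions chosen for illustration.

```python
import math
import random


def group_mean_difference(phenotypes, genotypes):
    """Absolute difference in mean phenotype between genotype classes 0 and 1."""
    class0 = [y for y, g in zip(phenotypes, genotypes) if g == 0]
    class1 = [y for y, g in zip(phenotypes, genotypes) if g == 1]
    return abs(sum(class1) / len(class1) - sum(class0) / len(class0))


def scan_statistic(phenotypes, marker_genotypes):
    """Best (largest) statistic over all markers; a toy single-locus scan."""
    return max(group_mean_difference(phenotypes, g) for g in marker_genotypes)


def permutation_threshold(phenotypes, marker_genotypes,
                          n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the scan statistic under permutation."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    shuffled = list(phenotypes)
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        null_maxima.append(scan_statistic(shuffled, marker_genotypes))
    null_maxima.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return null_maxima[index]
```

Every permutation requires a full scan, so the cost of significance testing is the cost of one search multiplied by the number of permutations. Since the permutations are independent of each other, they are natural units of work to distribute.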

The other theme in the thesis is the development of adapted optimization schemes for using hidden Markov models in tracing allele inheritance pathways, and specifically inferring haplotypes. The advances presented form the basis for more accurate and non-biased line origin probabilities in experimental crosses, especially multi-generational ones. We show that the new tools are able to reconstruct haplotypes and even genotypes in founder individuals and offspring alike, based on only unordered offspring genotypes. The tools can also handle larger populations than competing methods, resolving inheritance pathways and phase in much larger and more complex populations. Finally, the methods presented are also applicable to datasets where individual relationships are not known, which is frequently the case in human genetics studies. One immediate application for this would be improved accuracy for imputation of SNP markers within genome-wide association studies (GWAS).

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2012. 57 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 973
Keyword
quantitative trait loci, genome-wide association studies, hidden Markov models, numerical optimization, linkage analysis, haplotype inference, genotype imputation, high performance computing
National Category
Computational Mathematics; Probability Theory and Statistics; Bioinformatics and Systems Biology; Genetics; Bioinformatics (Computational Biology); Software Engineering
Identifiers
urn:nbn:se:uu:diva-180920 (URN); 978-91-554-8473-6 (ISBN)
Public defence
2012-10-26, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 13:15 (English)
Projects
eSSENCE
Available from: 2012-10-04. Created: 2012-09-13. Last updated: 2018-01-12. Bibliographically approved.

Open Access in DiVA

No full text

Authority records

Jayawardena, Mahen; Nettelblad, Carl; Toor, Salman; Holmgren, Sverker
