Managing Applications and Data in Distributed Computing Infrastructures
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
2012 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

During the last decades the demand for large-scale computational and storage resources in science has increased dramatically. New computational infrastructures enable scientists to enter a new mode of science, e-science, which complements traditional theory and experiments. E-science is inherently interdisciplinary, involving researchers from several disciplines, and also opens the way for large-scale collaborative efforts where physically distributed groups of scientists share software tools and data to make scientific progress. Within the field of e-science, new challenges are emerging in managing large-scale distributed computing efforts and distributed data sets. Different models, e.g. grids and clouds, have been introduced over the years, but new solutions built on these models are needed to enable easy and flexible use of distributed computing infrastructures by application scientists.

In the first part of the thesis, application execution environments are studied. The goal is to hide technical details of the underlying distributed computing infrastructure and expose secure and user-friendly environments to the end users. First, a general-purpose solution using portal technology is described, enabling transparent and easy usage of a variety of grid systems. Then a problem-solving environment for genetic analysis is presented. Here the statistical software R is used as a workflow engine, enhanced with grid-enabled routines for performing the computationally demanding parts of the analysis. Finally, the issue of resource allocation in grid systems is briefly studied and certain modifications in the distributed resource-brokering model for the ARC middleware are proposed.

The second part of the thesis presents solutions for managing and analyzing scientific data using distributed storage resources. First, a new reliable and secure file-oriented distributed storage system, Chelonia, is presented. The architectural design of the system is described and implementation issues are considered. Also, the stability and scalable performance of Chelonia are verified using several test scenarios. Then, tools for providing an efficient and easy-to-use platform for data analysis built on Chelonia are presented. Here, a database-driven approach is explored. An extended architecture where Chelonia is combined with the Web-Service MEDiator (WSMED) system is implemented, providing web service tools to query data without any further programming. This approach is then developed further and Chelonia is combined with SciSPARQL, a query language that extends SPARQL to queries over numeric scientific data. This results in a system that is capable of interactive analysis of distributed data sets. Advanced application-specific analysis requirements can be met by writing customized modules in Java, Python or C. The viability of the approach is demonstrated by applying the system to data produced by URDME, a computational environment in systems biology, and results for sample queries expressed in SciSPARQL are presented.

Finally, the use of an open source storage cloud, OpenStack Swift, for analysis of data from CERN experiments is considered. Here, a pilot implementation for the ROOT data analysis framework is presented together with a performance evaluation.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2012, p. 62
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 940
Keywords [en]
Distributed Computing Infrastructures, Grids, Clouds, Application Management, Distributed Storage, Resource Allocation
National Category
Software Engineering
Research subject
Scientific Computing
Identifiers
URN: urn:nbn:se:uu:diva-173467
ISBN: 978-91-554-8381-4 (print)
OAI: oai:DiVA.org:uu-173467
DiVA id: diva2:523474
Public defence
2012-06-14, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 13:15 (English)
Projects
eSSENCE
Available from: 2012-05-24. Created: 2012-04-25. Last updated: 2018-01-12. Bibliographically approved.
List of papers
1. Empowering a flexible application portal with a SOA-based grid job management framework
2008 (English). Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Trondheim, Norway: Norwegian University of Science and Technology, 2008
National Category
Software Engineering
Identifiers
urn:nbn:se:uu:diva-173381 (URN)
Conference
PARA 2008: State of the Art in Scientific and Parallel Computing
Available from: 2012-04-23. Created: 2012-04-23. Last updated: 2018-01-12. Bibliographically approved.
2. A Grid-Enabled Problem Solving Environment for QTL Analysis in R
2010 (English). In: Proc. 2nd International Conference on Bioinformatics and Computational Biology, Cary, NC: ISCA, 2010, p. 202-209. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Cary, NC: ISCA, 2010
National Category
Software Engineering; Genetics
Identifiers
urn:nbn:se:uu:diva-111594 (URN); 978-1-880843-76-5 (ISBN)
Projects
eSSENCE
Available from: 2010-01-12. Created: 2009-12-17. Last updated: 2018-01-12. Bibliographically approved.
3. Chelonia — a self-healing storage cloud
2010 (English). In: Proc. 9th Cracow Grid Workshop, Kraków, Poland: ACC Cyfronet AGH, 2010, p. 5-12. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Kraków, Poland: ACC Cyfronet AGH, 2010
National Category
Software Engineering
Identifiers
urn:nbn:se:uu:diva-129188 (URN); 978-83-61433-01-9 (ISBN)
Projects
eSSENCE
Available from: 2010-03-01. Created: 2010-08-06. Last updated: 2018-01-12. Bibliographically approved.
4. Chelonia: A self-healing, replicated storage system
2011 (English). In: Computing in High Energy and Nuclear Physics: CHEP 2010, Bristol, UK: Institute of Physics Publishing (IOPP), 2011, p. 062019:1-6. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Bristol, UK: Institute of Physics Publishing (IOPP), 2011
Series
Journal of Physics: Conference Series ; 331
National Category
Software Engineering
Identifiers
urn:nbn:se:uu:diva-173392 (URN); 10.1088/1742-6596/331/6/062019 (DOI); 000301120500019 (ISI)
Projects
eSSENCE
Available from: 2011-12-23. Created: 2012-04-23. Last updated: 2018-01-12. Bibliographically approved.
5. Performance and stability of the Chelonia storage system
2012 (English). In: Proc. International Symposium on Grids and Clouds 2012, Trieste, Italy: SISSA, 2012, p. 009:1-14. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Trieste, Italy: SISSA, 2012
Series
Proceedings of Science, ISSN 1824-8039 ; 153
National Category
Software Engineering
Identifiers
urn:nbn:se:uu:diva-129189 (URN)
Conference
ISGC 2012
Projects
eSSENCE
Available from: 2012-03-01. Created: 2010-08-06. Last updated: 2018-01-12. Bibliographically approved.
6. A scalable architecture for e-Science data management
2011 (English). In: Proc. 7th International Conference on e-Science, Los Alamitos, CA: IEEE Computer Society, 2011, p. 210-217. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Los Alamitos, CA: IEEE Computer Society, 2011
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-166258 (URN); 10.1109/eScience.2011.37 (DOI); 978-1-4577-2163-2 (ISBN)
Projects
eSSENCE
Available from: 2012-01-09. Created: 2012-01-11. Last updated: 2018-01-12. Bibliographically approved.
7. Scientific analysis by queries in extended SPARQL over a scalable e-Science data store
2013 (English). In: Proc. 9th International Conference on e-Science, Los Alamitos, CA: IEEE Computer Society, 2013, p. 98-106. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Los Alamitos, CA: IEEE Computer Society, 2013
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-173448 (URN); 10.1109/eScience.2013.19 (DOI); 000330195500012 (ISI); 978-0-7695-5083-1 (ISBN)
Projects
eSSENCE
Available from: 2013-10-25. Created: 2012-04-24. Last updated: 2018-01-12. Bibliographically approved.
8. Investigating an open source cloud storage infrastructure for CERN-specific data analysis
2012 (English). In: Proc. 7th International Conference on Networking, Architecture, and Storage, Los Alamitos, CA: IEEE Computer Society, 2012, p. 84-88. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Los Alamitos, CA: IEEE Computer Society, 2012
National Category
Software Engineering
Identifiers
urn:nbn:se:uu:diva-173388 (URN); 10.1109/NAS.2012.14 (DOI); 978-1-4673-1889-1 (ISBN)
Conference
NAS 2012
Projects
eSSENCE
Available from: 2012-09-24. Created: 2012-04-23. Last updated: 2018-01-12. Bibliographically approved.

Open Access in DiVA

fulltext (3921 kB), 2713 downloads
File information
File name: FULLTEXT01.pdf
File size: 3921 kB
Checksum (SHA-512): ea1c8f863e4c505fc876ed8c4f48af8f19997862f8ae2bd8a8b296f900f5fa3c48780cf0c892ded2a190f0e84e26bb88e8758f3bfdce99ef85f1a8d9b8d12f19
Type: fulltext
Mimetype: application/pdf
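
A downloaded copy of the full text can be checked against the SHA-512 checksum listed in the file information above. The following is a minimal sketch using Python's standard hashlib module. The helper names sha512_of_file and matches_record and the local path FULLTEXT01.pdf in the working directory are assumptions made for illustration; they are not part of DiVA.

```python
import hashlib
from pathlib import Path

# SHA-512 checksum as listed in the record's file information.
EXPECTED_SHA512 = (
    "ea1c8f863e4c505fc876ed8c4f48af8f19997862f8ae2bd8a8b296f900f5fa3c"
    "48780cf0c892ded2a190f0e84e26bb88e8758f3bfdce99ef85f1a8d9b8d12f19"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_SHA512):
    """Hypothetical helper: True if the file's digest equals the listed one."""
    return sha512_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded file; adjust as needed.
    local_copy = Path("FULLTEXT01.pdf")
    if local_copy.exists():
        print("checksum OK" if matches_record(local_copy) else "checksum MISMATCH")
    else:
        print(f"{local_copy} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for a 3921 kB PDF but makes the same helper usable for larger files.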

Authority records

Toor, Salman Zubair

Total: 2714 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 2513 hits