uu.seUppsala University Publications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Investigating an open source cloud storage infrastructure for CERN-specific data analysis
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
2012 (English)In: Proc. 7th International Conference on Networking, Architecture, and Storage, Los Alamitos, CA: IEEE Computer Society, 2012, 84-88 p.Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Los Alamitos, CA: IEEE Computer Society, 2012. 84-88 p.
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:uu:diva-173388DOI: 10.1109/NAS.2012.14ISBN: 978-1-4673-1889-1 (print)OAI: oai:DiVA.org:uu-173388DiVA: diva2:517327
Conference
NAS 2012
Projects
eSSENCE
Available from: 2012-09-24 Created: 2012-04-23 Last updated: 2012-09-28Bibliographically approved
In thesis
1. Managing Applications and Data in Distributed Computing Infrastructures
Open this publication in new window or tab >>Managing Applications and Data in Distributed Computing Infrastructures
2012 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

During the last decades the demand for large-scale computational and storage resources in science has increased dramatically. New computational infrastructures enable scientists to enter a new mode of science, e-science, which complements traditional theory and experiments. E-science is inherently interdisciplinary, involving researchers from several disciplines, and also opens up for large-scale collaborative efforts where physically distributed groups of scientists share software tools and data to make scientific progress. Within the field of e-science, new challenges are emerging in managing large-scale distributed computing efforts and distributed data sets. Different models, e.g. grids and clouds, have been introduced over the years, but new solutions built on these models are needed to enable easy and flexible use of distributed computing infrastructures by application scientists.

In the first part of the thesis, application execution environments are studied. The goal is to hide technical details of the underlying distributed computing infrastructure and expose secure and user-friendly environments to the end users. First, a general-purpose solution using portal technology is described, enabling transparent and easy usage of a variety of grid systems. Then a problem-solving environment for genetic analysis is presented. Here the statistical software R is used as a workflow engine, enhanced with grid-enabled routines for performing the computationally demanding parts of the analysis. Finally, the issue of resource allocation in grid system is briefly studied and certain modifications in the distributed resource-brokering model for the ARC middleware are proposed.

The second part of the thesis presents solutions for managing and analyzing scientific data using distributed storage resources. First, a new reliable and secure file-oriented distributed storage system, Chelonia, is presented. The architectural design of the system is described and implementation issues are considered. Also, the stability and scalable performance of Chelonia is verified using several test scenarios. Then, tools for providing an efficient and easy-to-use platform for data analysis built on Chelonia are presented. Here, a database driven approach is explored. An extended architecture where Chelonia is combined with the Web-Service MEDiator (WSMED) system is implemented, providing web service tools to query data without any further programming. This approach is then developed further and Chelonia is combined with SciSPARQL, a query language that extends SPARQL to queries over numeric scientific data. This results in a system that is capable of interactive analysis of distributed data sets. Writing customized modules in Java, Python or C can fulfill advanced application-specific analysis requirements. The viability of the approach is demonstrated by applying the system to data produced by URDME, a computational environment in systems biology and results for sample queries expressed in SciSPARQL are presented.

Finally, the use of an open source storage cloud, Openstack – SWIFT, for analysis of data from CERN experiments is considered. Here, a pilot implementation for the ROOT data analysis framework is presented together with a performance evaluation.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2012. 62 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 940
Keyword
Distributed Computing Infrastructures, Grids, Clouds, Application Management, Distributed Storage, Resource Allocation
National Category
Software Engineering
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-173467 (URN)978-91-554-8381-4 (ISBN)
Public defence
2012-06-14, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 13:15 (English)
Opponent
Supervisors
Projects
eSSENCE
Available from: 2012-05-24 Created: 2012-04-25 Last updated: 2012-08-01Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full text

Authority records BETA

Toor, SalmanHolmgren, Sverker

Search in DiVA

By author/editor
Toor, SalmanHolmgren, Sverker
By organisation
Division of Scientific ComputingComputational Science
Software Engineering

Search outside of DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 841 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf