uu.seUppsala universitets publikationer
Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Using hardware transactional memory for high-performance computing
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för teknisk databehandling. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för teknisk databehandling. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Datorteknik. (Uppsala Architecture Research Team)
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för teknisk databehandling. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
Visa övriga samt affilieringar
2011 (Engelska)Ingår i: Proc. 25th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, Piscataway, NJ: IEEE , 2011, s. 1660-1667Konferensbidrag, Publicerat paper (Refereegranskat)
Ort, förlag, år, upplaga, sidor
Piscataway, NJ: IEEE , 2011. s. 1660-1667
Nationell ämneskategori
Datavetenskap (datalogi)
Identifikatorer
URN: urn:nbn:se:uu:diva-158551DOI: 10.1109/IPDPS.2011.322ISBN: 978-1-61284-425-1 (tryckt)OAI: oai:DiVA.org:uu-158551DiVA, id: diva2:440014
Konferens
IPDPS Workshop on Multi-Threaded Architectures and Applications
Projekt
eSSENCEUPMARCTillgänglig från: 2011-09-01 Skapad: 2011-09-10 Senast uppdaterad: 2018-01-12Bibliografiskt granskad
Ingår i avhandling
1. Leveraging multicore processors for scientific computing
Öppna denna publikation i ny flik eller fönster >>Leveraging multicore processors for scientific computing
2012 (Engelska)Licentiatavhandling, sammanläggning (Övrigt vetenskapligt)
Abstract [en]

This thesis deals with how to develop scientific computing software that runs efficiently on multicore processors. The goal is to find building blocks and programming models that increase the productivity and reduce the probability of programming errors when developing parallel software.

In our search for new building blocks, we evaluate the use of hardware transactional memory for constructing atomic floating point operations. Using benchmark applications from scientific computing, we show in which situations this achieves better performance than other approaches.

Driven by the needs of scientific computing applications, we develop a programming model and implement it as a reusable library. The library provides a run-time system for executing tasks on multicore architectures, with efficient and user-friendly management of dependencies. Our results from scientific computing benchmarks show excellent scaling up to at least 64 cores. We also investigate how the execution time depend on the task granularity, and build a model for the performance of the task library.

Ort, förlag, år, upplaga, sidor
Uppsala University, 2012
Serie
IT licentiate theses / Uppsala University, Department of Information Technology, ISSN 1404-5117 ; 2012-006
Nationell ämneskategori
Programvaruteknik Beräkningsmatematik
Forskningsämne
Beräkningsvetenskap
Identifikatorer
urn:nbn:se:uu:diva-181266 (URN)
Handledare
Projekt
UPMARCeSSENCE
Tillgänglig från: 2012-09-28 Skapad: 2012-09-20 Senast uppdaterad: 2018-01-12Bibliografiskt granskad
2. Scientific Computing on Multicore Architectures
Öppna denna publikation i ny flik eller fönster >>Scientific Computing on Multicore Architectures
2014 (Engelska)Doktorsavhandling, sammanläggning (Övrigt vetenskapligt)
Abstract [en]

Computer simulations are an indispensable tool for scientists to gain new insights about nature. Simulations of natural phenomena are usually large, and limited by the available computer resources. By using the computer resources more efficiently, larger and more detailed simulations can be performed, and more information can be extracted to help advance human knowledge.

The topic of this thesis is how to make best use of modern computers for scientific computations. The challenge here is the high level of parallelism that is required to fully utilize the multicore processors in these systems.

Starting from the basics, the primitives for synchronizing between threads are investigated. Hardware transactional memory is a new construct for this, which is evaluated for a new use of importance for scientific software: atomic updates of floating point values. The evaluation includes experiments on real hardware and comparisons against standard methods.

Higher level programming models for shared memory parallelism are then considered. The state of the art for efficient use of multicore systems is dynamically scheduled task-based systems, where tasks can depend on data. In such systems, the software is divided up into many small tasks that are scheduled asynchronously according to their data dependencies. This enables a high level of parallelism, and avoids global barriers.

A new system for managing task dependencies is developed in this thesis, based on data versioning. The system is implemented as a reusable software library, and shown to be as efficient or more efficient than other shared-memory task-based systems in experimental comparisons.

The developed runtime system is then extended to distributed memory machines, and used for implementing a parallel version of a software for global climate simulations. By running the optimized and parallelized version on eight servers, an equally sized problem can be solved over 100 times faster than in the original sequential version. The parallel version also allowed significantly larger problems to be solved, previously unreachable due to memory constraints.

Ort, förlag, år, upplaga, sidor
Uppsala: Acta Universitatis Upsaliensis, 2014. s. 47
Serie
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1139
Nyckelord
multicore, scientific computing, shared memory parallelism, task-based programming, parallel programming model, task scheduling, data versioning
Nationell ämneskategori
Programvaruteknik Beräkningsmatematik
Forskningsämne
Beräkningsvetenskap
Identifikatorer
urn:nbn:se:uu:diva-221241 (URN)978-91-554-8928-1 (ISBN)
Disputation
2014-05-23, Room 2446, Polacksbacken, Lägerhyddsvägen 2, Uppsala, 10:15 (Engelska)
Opponent
Handledare
Projekt
UPMARCeSSENCE
Tillgänglig från: 2014-04-29 Skapad: 2014-03-26 Senast uppdaterad: 2018-01-11Bibliografiskt granskad
3. Techniques for finite element methods on modern processors
Öppna denna publikation i ny flik eller fönster >>Techniques for finite element methods on modern processors
2015 (Engelska)Licentiatavhandling, sammanläggning (Övrigt vetenskapligt)
Abstract [en]

In this thesis, methods for efficient utilization of modern computer hardware for numerical simulation are considered. In particular, we study techniques for speeding up the execution of finite-element methods.

One of the greatest challenges in finite-element computation is how to efficiently perform the the system matrix assembly efficiently in parallel, due to its complicated memory access pattern. The main difficulty lies in the fact that many entries of the matrix are being updated concurrently by several parallel threads. We consider transactional memory, an exotic hardware feature for concurrent update of shared variables, and conduct benchmarks on a prototype processor supporting it. Our experiments show that transactions can both simplify programming and provide good performance for concurrent updates of floating point data.

Furthermore, we study a matrix-free approach to finite-element computation which avoids the matrix assembly. Motivated by its computational properties, we implement the matrix-free method for execution on graphics processors, using either atomic updates or a mesh coloring approach to handle the concurrent updates. A performance study shows that on the GPU, the matrix-free method is faster than a matrix-based implementation for many element types, and allows for solution of considerably larger problems. This suggests that the matrix-free method can speed up execution of large realistic simulations.

Ort, förlag, år, upplaga, sidor
Uppsala University, 2015
Serie
IT licentiate theses / Uppsala University, Department of Information Technology, ISSN 1404-5117 ; 2015-001
Nationell ämneskategori
Datavetenskap (datalogi) Beräkningsmatematik
Forskningsämne
Beräkningsvetenskap
Identifikatorer
urn:nbn:se:uu:diva-242186 (URN)
Handledare
Projekt
UPMARCeSSENCE
Tillgänglig från: 2015-01-18 Skapad: 2015-01-22 Senast uppdaterad: 2018-01-11Bibliografiskt granskad
4. Finite Element Computations on Multicore and Graphics Processors
Öppna denna publikation i ny flik eller fönster >>Finite Element Computations on Multicore and Graphics Processors
2017 (Engelska)Doktorsavhandling, sammanläggning (Övrigt vetenskapligt)
Abstract [en]

In this thesis, techniques for efficient utilization of modern computer hardwarefor numerical simulation are considered. In particular, we study techniques for improving the performance of computations using the finite element method.

One of the main difficulties in finite-element computations is how to perform the assembly of the system matrix efficiently in parallel, due to its complicated memory access pattern. The challenge lies in the fact that many entries of the matrix are being updated concurrently by several parallel threads. We consider transactional memory, an exotic hardware feature for concurrent update of shared variables, and conduct benchmarks on a prototype multicore processor supporting it. Our experiments show that transactions can both simplify programming and provide good performance for concurrent updates of floating point data.

Secondly, we study a matrix-free approach to finite-element computation which avoids the matrix assembly. In addition to removing the need to store the system matrix, matrix-free methods are attractive due to their low memory footprint and therefore better match the architecture of modern processors where memory bandwidth is scarce and compute power is abundant. Motivated by this, we consider matrix-free implementations of high-order finite-element methods for execution on graphics processors, which have seen a revolutionary increase in usage for numerical computations during recent years due to their more efficient architecture. In the implementation, we exploit sum-factorization techniques for efficient evaluation of matrix-vector products, mesh coloring and atomic updates for concurrent updates, and a geometric multigrid algorithm for efficient preconditioning of iterative solvers. Our performance studies show that on the GPU, a matrix-free approach is the method of choice for elements of order two and higher, yielding both a significantly faster execution, and allowing for solution of considerably larger problems. Compared to corresponding CPU implementations executed on comparable multicore processors, the GPU implementation is about twice as fast, suggesting that graphics processors are about twice as power efficient as multicores for computations of this kind.

Ort, förlag, år, upplaga, sidor
Uppsala: Acta Universitatis Upsaliensis, 2017. s. 64
Serie
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1512
Nyckelord
Finite Element Methods, GPU, Matrix-Free, Multigrid, Transactional Memory
Nationell ämneskategori
Datavetenskap (datalogi) Beräkningsmatematik
Forskningsämne
Beräkningsvetenskap
Identifikatorer
urn:nbn:se:uu:diva-320147 (URN)978-91-554-9907-5 (ISBN)
Disputation
2017-06-09, ITC 2446, Lägerhyddsvägen 2, Uppsala, 10:15 (Engelska)
Opponent
Handledare
Projekt
UPMARC
Tillgänglig från: 2017-05-16 Skapad: 2017-04-17 Senast uppdaterad: 2019-02-25

Open Access i DiVA

Fulltext saknas i DiVA

Övriga länkar

Förlagets fulltext

Personposter BETA

Ljungkvist, KarlTillenius, MartinBlack-Schaffer, DavidHolmgren, SverkerLarsson, Elisabeth

Sök vidare i DiVA

Av författaren/redaktören
Ljungkvist, KarlTillenius, MartinBlack-Schaffer, DavidHolmgren, SverkerLarsson, Elisabeth
Av organisationen
Avdelningen för teknisk databehandlingTillämpad beräkningsvetenskapDatorteknik
Datavetenskap (datalogi)

Sök vidare utanför DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetricpoäng

doi
isbn
urn-nbn
Totalt: 715 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf