Uppsala University Publications
Matrix-free finite-element operator application on graphics processing units
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
2014 (English). In: Euro-Par 2014: Parallel Processing Workshops, Part II, Springer, 2014, pp. 450-461. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Springer, 2014. 450-461 p.
Series
Lecture Notes in Computer Science, 8806
National Category
Computer Science Computational Mathematics
Identifiers
URN: urn:nbn:se:uu:diva-238380
DOI: 10.1007/978-3-319-14313-2_38
ISI: 000354785000038
ISBN: 978-3-319-14312-5 (print)
OAI: oai:DiVA.org:uu-238380
DiVA: diva2:770982
Conference
7th Workshop on Unconventional High-Performance Computing
Projects
UPMARC, eSSENCE
Available from: 2014-12-11. Created: 2014-12-11. Last updated: 2017-04-17. Bibliographically approved
In thesis
1. Techniques for finite element methods on modern processors
2015 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, methods for efficient utilization of modern computer hardware for numerical simulation are considered. In particular, we study techniques for speeding up the execution of finite-element methods.

One of the greatest challenges in finite-element computation is how to perform the system matrix assembly efficiently in parallel, due to its complicated memory access pattern. The main difficulty lies in the fact that many entries of the matrix are updated concurrently by several parallel threads. We consider transactional memory, an exotic hardware feature for concurrent update of shared variables, and conduct benchmarks on a prototype processor supporting it. Our experiments show that transactions can both simplify programming and provide good performance for concurrent updates of floating point data.
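To make the conflict concrete, here is a minimal sketch, assuming 1D linear (P1) elements on a uniform mesh of [0, 1]. Each element adds a 2x2 local matrix into the global matrix, and neighbouring elements share a node, so two threads can update the same entry at once. The helper names (`local_stiffness`, `assemble_parallel`) are hypothetical, and the single `threading.Lock` is only a stand-in for an atomic add or a hardware transaction; it is not the transactional memory studied in the thesis.

```python
import threading

def local_stiffness(h):
    """Stiffness matrix of one 1D linear (P1) element of length h."""
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]

def assemble_parallel(n_elements, n_threads=4):
    """Assemble the global stiffness matrix with several threads.

    Hypothetical helper: one threading.Lock stands in for an atomic
    add or a hardware transaction around each shared update.
    """
    n_nodes = n_elements + 1
    h = 1.0 / n_elements
    # Dense storage keeps the sketch short; real codes use a sparse format.
    A = [[0.0] * n_nodes for _ in range(n_nodes)]
    lock = threading.Lock()

    def worker(tid):
        # Each thread processes a strided subset of the elements.
        for e in range(tid, n_elements, n_threads):
            dofs = (e, e + 1)  # global node indices of element e
            Ke = local_stiffness(h)
            for a in range(2):
                for b in range(2):
                    # Node e is shared with element e - 1 and node e + 1 with
                    # element e + 1, so different threads can hit the same entry.
                    with lock:
                        A[dofs[a]][dofs[b]] += Ke[a][b]

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return A

A = assemble_parallel(8)
# Interior diagonal entries receive contributions from two elements.
print(A[1][1])  # 16.0
```

A single global lock serializes every update; the appeal of atomic updates or transactions is that only genuinely conflicting updates pay a synchronization cost.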

Furthermore, we study a matrix-free approach to finite-element computation which avoids the matrix assembly. Motivated by its computational properties, we implement the matrix-free method for execution on graphics processors, using either atomic updates or a mesh coloring approach to handle the concurrent updates. A performance study shows that on the GPU, the matrix-free method is faster than a matrix-based implementation for many element types and allows for the solution of considerably larger problems. This suggests that the matrix-free method can speed up the execution of large, realistic simulations.
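The two ideas in this paragraph can be sketched together in the same 1D P1 setting (an assumption for brevity): the product y = A x is computed element by element without ever storing A, and a greedy element coloring groups elements so that no two elements of one color share a node. Elements of a single color could then be processed in parallel without atomics. The function names are hypothetical, the loop runs serially, and greedy coloring is just one common choice, not necessarily the scheme used in the paper.

```python
def greedy_element_coloring(elements):
    """Assign colors so that no two elements sharing a node get the same color."""
    node_to_elements = {}
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elements.setdefault(n, []).append(e)
    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by elements that share a node with e.
        used = {colors[f] for n in nodes for f in node_to_elements[n] if colors[f] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors

def matrix_free_apply(x, elements, h, colors):
    """Compute y = A x for the 1D P1 stiffness matrix without storing A."""
    y = [0.0] * len(x)
    for color in range(max(colors) + 1):
        # Elements of one color share no nodes, so this inner loop could run
        # in parallel without atomic updates; here it runs serially.
        for e, (i, j) in enumerate(elements):
            if colors[e] != color:
                continue
            # Local stiffness [[1, -1], [-1, 1]] / h applied on the fly.
            d = (x[i] - x[j]) / h
            y[i] += d
            y[j] -= d
    return y

elements = [(e, e + 1) for e in range(4)]  # 4 elements, 5 nodes on [0, 1]
colors = greedy_element_coloring(elements)
y = matrix_free_apply([0.0, 1.0, 2.0, 3.0, 4.0], elements, 0.25, colors)
print(colors)  # [0, 1, 0, 1]
print(y)       # [-4.0, 0.0, 0.0, 0.0, 4.0]
```

Because the input is a linear function, the interior entries of A x vanish and only the boundary rows are nonzero, which is a quick sanity check on the element kernel.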

Place, publisher, year, edition, pages
Uppsala University, 2015
Series
Information technology licentiate theses: Licentiate theses from the Department of Information Technology, ISSN 1404-5117 ; 2015-001
National Category
Computer Science Computational Mathematics
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-242186 (URN)
Projects
UPMARC, eSSENCE
Available from: 2015-01-18. Created: 2015-01-22. Last updated: 2017-08-31. Bibliographically approved
2. Finite Element Computations on Multicore and Graphics Processors
2017 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, techniques for efficient utilization of modern computer hardware for numerical simulation are considered. In particular, we study techniques for improving the performance of computations using the finite element method.

One of the main difficulties in finite-element computations is how to perform the assembly of the system matrix efficiently in parallel, due to its complicated memory access pattern. The challenge lies in the fact that many entries of the matrix are being updated concurrently by several parallel threads. We consider transactional memory, an exotic hardware feature for concurrent update of shared variables, and conduct benchmarks on a prototype multicore processor supporting it. Our experiments show that transactions can both simplify programming and provide good performance for concurrent updates of floating point data.

Secondly, we study a matrix-free approach to finite-element computation which avoids the matrix assembly. In addition to removing the need to store the system matrix, matrix-free methods are attractive because their low memory footprint makes them a better match for the architecture of modern processors, where memory bandwidth is scarce and compute power is abundant. Motivated by this, we consider matrix-free implementations of high-order finite-element methods for execution on graphics processors, which have seen a revolutionary increase in usage for numerical computations in recent years due to their more efficient architecture. In the implementation, we exploit sum-factorization techniques for efficient evaluation of matrix-vector products, mesh coloring and atomic updates for handling concurrent updates, and a geometric multigrid algorithm for efficient preconditioning of iterative solvers. Our performance studies show that on the GPU, a matrix-free approach is the method of choice for elements of order two and higher, yielding significantly faster execution and allowing for the solution of considerably larger problems. Compared to corresponding CPU implementations executed on comparable multicore processors, the GPU implementation is about twice as fast, suggesting that graphics processors are about twice as power efficient as multicores for computations of this kind.
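Sum factorization relies on the tensor-product structure of high-order basis functions: on such elements the operator factors into Kronecker products of small 1D matrices, which can be applied one dimension at a time. A minimal 2D sketch of the core identity, assuming row-major vectorization, is (A kron B) x = vec(A X B^T), where X is x reshaped into a matrix. For n-by-n factors the explicit Kronecker matrix-vector product costs on the order of n^4 operations, while the factored form needs two n-by-n matrix products, about 2n^3. All function names are hypothetical, and the explicit Kronecker product is formed only to check the result.

```python
def matmul(P, Q):
    """Dense matrix product of nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def kron(A, B):
    """Explicit Kronecker product; row (i, k) -> i*len(B) + k, column (j, l) -> j*len(B[0]) + l."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def apply_kron_naive(A, B, x):
    """Form A kron B explicitly, then multiply: the expensive reference."""
    K = kron(A, B)
    return [sum(K[r][c] * x[c] for c in range(len(x))) for r in range(len(K))]

def apply_kron_sum_factorized(A, B, x):
    """Apply A kron B one dimension at a time: Y = A X B^T, X = x reshaped row-major."""
    nb = len(B[0])
    X = [x[j * nb:(j + 1) * nb] for j in range(len(A[0]))]
    Y = matmul(matmul(A, X), transpose(B))
    return [v for row in Y for v in row]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
x = [1.0, 2.0, 3.0, 4.0]
print(apply_kron_sum_factorized(A, B, x))  # [10.0, 7.0, 22.0, 15.0]
```

In three dimensions the same idea applies a 1D matrix along each of the three index directions in turn, which is what keeps the work per element manageable as the polynomial degree grows.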

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2017. 64 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1512
Keyword
Finite Element Methods, GPU, Matrix-Free, Multigrid, Transactional Memory
National Category
Computer Science Computational Mathematics
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-320147 (URN)
978-91-554-9907-5 (ISBN)
Public defence
2017-06-09, ITC 2446, Lägerhyddsvägen 2, Uppsala, 10:15 (English)
Available from: 2017-05-16 Created: 2017-04-17 Last updated: 2017-06-28

Open Access in DiVA

No full text

Authority records

Ljungkvist, Karl
