Ljungkvist, Karl
Publications (7 of 7)
Kronbichler, M. & Ljungkvist, K. (2019). Multigrid for Matrix-Free High-Order Finite Element Computations on Graphics Processors. ACM TRANSACTIONS ON PARALLEL COMPUTING, 6(1), Article ID 2.
2019 (English) In: ACM TRANSACTIONS ON PARALLEL COMPUTING, ISSN 2329-4949, Vol. 6, no 1, article id 2. Article in journal (Refereed), Published
Abstract [en]

This article presents matrix-free finite-element techniques for efficiently solving partial differential equations on modern many-core processors, such as graphics cards. We develop a GPU parallelization of a matrix-free geometric multigrid iterative solver targeting moderate and high polynomial degrees, with support for general curved and adaptively refined hexahedral meshes with hanging nodes. The central algorithmic component is the matrix-free operator evaluation with sum factorization. We compare the node-level performance of our implementation running on an Nvidia Pascal P100 GPU to a highly optimized multicore implementation running on comparable Intel Broadwell CPUs and an Intel Xeon Phi. Our experiments show that the GPU implementation is approximately 1.5 to 2 times faster across four different scenarios of the Poisson equation and a variety of element degrees in 2D and 3D. The lowest time to solution per degree of freedom is recorded for moderate polynomial degrees between 3 and 5. A detailed performance analysis highlights the capabilities of the GPU architecture and the chosen execution model with threading within the element, particularly with respect to the evaluation of the matrix-vector product. Atomic intrinsics are shown to provide a fast way to avoid the possible race conditions in summing the elemental residuals into the global vector associated with shared vertices, edges, and surfaces. In addition, the solver infrastructure allows for using mixed-precision arithmetic that performs the multigrid V-cycle in single precision with an outer correction in double precision, increasing throughput by up to 83%.
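
The sum factorization named in the abstract can be illustrated with a minimal sketch. This is not the article's GPU implementation: it is a serial, pure-Python 2D example, and the names `interpolate_naive`, `interpolate_sum_factorized`, `S` (a 1D shape-value matrix, rows indexed by evaluation point) and `c` (the element's coefficients) are assumptions chosen for illustration. It shows how a tensor-product evaluation can be split into a sequence of 1D contractions, which lowers the work per element in 2D from roughly n^4 to n^3 operations.

```python
def interpolate_naive(S, c):
    """Evaluate out[q1][q2] = sum_{i,j} S[q1][i] * S[q2][j] * c[i][j] directly.

    Every output point loops over all n*n coefficients.
    """
    nq, n = len(S), len(S[0])
    return [[sum(S[q1][i] * S[q2][j] * c[i][j]
                 for i in range(n) for j in range(n))
             for q2 in range(nq)]
            for q1 in range(nq)]


def interpolate_sum_factorized(S, c):
    """Compute the same values with two successive 1D contractions."""
    nq, n = len(S), len(S[0])
    # Step 1: contract the second index, tmp[i][q2] = sum_j S[q2][j] * c[i][j]
    tmp = [[sum(S[q2][j] * c[i][j] for j in range(n)) for q2 in range(nq)]
           for i in range(n)]
    # Step 2: contract the first index, out[q1][q2] = sum_i S[q1][i] * tmp[i][q2]
    return [[sum(S[q1][i] * tmp[i][q2] for i in range(n)) for q2 in range(nq)]
            for q1 in range(nq)]


if __name__ == "__main__":
    # Hypothetical data: linear 1D shape functions sampled at x = 0, 0.5, 1.
    S = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    c = [[1.0, 2.0], [3.0, 4.0]]
    print(interpolate_sum_factorized(S, c))
```

Both functions return the same table of values; only the order of the summations differs. In 3D the same idea applies with three contractions.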

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Keywords
Finite element method, sum factorization, matrix-free method, geometric multigrid, CUDA
National Category
Computational Mathematics, Computer Engineering
Identifiers
urn:nbn:se:uu:diva-390587 (URN); 10.1145/3322813 (DOI); 000472838200002 ()
Funder
German Research Foundation (DFG), KR4661/2-1
Available from: 2019-08-13 Created: 2019-08-13 Last updated: 2019-08-13 Bibliographically approved
Ljungkvist, K. (2017). Matrix-free finite-element computations on graphics processors with adaptively refined unstructured meshes. In: Proc. 25th High Performance Computing Symposium. Paper presented at HPC 2017, April 23–26, Virginia Beach, VA (pp. 1-12). San Diego, CA: The Society for Modeling and Simulation International
2017 (English) In: Proc. 25th High Performance Computing Symposium, San Diego, CA: The Society for Modeling and Simulation International, 2017, p. 1-12. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
San Diego, CA: The Society for Modeling and Simulation International, 2017
National Category
Computer Sciences, Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-320146 (URN); 978-1-5108-3822-2 (ISBN)
Conference
HPC 2017, April 23–26, Virginia Beach, VA
Projects
UPMARC
Available from: 2017-04-23 Created: 2017-04-16 Last updated: 2018-01-13 Bibliographically approved
Ljungkvist, K. & Kronbichler, M. (2017). Multigrid for matrix-free finite element computations on graphics processors.
2017 (English) Report (Other academic)
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2017-006
National Category
Computer Sciences, Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-320073 (URN)
Projects
UPMARC, eSSENCE
Available from: 2017-04-20 Created: 2017-04-13 Last updated: 2018-01-13 Bibliographically approved
Ljungkvist, K. (2015). Techniques for finite element methods on modern processors. (Licentiate dissertation). Uppsala University
2015 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, methods for efficient utilization of modern computer hardware for numerical simulation are considered. In particular, we study techniques for speeding up the execution of finite-element methods.

One of the greatest challenges in finite-element computation is how to perform the system matrix assembly efficiently in parallel, due to its complicated memory access pattern. The main difficulty lies in the fact that many entries of the matrix are being updated concurrently by several parallel threads. We consider transactional memory, an exotic hardware feature for concurrent update of shared variables, and conduct benchmarks on a prototype processor supporting it. Our experiments show that transactions can both simplify programming and provide good performance for concurrent updates of floating point data.

Furthermore, we study a matrix-free approach to finite-element computation which avoids the matrix assembly. Motivated by its computational properties, we implement the matrix-free method for execution on graphics processors, using either atomic updates or a mesh coloring approach to handle the concurrent updates. A performance study shows that on the GPU, the matrix-free method is faster than a matrix-based implementation for many element types, and allows for solution of considerably larger problems. This suggests that the matrix-free method can speed up execution of large realistic simulations.
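
The mesh coloring approach mentioned above can be sketched as follows. This is an illustrative, serial pure-Python version and not the thesis's GPU code; the function names `color_elements` and `assemble_by_color`, the greedy strategy, and the tuple-of-node-ids mesh representation are all assumptions made for the example. The idea is to assign colors so that no two elements of the same color share a node. All elements of one color can then add their contributions to the global vector concurrently without conflicting writes, as an alternative to atomic updates.

```python
def color_elements(elements):
    """Greedy coloring: elements that share a node never get the same color.

    `elements` is a list of tuples of global node ids (hypothetical layout).
    """
    node_colors = {}  # node id -> colors already used by elements touching it
    colors = []
    for nodes in elements:
        used = set().union(*(node_colors.get(n, set()) for n in nodes))
        color = 0
        while color in used:
            color += 1
        colors.append(color)
        for n in nodes:
            node_colors.setdefault(n, set()).add(color)
    return colors


def assemble_by_color(elements, local_values, n_nodes):
    """Sum element-local values into a global vector, one color at a time."""
    colors = color_elements(elements)
    result = [0.0] * n_nodes
    n_colors = max(colors) + 1 if colors else 0
    for color in range(n_colors):
        # Elements of one color touch disjoint nodes, so this inner loop
        # could run in parallel without atomics; here it runs serially.
        for e, nodes in enumerate(elements):
            if colors[e] == color:
                for k, n in enumerate(nodes):
                    result[n] += local_values[e][k]
    return result


if __name__ == "__main__":
    # Hypothetical 1D mesh: four two-node elements in a row.
    mesh = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(color_elements(mesh))
    print(assemble_by_color(mesh, [[1.0, 1.0]] * 4, 5))
```

The trade-off is that coloring requires a preprocessing pass and one synchronization per color, whereas atomic updates need no preprocessing but serialize conflicting writes in hardware.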

Place, publisher, year, edition, pages
Uppsala University, 2015
Series
Information technology licentiate theses: Licentiate theses from the Department of Information Technology, ISSN 1404-5117 ; 2015-001
National Category
Computer Sciences, Computational Mathematics
Research subject
Scientific Computing
Identifiers
urn:nbn:se:uu:diva-242186 (URN)
Projects
UPMARC, eSSENCE
Available from: 2015-01-18 Created: 2015-01-22 Last updated: 2018-01-11 Bibliographically approved
Ljungkvist, K. (2014). Matrix-free finite-element operator application on graphics processing units. In: Euro-Par 2014: Parallel Processing Workshops, Part II. Paper presented at 7th Workshop on Unconventional High-Performance Computing (pp. 450-461). Springer
2014 (English) In: Euro-Par 2014: Parallel Processing Workshops, Part II, Springer, 2014, p. 450-461. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Springer, 2014
Series
Lecture Notes in Computer Science ; 8806
National Category
Computer Sciences, Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-238380 (URN); 10.1007/978-3-319-14313-2_38 (DOI); 000354785000038 (); 978-3-319-14312-5 (ISBN)
Conference
7th Workshop on Unconventional High-Performance Computing
Projects
UPMARC, eSSENCE
Available from: 2014-12-11 Created: 2014-12-11 Last updated: 2018-01-11 Bibliographically approved
Ljungkvist, K., Tillenius, M., Black-Schaffer, D., Holmgren, S., Karlsson, M. & Larsson, E. (2011). Using hardware transactional memory for high-performance computing. In: Proc. 25th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum. Paper presented at IPDPS Workshop on Multi-Threaded Architectures and Applications (pp. 1660-1667). Piscataway, NJ: IEEE
2011 (English) In: Proc. 25th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, Piscataway, NJ: IEEE, 2011, p. 1660-1667. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Piscataway, NJ: IEEE, 2011
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-158551 (URN); 10.1109/IPDPS.2011.322 (DOI); 978-1-61284-425-1 (ISBN)
Conference
IPDPS Workshop on Multi-Threaded Architectures and Applications
Projects
eSSENCE, UPMARC
Available from: 2011-09-01 Created: 2011-09-10 Last updated: 2018-01-12 Bibliographically approved
Ljungkvist, K., Tillenius, M., Holmgren, S., Karlsson, M. & Larsson, E. (2010). Early results using hardware transactional memory for high-performance computing applications. In: Proc. 3rd Swedish Workshop on Multi-Core Computing (pp. 93-97). Göteborg, Sweden: Chalmers University of Technology
2010 (English) In: Proc. 3rd Swedish Workshop on Multi-Core Computing, Göteborg, Sweden: Chalmers University of Technology, 2010, p. 93-97. Conference paper, Published paper (Other academic)
Place, publisher, year, edition, pages
Göteborg, Sweden: Chalmers University of Technology, 2010
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-134615 (URN)
Projects
eSSENCE, UPMARC
Available from: 2010-11-18 Created: 2010-11-29 Last updated: 2018-01-12 Bibliographically approved