uu.seUppsala University Publications

Please wait ... |

Link to record
http://uu.diva-portal.org/smash/person.jsf?pid=authority-person:15608 $(function(){PrimeFaces.cw("InputTextarea","widget_formSmash_upper_j_idt122_recordDirectLink",{id:"formSmash:upper:j_idt122:recordDirectLink",widgetVar:"widget_formSmash_upper_j_idt122_recordDirectLink",autoResize:true});}); $(function(){PrimeFaces.cw("OverlayPanel","widget_formSmash_upper_j_idt122_j_idt124",{id:"formSmash:upper:j_idt122:j_idt124",widgetVar:"widget_formSmash_upper_j_idt122_j_idt124",target:"formSmash:upper:j_idt122:permLink",showEffect:"blind",hideEffect:"fade",my:"right top",at:"right bottom",showCloseIcon:true});});

Permanent link

Direct link

Ljungkvist, Karl

Open this publication in new window or tab >>Multigrid for Matrix-Free High-Order Finite Element Computations on Graphics Processors### Kronbichler, Martin

### Ljungkvist, Karl

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_some",{id:"formSmash:j_idt184:0:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_otherAuthors",{id:"formSmash:j_idt184:0:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_otherAuthors",multiple:true}); 2019 (English)In: ACM TRANSACTIONS ON PARALLEL COMPUTING, ISSN 2329-4949, Vol. 6, no 1, article id 2Article in journal (Refereed) Published
##### Abstract [en]

##### Place, publisher, year, edition, pages

Association for Computing Machinery (ACM), 2019
##### Keywords

Finite element method, sum factorization, matrix-free method, geometric multigrid, CUDA
##### National Category

Computational Mathematics Computer Engineering
##### Identifiers

urn:nbn:se:uu:diva-390587 (URN)10.1145/3322813 (DOI)000472838200002 ()
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_j_idt359",{id:"formSmash:j_idt184:0:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_j_idt365",{id:"formSmash:j_idt184:0:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_j_idt371",{id:"formSmash:j_idt184:0:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_j_idt371",multiple:true});
#####

##### Funder

German Research Foundation (DFG), KR4661/2-1
Available from: 2019-08-13 Created: 2019-08-13 Last updated: 2019-08-13Bibliographically approved

Tech Univ Munich, Inst Computat Mech, Boltzmannstr 15, D-85748 Garching, Germany.

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing.

This article presentsmatrix-free finite-element techniques for efficiently solving partial differential equations on modern many-core processors, such as graphics cards. We develop a GPU parallelization of a matrix-free geometric multigrid iterative solver targeting moderate and high polynomial degrees, with support for general curved and adaptively refined hexahedral meshes with hanging nodes. The central algorithmic component is the matrix-free operator evaluation with sum factorization. We compare the node-level performance of our implementation running on an Nvidia Pascal P100 GPU to a highly optimized multicore implementation running on comparable Intel Broadwell CPUs and an Intel Xeon Phi. Our experiments show that the GPU implementation is approximately 1.5 to 2 times faster across four different scenarios of the Poisson equation and a variety of element degrees in 2D and 3D. The lowest time to solution per degree of freedom is recorded for moderate polynomial degrees between 3 and 5. A detailed performance analysis highlights the capabilities of the GPU architecture and the chosen execution model with threading within the element, particularly with respect to the evaluation of the matrix-vector product. Atomic intrinsics are shown to provide a fast way for avoiding the possible race conditions in summing the elemental residuals into the global vector associated to shared vertices, edges, and surfaces. In addition, the solver infrastructure allows for using mixed-precision arithmetic that performs the multigrid V-cycle in single precision with an outer correction in double precision, increasing throughput by up to 83%.

Open this publication in new window or tab >>Matrix-free finite-element computations on graphics processors with adaptively refined unstructured meshes### Ljungkvist, Karl

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_some",{id:"formSmash:j_idt184:1:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_otherAuthors",{id:"formSmash:j_idt184:1:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_otherAuthors",multiple:true}); 2017 (English)In: Proc. 25th High Performance Computing Symposium, San Diego, CA: The Society for Modeling and Simulation International, 2017, p. 1-12Conference paper, Published paper (Refereed)
##### Place, publisher, year, edition, pages

San Diego, CA: The Society for Modeling and Simulation International, 2017
##### National Category

Computer Sciences Computational Mathematics
##### Identifiers

urn:nbn:se:uu:diva-320146 (URN)978-1-5108-3822-2 (ISBN)
##### Conference

HPC 2017, April 23–26, Virginia Beach, VA
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_j_idt359",{id:"formSmash:j_idt184:1:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_j_idt365",{id:"formSmash:j_idt184:1:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_j_idt371",{id:"formSmash:j_idt184:1:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_j_idt371",multiple:true});
#####

##### Projects

UPMARC
Available from: 2017-04-23 Created: 2017-04-16 Last updated: 2018-01-13Bibliographically approved

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.

Open this publication in new window or tab >>Multigrid for matrix-free finite element computations on graphics processors### Ljungkvist, Karl

### Kronbichler, Martin

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_some",{id:"formSmash:j_idt184:2:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_otherAuthors",{id:"formSmash:j_idt184:2:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_otherAuthors",multiple:true}); 2017 (English)Report (Other academic)
##### Series

Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2017-006
##### National Category

Computer Sciences Computational Mathematics
##### Identifiers

urn:nbn:se:uu:diva-320073 (URN)
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_j_idt359",{id:"formSmash:j_idt184:2:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_j_idt365",{id:"formSmash:j_idt184:2:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_j_idt371",{id:"formSmash:j_idt184:2:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_j_idt371",multiple:true});
#####

##### Projects

UPMARCeSSENCE
Available from: 2017-04-20 Created: 2017-04-13 Last updated: 2018-01-13Bibliographically approved

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.

Open this publication in new window or tab >>Techniques for finite element methods on modern processors### Ljungkvist, Karl

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_some",{id:"formSmash:j_idt184:3:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_otherAuthors",{id:"formSmash:j_idt184:3:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_otherAuthors",multiple:true}); 2015 (English)Licentiate thesis, comprehensive summary (Other academic)
##### Abstract [en]

##### Place, publisher, year, edition, pages

Uppsala University, 2015
##### Series

Information technology licentiate theses: Licentiate theses from the Department of Information Technology, ISSN 1404-5117 ; 2015-001
##### National Category

Computer Sciences Computational Mathematics
##### Research subject

Scientific Computing
##### Identifiers

urn:nbn:se:uu:diva-242186 (URN)
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_j_idt359",{id:"formSmash:j_idt184:3:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_j_idt359",multiple:true});
##### Supervisors

### Rantakokko, Jarmo

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_j_idt365",{id:"formSmash:j_idt184:3:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_j_idt371",{id:"formSmash:j_idt184:3:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_j_idt371",multiple:true});
#####

##### Projects

UPMARCeSSENCE
Available from: 2015-01-18 Created: 2015-01-22 Last updated: 2018-01-11Bibliographically approved

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.

In this thesis, methods for efficient utilization of modern computer hardware for numerical simulation are considered. In particular, we study techniques for speeding up the execution of finite-element methods.

One of the greatest challenges in finite-element computation is how to efficiently perform the the system matrix assembly efficiently in parallel, due to its complicated memory access pattern. The main difficulty lies in the fact that many entries of the matrix are being updated concurrently by several parallel threads. We consider transactional memory, an exotic hardware feature for concurrent update of shared variables, and conduct benchmarks on a prototype processor supporting it. Our experiments show that transactions can both simplify programming and provide good performance for concurrent updates of floating point data.

Furthermore, we study a matrix-free approach to finite-element computation which avoids the matrix assembly. Motivated by its computational properties, we implement the matrix-free method for execution on graphics processors, using either atomic updates or a mesh coloring approach to handle the concurrent updates. A performance study shows that on the GPU, the matrix-free method is faster than a matrix-based implementation for many element types, and allows for solution of considerably larger problems. This suggests that the matrix-free method can speed up execution of large realistic simulations.

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing.

Open this publication in new window or tab >>Matrix-free finite-element operator application on graphics processing units### Ljungkvist, Karl

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_some",{id:"formSmash:j_idt184:4:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_otherAuthors",{id:"formSmash:j_idt184:4:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_otherAuthors",multiple:true}); 2014 (English)In: Euro-Par 2014: Parallel Processing Workshops, Part II, Springer, 2014, p. 450-461Conference paper, Published paper (Refereed)
##### Place, publisher, year, edition, pages

Springer, 2014
##### Series

Lecture Notes in Computer Science ; 8806
##### National Category

Computer Sciences Computational Mathematics
##### Identifiers

urn:nbn:se:uu:diva-238380 (URN)10.1007/978-3-319-14313-2_38 (DOI)000354785000038 ()978-3-319-14312-5 (ISBN)
##### Conference

7th Workshop on Unconventional High-Performance Computing
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_j_idt359",{id:"formSmash:j_idt184:4:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_j_idt365",{id:"formSmash:j_idt184:4:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_j_idt371",{id:"formSmash:j_idt184:4:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_j_idt371",multiple:true});
#####

##### Projects

UPMARCeSSENCE
Available from: 2014-12-11 Created: 2014-12-11 Last updated: 2018-01-11Bibliographically approved

Open this publication in new window or tab >>Using hardware transactional memory for high-performance computing### Ljungkvist, Karl

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.### Tillenius, Martin

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.### Black-Schaffer, David

### Holmgren, Sverker

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_some",{id:"formSmash:j_idt184:5:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_some",multiple:true}); ### Karlsson, Martin

### Larsson, Elisabeth

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_otherAuthors",{id:"formSmash:j_idt184:5:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_otherAuthors",multiple:true}); Show others...PrimeFaces.cw("SelectBooleanButton","widget_formSmash_j_idt184_5_j_idt188_j_idt202",{id:"formSmash:j_idt184:5:j_idt188:j_idt202",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_j_idt202",onLabel:"Hide others...",offLabel:"Show others..."}); 2011 (English)In: Proc. 25th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, Piscataway, NJ: IEEE , 2011, p. 1660-1667Conference paper, Published paper (Refereed)
##### Place, publisher, year, edition, pages

Piscataway, NJ: IEEE, 2011
##### National Category

Computer Sciences
##### Identifiers

urn:nbn:se:uu:diva-158551 (URN)10.1109/IPDPS.2011.322 (DOI)978-1-61284-425-1 (ISBN)
##### Conference

IPDPS Workshop on Multi-Threaded Architectures and Applications
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_j_idt359",{id:"formSmash:j_idt184:5:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_j_idt365",{id:"formSmash:j_idt184:5:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_j_idt371",{id:"formSmash:j_idt184:5:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_j_idt371",multiple:true});
#####

##### Projects

eSSENCEUPMARC
Available from: 2011-09-01 Created: 2011-09-10 Last updated: 2018-01-12Bibliographically approved

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.

Open this publication in new window or tab >>Early results using hardware transactional memory for high-performance computing applications### Ljungkvist, Karl

### Tillenius, Martin

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.### Holmgren, Sverker

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.### Karlsson, Martin

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_6_j_idt188_some",{id:"formSmash:j_idt184:6:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_6_j_idt188_some",multiple:true}); ### Larsson, Elisabeth

Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_6_j_idt188_otherAuthors",{id:"formSmash:j_idt184:6:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_6_j_idt188_otherAuthors",multiple:true}); Show others...PrimeFaces.cw("SelectBooleanButton","widget_formSmash_j_idt184_6_j_idt188_j_idt202",{id:"formSmash:j_idt184:6:j_idt188:j_idt202",widgetVar:"widget_formSmash_j_idt184_6_j_idt188_j_idt202",onLabel:"Hide others...",offLabel:"Show others..."}); 2010 (English)In: Proc. 3rd Swedish Workshop on Multi-Core Computing, Göteborg, Sweden: Chalmers University of Technology , 2010, p. 93-97Conference paper, Published paper (Other academic)
##### Place, publisher, year, edition, pages

Göteborg, Sweden: Chalmers University of Technology, 2010
##### National Category

Computer Sciences
##### Identifiers

urn:nbn:se:uu:diva-134615 (URN)
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_6_j_idt188_j_idt359",{id:"formSmash:j_idt184:6:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_6_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_6_j_idt188_j_idt365",{id:"formSmash:j_idt184:6:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_6_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_6_j_idt188_j_idt371",{id:"formSmash:j_idt184:6:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_6_j_idt188_j_idt371",multiple:true});
#####

##### Projects

eSSENCEUPMARC
Available from: 2010-11-18 Created: 2010-11-29 Last updated: 2018-01-12Bibliographically approved