Uppsala University Publications


Finite Element Computations on Multicore and Graphics Processors

2017 (English). Doctoral thesis, comprehensive summary (Other academic)
##### Place, publisher, year, edition, pages

Uppsala: Acta Universitatis Upsaliensis, 2017, p. 64
##### Series

Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214; 1512
##### Keywords [en]

Finite Element Methods, GPU, Matrix-Free, Multigrid, Transactional Memory
##### National Category

Computer Sciences; Computational Mathematics
##### Research subject

Scientific Computing
##### Identifiers

URN: urn:nbn:se:uu:diva-320147; ISBN: 978-91-554-9907-5 (print); OAI: oai:DiVA.org:uu-320147; DiVA, id: diva2:1088894
##### Public defence

2017-06-09, ITC 2446, Lägerhyddsvägen 2, Uppsala, 10:15 (English)
##### Projects

UPMARC

Available from: 2017-05-16. Created: 2017-04-17. Last updated: 2019-02-25
##### Abstract [en]

This thesis considers techniques for using modern computer hardware efficiently in numerical simulation. In particular, we study techniques for improving the performance of computations based on the finite element method.

One of the main difficulties in finite-element computations is assembling the system matrix efficiently in parallel, because of its complicated memory access pattern. Elements that share a node contribute to the same matrix entries, so many entries are updated concurrently by several parallel threads. We consider transactional memory, an uncommon hardware feature for concurrent updates of shared variables, and benchmark it on a prototype multicore processor that supports it. Our experiments show that transactions can both simplify programming and provide good performance for concurrent updates of floating-point data.
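
The conflict described above already shows up in a serial, vectorized setting. The following is a minimal sketch, not code from the thesis: it assembles the stiffness matrix of the 1D Laplacian with linear elements on a uniform mesh of [0, 1], where each interior node is shared by two elements. The 1D model problem, the function names and the use of NumPy are assumptions made for illustration. A buffered fancy-index update (`K[rows, cols] += vals`) keeps only one contribution per repeated index, much like an unsynchronized parallel update that loses a write, whereas `np.add.at` accumulates every contribution, which is the guarantee that atomics or transactions provide in the parallel case.

```python
import numpy as np


def local_stiffness(h):
    """Element stiffness matrix of -u'' for a linear element of length h."""
    return np.array([[1.0, -1.0], [-1.0, 1.0]]) / h


def element_contributions(n_elements):
    """Global (row, col) indices and values of all element-matrix entries
    on a uniform mesh of [0, 1] with n_elements linear elements."""
    ke = local_stiffness(1.0 / n_elements)
    rows, cols, vals = [], [], []
    for e in range(n_elements):
        dofs = (e, e + 1)  # element e connects nodes e and e + 1
        for a in range(2):
            for b in range(2):
                rows.append(dofs[a])
                cols.append(dofs[b])
                vals.append(ke[a, b])
    return np.array(rows), np.array(cols), np.array(vals)


def assemble_lossy(n_elements):
    """Buffered fancy-index update: for repeated (row, col) pairs only one
    contribution is kept, mimicking an unsynchronized parallel update."""
    K = np.zeros((n_elements + 1, n_elements + 1))
    rows, cols, vals = element_contributions(n_elements)
    K[rows, cols] += vals
    return K


def assemble_accumulate(n_elements):
    """Unbuffered accumulation: every contribution to a shared entry is added,
    which is what atomics or transactions guarantee in the parallel case."""
    K = np.zeros((n_elements + 1, n_elements + 1))
    rows, cols, vals = element_contributions(n_elements)
    np.add.at(K, (rows, cols), vals)
    return K


if __name__ == "__main__":
    n = 4  # h = 0.25, so each element adds 4.0 to the diagonal entries of its nodes
    print(assemble_lossy(n)[1, 1])       # prints 4.0: one contribution to node 1 is lost
    print(assemble_accumulate(n)[1, 1])  # prints 8.0: both contributions are kept
```

In a threaded or GPU assembly the same lost update appears as a race between threads writing to a shared entry; besides transactions and atomic operations, mesh coloring avoids it by processing at the same time only elements that share no nodes.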

Secondly, we study a matrix-free approach to finite-element computation, which avoids assembling the matrix altogether. Besides removing the need to store the system matrix, matrix-free methods read far less data from memory per arithmetic operation and therefore better match modern processors, where memory bandwidth is scarce and compute power is abundant. Motivated by this, we consider matrix-free implementations of high-order finite-element methods for execution on graphics processors (GPUs), whose use for numerical computation has grown rapidly in recent years because of their energy-efficient architecture. The implementation uses sum factorization to evaluate matrix-vector products efficiently, mesh coloring and atomic updates to handle concurrent updates, and a geometric multigrid algorithm to precondition the iterative solvers.

Our performance studies show that on the GPU, the matrix-free approach is the method of choice for elements of order two and higher: it executes significantly faster and makes it possible to solve considerably larger problems. Compared with corresponding CPU implementations on comparable multicore processors, the GPU implementation is about twice as fast, which suggests that graphics processors are about twice as power efficient as multicore processors for computations of this kind.
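
Sum factorization, mentioned above for the matrix-vector products, exploits the tensor-product structure of quadrilateral and hexahedral elements. The sketch below is a generic 2D illustration under stated assumptions, not the thesis's GPU implementation. A hypothetical 1D matrix `S` of size q × (k+1) evaluates the k+1 one-dimensional Lagrange basis functions of degree k at q quadrature points; interpolating the (k+1)² nodal values of one element to its q² quadrature points is then the product (S ⊗ S)u. Applying `S` one dimension at a time costs q(k+1)² + q²(k+1) multiplications instead of q²(k+1)² for the full matrix. Equispaced nodes and Gauss-Legendre points are assumptions chosen for simplicity.

```python
import numpy as np


def lagrange_matrix(nodes, points):
    """S[i, j] = value of the j-th 1D Lagrange basis function on `nodes` at points[i]."""
    S = np.ones((len(points), len(nodes)))
    for j, xj in enumerate(nodes):
        for m, xm in enumerate(nodes):
            if m != j:
                S[:, j] *= (points - xm) / (xj - xm)
    return S


def interpolate_full(S, u):
    """Reference: apply the full Kronecker matrix (S kron S) to the flattened nodal values."""
    return np.kron(S, S) @ u.reshape(-1)


def interpolate_sum_factorized(S, u):
    """Sum factorization: one small matrix product per dimension.

    u has shape (k+1, k+1), indexed as u[node_x, node_y]; the result has
    shape (q, q), indexed as [point_x, point_y].
    """
    tmp = S @ u       # contract along x: shape (q, k+1)
    return tmp @ S.T  # contract along y: shape (q, q)


if __name__ == "__main__":
    k, q = 3, 4  # polynomial degree and number of 1D quadrature points
    nodes = np.linspace(-1.0, 1.0, k + 1)           # equispaced nodes (assumption)
    points, _ = np.polynomial.legendre.leggauss(q)  # Gauss-Legendre points
    S = lagrange_matrix(nodes, points)
    u = np.random.default_rng(0).standard_normal((k + 1, k + 1))
    full = interpolate_full(S, u)
    factored = interpolate_sum_factorized(S, u).reshape(-1)
    print(np.allclose(full, factored))  # prints True
```

The same pattern extends to 3D with one contraction per dimension, which is where the savings over the full element matrix become large for high polynomial degrees.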

The thesis is based on the following papers:

1. Using hardware transactional memory for high-performance computing
2. Matrix-free finite-element operator application on graphics processing units
3. Matrix-free finite-element computations on graphics processors with adaptively refined unstructured meshes
4. Multigrid for matrix-free finite element computations on graphics processors
