Multigrid and Gauss-Seidel smoothers revisited: Parallelization on chip multiprocessors
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (Uppsala Architecture Research Team)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Numerical Analysis. (Software Aspects of High-Performance Computing)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (Uppsala Architecture Research Team)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Numerical Analysis.
2006 (English). In: Proc. 20th ACM International Conference on Supercomputing, New York: ACM Press, 2006, 145-155 p. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2006. 145-155 p.
National Category
Computer Science Computational Mathematics
Identifiers
URN: urn:nbn:se:uu:diva-19810
DOI: 10.1145/1183401.1183423
ISBN: 1-59593-282-8 (print)
OAI: oai:DiVA.org:uu-19810
DiVA: diva2:47582
Available from: 2008-02-08. Created: 2008-02-08. Last updated: 2011-11-26. Bibliographically approved
In thesis
1. Methods for Creating and Exploiting Data Locality
2006 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The gap between processor speed and memory latency has led to the use of caches in the memory systems of modern computers. Programs must use the caches efficiently and exploit data locality for maximum performance. Multiprocessors, built from many processing units, are becoming commonplace not only in large servers but also in smaller systems such as personal computers. Multiprocessors require careful data locality optimizations since accesses from other processors can lead to invalidations and false sharing cache misses. This thesis explores hardware and software approaches for creating and exploiting temporal and spatial locality in multiprocessors.

We propose the capacity prefetching technique, which efficiently reduces the number of cache misses while avoiding false sharing by distinguishing, at run time, cache lines involved in communication from non-communicating cache lines. Prefetching techniques often lead to increased coherence and data traffic. The new bundling technique avoids one of these drawbacks by reducing the coherence traffic in multiprocessor prefetchers. This is especially important in snoop-based systems, where the coherence bandwidth is a scarce resource.

Most of the studies have been performed on advanced scientific algorithms. This thesis demonstrates that a cc-NUMA multiprocessor, with hardware data migration and replication optimizations, efficiently exploits the temporal locality in such codes. We further present a method of parallelizing a multigrid Gauss-Seidel partial differential equation solver, which creates temporal locality at the expense of increased communication. Our conclusion is that on modern chip multiprocessors, it is more important to optimize algorithms for data locality than to avoid communication, since communication can take place using a shared cache.
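
To make the kind of smoother discussed above concrete, the following is a minimal sketch of one plain lexicographic Gauss-Seidel sweep for a 2D Poisson problem on a uniform grid. It is not the parallelization method from the thesis or the paper. The function names, the 2D setting, the grid size, and the right-hand side are all assumptions chosen for illustration. Each update reads neighbour values that were already overwritten earlier in the same sweep, which is the data dependence any parallel or cache-blocked variant has to respect.

```python
def gauss_seidel_sweep(u, f, h):
    """One in-place lexicographic Gauss-Seidel sweep for -Laplace(u) = f.

    u and f are square lists of lists; boundary values of u are left untouched.
    Names and problem setup are illustrative assumptions, not from the source.
    """
    n = len(u)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # u[i-1][j] and u[i][j-1] were already updated in this sweep.
            u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                              + u[i][j - 1] + u[i][j + 1]
                              + h * h * f[i][j])


def residual_max_norm(u, f, h):
    """Max-norm of the residual f - A u for the 5-point Laplacian."""
    n = len(u)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            au = (4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                  - u[i][j - 1] - u[i][j + 1]) / (h * h)
            worst = max(worst, abs(f[i][j] - au))
    return worst


def make_problem(n):
    """Zero initial guess and constant right-hand side on an n x n grid."""
    h = 1.0 / (n - 1)
    u = [[0.0] * n for _ in range(n)]
    f = [[1.0] * n for _ in range(n)]
    return u, f, h


if __name__ == "__main__":
    u, f, h = make_problem(9)
    print("residual before:", residual_max_norm(u, f, h))
    for _ in range(100):
        gauss_seidel_sweep(u, f, h)
    print("residual after: ", residual_max_norm(u, f, h))
```

In a multigrid solver a sweep like this would be applied only a few times per grid level as a smoother, rather than iterated to convergence as in the usage lines above.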

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2006. 37 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 176
Keyword
data locality, temporal locality, spatial locality, prefetching, cache, cache behavior, cache coherence, snooping protocols, partial differential equation, shared-memory multiprocessor, chip multiprocessor, simulation
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:uu:diva-6837
ISBN: 91-554-6555-2
Public defence
2006-05-24, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 13:15 (English)
Available from: 2006-04-28. Created: 2006-04-28. Last updated: 2011-02-18. Bibliographically approved
2. Iterative and Adaptive PDE Solvers for Shared Memory Architectures
2006 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Iterativa och adaptiva PDE-lösare för parallelldatorer med gemensam minnesorganisation
Abstract [en]

Scientific computing is used frequently in an increasing number of disciplines to accelerate scientific discovery. Many such computing problems involve the numerical solution of partial differential equations (PDE). In this thesis we explore and develop methodology for high-performance implementations of PDE solvers for shared-memory multiprocessor architectures.

We consider three realistic PDE settings: solution of the Maxwell equations in 3D using an unstructured grid and the method of conjugate gradients, solution of the Poisson equation in 3D using a geometric multigrid method, and solution of an advection equation in 2D using structured adaptive mesh refinement. We apply software optimization techniques to increase both parallel efficiency and the degree of data locality.

In our evaluation we use several different shared-memory architectures ranging from symmetric multiprocessors and distributed shared-memory architectures to chip-multiprocessors. For distributed shared-memory systems we explore methods of data distribution to increase the amount of geographical locality. We evaluate automatic and transparent page migration based on runtime sampling, user-initiated page migration using a directive with an affinity-on-next-touch semantic, and algorithmic optimizations for page-placement policies.

Our results show that page migration increases the amount of geographical locality and that the parallel overhead related to page migration can be amortized over the iterations needed to reach convergence. This is especially true for the affinity-on-next-touch methodology whereby page migration can be initiated at an early stage in the algorithms.

We also develop and explore methodology for other forms of data locality and conclude that the effect on performance is significant and that this effect will increase for future shared-memory architectures. Our overall conclusion is that, if the involved locality issues are addressed, the shared-memory programming model provides an efficient and productive environment for solving many important PDE problems.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2006. 49 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 218
Keyword
partial differential equations, iterative methods, finite elements, conjugate gradients, adaptive mesh refinement, multigrid, cc-NUMA, distributed shared memory, OpenMP, page migration, TLB shoot-down, bandwidth minimization, reverse Cuthill-McKee, migrate-on-next-touch, affinity, temporal locality, chip multiprocessors, CMP
National Category
Software Engineering Computational Mathematics
Research subject
Scientific Computing
Identifiers
URN: urn:nbn:se:uu:diva-7136
ISBN: 91-554-6648-6
Public defence
2006-10-07, Auditorium Minus, Museum Gustavianum, Akademigatan 3, Uppsala, 13:15 (English)
Available from: 2006-09-15. Created: 2006-09-15. Last updated: 2011-10-26. Bibliographically approved

Authority records BETA

Wallin, Dan; Löf, Henrik; Hagersten, Erik; Holmgren, Sverker
