Performance modelling for parallel PDE solvers on NUMA-systems
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för teknisk databehandling. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Numerisk analys. (Software Aspects of High-Performance Computing)
2006 (English). Report (Other academic)
Abstract [en]

A detailed model of the memory performance of a PDE solver running on a NUMA system is set up. Due to the complexity of modern computers, such a detailed model is inevitably very complicated. Therefore, approximations are introduced that simplify the model and allow NUMA systems and PDE solvers to be described conveniently.

Using the simplified model, it is shown that PDE solvers using ordered local methods can be made very insensitive to high NUMA ratios, allowing them to scale well on virtually any NUMA system.

PDE solvers using unordered local methods, semiglobal methods or global methods are more sensitive to high NUMA-ratios and require special techniques in order to scale well beyond a single locality group.

Nevertheless, the potential performance gain of improving the data distribution on a NUMA-system can be considerable for all kinds of PDE solvers studied.

Place, publisher, year, edition, pages
2006.
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2006-041
Identifiers
URN: urn:nbn:se:uu:diva-81930
OAI: oai:DiVA.org:uu-81930
DiVA, id: diva2:109845
Available from: 2008-02-19. Created: 2008-02-19. Last updated: 2024-05-31. Bibliographically approved.
Part of thesis
1. Multithreaded PDE Solvers on Non-Uniform Memory Architectures
2006 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A trend in parallel computer architecture is that systems with a large shared memory are becoming more and more popular. A shared memory system can be either a uniform memory architecture (UMA) or a cache coherent non-uniform memory architecture (cc-NUMA).

In the present thesis, the performance of parallel PDE solvers on cc-NUMA computers is studied. In particular, we consider the shared namespace programming model, represented by OpenMP. Since the main memory is physically, or geographically, distributed over several multi-processor nodes, the latency for local memory accesses is smaller than for remote accesses. Therefore, the geographical locality of the data becomes important.

The focus of the present thesis is to study multithreaded PDE solvers on cc-NUMA systems, in particular their memory access pattern with respect to geographical locality. The questions posed are: (1) How large is the influence on performance of the non-uniformity of the memory system? (2) How should a program be written in order to reduce this influence? (3) Is it possible to introduce optimizations in the computer system for this purpose?

The main conclusion is that geographical locality is important for performance on cc-NUMA systems. This is shown experimentally for a broad range of PDE solvers as well as theoretically using a model involving characteristics of computer systems and applications.

Geographical locality can be achieved through migration directives that are inserted by the programmer or, possibly in the future, automatically by the compiler. On some systems, it can also be accomplished by means of transparent, hardware-initiated migration and replication. However, migration can be effective only if the memory access pattern is not "speckled", i.e. as few threads as possible access each memory page.

We also conclude that OpenMP is competitive with MPI on cc-NUMA systems if care is taken to get a favourable data distribution.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2006. p. 33
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 224
Keywords
PDE solver, high-performance, NUMA, UMA, OpenMP, MPI, data migration, data replication, thread scheduling, data affinity
Research subject
Scientific Computing
Identifiers
URN: urn:nbn:se:uu:diva-7149
ISBN: 91-554-6656-7
Public defence
2006-10-20, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 10:15 (English)
Available from: 2006-09-28. Created: 2006-09-28. Last updated: 2022-03-11. Bibliographically approved.

Open Access in DiVA

fulltext (325 kB), 3 downloads
File information
File name: FULLTEXT01.pdf. File size: 325 kB. Checksum: SHA-512
3197ca76f5f1bd399799b513061fe9ce39a9c53ad5457b5eac6af60d1c0bc0f7a03c695852c3b632f0f13281fe1faa1bac7598ce140e80be3175b84c27a57b44
Type: fulltext. Mimetype: application/pdf

Person

Nordén, Markus
