Logo: to the web site of Uppsala University

uu.sePublikasjoner fra Uppsala universitet
Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Performance of PDE solvers on a self-optimizing NUMA architecture
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för teknisk databehandling. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Numerisk analys.
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för teknisk databehandling. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Numerisk analys. (Software Aspects of High-Performance Computing)
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för teknisk databehandling. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Numerisk analys. (Software Aspects of High-Performance Computing)
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Datorteknik. (Uppsala Architecture Research Team)
2002 (engelsk)Inngår i: Parallel Algorithms and Applications, ISSN 1063-7192, E-ISSN 1029-032X, Vol. 17, s. 285-299Artikkel i tidsskrift (Fagfellevurdert) Published
sted, utgiver, år, opplag, sider
2002. Vol. 17, s. 285-299
HSV kategori
Identifikatorer
URN: urn:nbn:se:uu:diva-66909DOI: 10.1080/01495730208941445OAI: oai:DiVA.org:uu-66909DiVA, id: diva2:94820
Tilgjengelig fra: 2006-05-22 Laget: 2006-05-22 Sist oppdatert: 2018-01-10bibliografisk kontrollert
Inngår i avhandling
1. Methods for Creating and Exploiting Data Locality
Åpne denne publikasjonen i ny fane eller vindu >>Methods for Creating and Exploiting Data Locality
2006 (engelsk)Doktoravhandling, med artikler (Annet vitenskapelig)
Abstract [en]

The gap between processor speed and memory latency has led to the use of caches in the memory systems of modern computers. Programs must use the caches efficiently and exploit data locality for maximum performance. Multiprocessors, built from many processing units, are becoming commonplace not only in large servers but also in smaller systems such as personal computers. Multiprocessors require careful data locality optimizations since accesses from other processors can lead to invalidations and false sharing cache misses. This thesis explores hardware and software approaches for creating and exploiting temporal and spatial locality in multiprocessors.

We propose the capacity prefetching technique, which efficiently reduces the number of cache misses but avoids false sharing by distinguishing between cache lines involved in communication from non-communicating cache lines at run-time. Prefetching techniques often lead to increased coherence and data traffic. The new bundling technique avoids one of these drawbacks and reduces the coherence traffic in multiprocessor prefetchers. This is especially important in snoop-based systems where the coherence bandwidth is a scarce resource.

Most of the studies have been performed on advanced scientific algorithms. This thesis demonstrates that a cc-NUMA multiprocessor, with hardware data migration and replication optimizations, efficiently exploits the temporal locality in such codes. We further present a method of parallelizing a multigrid Gauss-Seidel partial differential equation solver, which creates temporal locality at the expense of increased communication. Our conclusion is that on modern chip multiprocessors, it is more important to optimize algorithms for data locality than to avoid communication, since communication can take place using a shared cache.

sted, utgiver, år, opplag, sider
Uppsala: Acta Universitatis Upsaliensis, 2006. s. 37
Serie
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 176
Emneord
data locality, temporal locality, spatial locality, prefetching, cache, cache behavior, cache coherence, snooping protocols, partial differential equation, shared-memory multiprocessor, chip multiprocessor, simulation
HSV kategori
Identifikatorer
urn:nbn:se:uu:diva-6837 (URN)91-554-6555-2 (ISBN)
Disputas
2006-05-24, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 13:15 (engelsk)
Opponent
Veileder
Tilgjengelig fra: 2006-04-28 Laget: 2006-04-28 Sist oppdatert: 2022-03-11bibliografisk kontrollert
2. Multithreaded PDE Solvers on Non-Uniform Memory Architectures
Åpne denne publikasjonen i ny fane eller vindu >>Multithreaded PDE Solvers on Non-Uniform Memory Architectures
2006 (engelsk)Doktoravhandling, med artikler (Annet vitenskapelig)
Abstract [en]

A trend in parallel computer architecture is that systems with a large shared memory are becoming more and more popular. A shared memory system can be either a uniform memory architecture (UMA) or a cache coherent non-uniform memory architecture (cc-NUMA).

In the present thesis, the performance of parallel PDE solvers on cc-NUMA computers is studied. In particular, we consider the shared namespace programming model, represented by OpenMP. Since the main memory is physically, or geographically distributed over several multi-processor nodes, the latency for local memory accesses is smaller than for remote accesses. Therefore, the geographical locality of the data becomes important.

The focus of the present thesis is to study multithreaded PDE solvers on cc-NUMA systems, in particular their memory access pattern with respect to geographical locality. The questions posed are: (1) How large is the influence on performance of the non-uniformity of the memory system? (2) How should a program be written in order to reduce this influence? (3) Is it possible to introduce optimizations in the computer system for this purpose?

The main conclusion is that geographical locality is important for performance on cc-NUMA systems. This is shown experimentally for a broad range of PDE solvers as well as theoretically using a model involving characteristics of computer systems and applications.

Geographical locality can be achieved through migration directives that are inserted by the programmer or — possibly in the future — automatically by the compiler. On some systems, it can also be accomplished by means of transparent, hardware initiated migration and replication. However, a necessary condition that must be fulfilled if migration is to be effective is that the memory access pattern must not be "speckled", i.e. as few threads as possible shall make accesses to each memory page.

We also conclude that OpenMP is competitive with MPI on cc-NUMA systems if care is taken to get a favourable data distribution.

sted, utgiver, år, opplag, sider
Uppsala: Acta Universitatis Upsaliensis, 2006. s. 33
Serie
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 224
Emneord
PDE solver, high-performance, NUMA, UMA, OpenMP, MPI, data migration, data replication, thread scheduling, data affinity
HSV kategori
Forskningsprogram
Beräkningsvetenskap
Identifikatorer
urn:nbn:se:uu:diva-7149 (URN)91-554-6656-7 (ISBN)
Disputas
2006-10-20, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 10:15 (engelsk)
Opponent
Veileder
Tilgjengelig fra: 2006-09-28 Laget: 2006-09-28 Sist oppdatert: 2022-03-11bibliografisk kontrollert

Open Access i DiVA

Fulltekst mangler i DiVA

Andre lenker

Forlagets fulltekst

Person

Holmgren, SverkerNordén, MarkusRantakokko, JarmoWallin, Dan

Søk i DiVA

Av forfatter/redaktør
Holmgren, SverkerNordén, MarkusRantakokko, JarmoWallin, Dan
Av organisasjonen
I samme tidsskrift
Parallel Algorithms and Applications

Søk utenfor DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric

doi
urn-nbn
Totalt: 519 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf