Methods for Creating and Exploiting Data Locality
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems (Uppsala Architecture Research Team)
2006 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The gap between processor speed and memory latency has led to the use of caches in the memory systems of modern computers. Programs must use the caches efficiently and exploit data locality for maximum performance. Multiprocessors, built from many processing units, are becoming commonplace not only in large servers but also in smaller systems such as personal computers. Multiprocessors require careful data locality optimizations since accesses from other processors can lead to invalidations and false sharing cache misses. This thesis explores hardware and software approaches for creating and exploiting temporal and spatial locality in multiprocessors.
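
To make the false-sharing problem concrete, the following small C/pthreads example (an illustration added here, not taken from the thesis) lets two threads increment logically independent counters. Without padding the counters share one cache line, so every write by one thread invalidates the line in the other processor's cache; compiling with -DPAD gives each counter its own line. The 64-byte line size is an assumption.

    /* False-sharing illustration (added example, not from the thesis).
     * Build without padding:  cc -O2 -pthread false_sharing.c
     * Build with padding:     cc -O2 -pthread -DPAD false_sharing.c
     */
    #include <pthread.h>
    #include <stdio.h>

    #define CACHE_LINE 64                 /* assumed cache-line size in bytes */
    #define ITERS      100000000UL

    struct counter {
        volatile unsigned long value;
    #ifdef PAD
        char pad[CACHE_LINE - sizeof(unsigned long)];  /* one counter per line */
    #endif
    };

    static struct counter counters[2];

    static void *worker(void *arg)
    {
        struct counter *c = arg;
        for (unsigned long i = 0; i < ITERS; i++)
            c->value++;       /* every store needs the line in exclusive state */
        return NULL;
    }

    int main(void)
    {
        pthread_t t0, t1;
        pthread_create(&t0, NULL, worker, &counters[0]);
        pthread_create(&t1, NULL, worker, &counters[1]);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        printf("%lu %lu\n", counters[0].value, counters[1].value);
        return 0;
    }

On a typical multicore machine the padded build runs noticeably faster, even though both versions execute the same number of instructions.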

We propose the capacity prefetching technique, which efficiently reduces the number of cache misses while avoiding false sharing by distinguishing, at run-time, cache lines involved in communication from non-communicating cache lines. Prefetching techniques often lead to increased coherence and data traffic. The new bundling technique avoids one of these drawbacks and reduces the coherence traffic caused by multiprocessor prefetchers. This is especially important in snoop-based systems, where coherence bandwidth is a scarce resource.
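
As a rough illustration of the run-time distinction described above, the sketch below tags each cache line with the reason it last left the cache and prefetches aggressively only for lines that were evicted by ordinary replacement, not by an invalidation from another processor. The types, names, and prefetch degrees are invented for this note and are not the controller logic of the thesis; bundling of the resulting requests is not modelled here.

    /* Hypothetical sketch of a capacity-prefetching decision: be aggressive for
     * "non-communicating" lines (last evicted by replacement) and conservative
     * for lines previously invalidated by another processor, which indicate
     * communication and a risk of false sharing. Illustrative only.
     */
    #include <stdio.h>

    enum evict_reason { NEVER_SEEN, REPLACED, INVALIDATED };

    /* Per-line state a (simulated) cache controller might keep. */
    struct line_history {
        enum evict_reason last_eviction;
    };

    /* On a miss, decide how many additional sequential lines to fetch. */
    static unsigned prefetch_degree(const struct line_history *h)
    {
        switch (h->last_eviction) {
        case REPLACED:    return 4;  /* capacity-type miss: likely private data */
        case INVALIDATED: return 0;  /* communication: prefetching could cause
                                        false sharing and extra coherence traffic */
        case NEVER_SEEN:
        default:          return 1;  /* cold miss: modest prefetch */
        }
    }

    int main(void)
    {
        struct line_history replaced    = { REPLACED };
        struct line_history invalidated = { INVALIDATED };
        printf("replaced: %u, invalidated: %u\n",
               prefetch_degree(&replaced), prefetch_degree(&invalidated));
        return 0;
    }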

Most of the studies have been performed on advanced scientific algorithms. This thesis demonstrates that a cc-NUMA multiprocessor, with hardware data migration and replication optimizations, efficiently exploits the temporal locality in such codes. We further present a method of parallelizing a multigrid Gauss-Seidel partial differential equation solver, which creates temporal locality at the expense of increased communication. Our conclusion is that on modern chip multiprocessors, it is more important to optimize algorithms for data locality than to avoid communication, since communication can take place using a shared cache.
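
For a concrete picture of the kernel in question, the following red-black Gauss-Seidel smoother for the 2-D Poisson problem -Laplace(u) = f is a generic textbook version, not the parallelization scheme developed in the thesis. With a static OpenMP schedule each thread keeps revisiting the same block of rows across smoothing sweeps, which is the kind of temporal locality referred to above, while data is exchanged only at block boundaries.

    /* Generic red-black Gauss-Seidel smoother for -Laplace(u) = f on an n x n
     * grid with spacing h (boundary rows and columns stay fixed). Added here as
     * an illustration of the kernel class the thesis studies; it is NOT the
     * parallelization scheme proposed in the thesis.
     */
    #include <stddef.h>

    void gauss_seidel_rb(double *u, const double *f, size_t n, double h, int sweeps)
    {
        for (int s = 0; s < sweeps; s++) {
            for (int colour = 0; colour < 2; colour++) {
                /* Points of one colour have no dependences on each other, so the
                 * row loop is parallel; a static schedule lets each thread reuse
                 * the same rows in every sweep.                                 */
                #pragma omp parallel for schedule(static)
                for (size_t i = 1; i + 1 < n; i++) {
                    for (size_t j = 1 + (i + colour) % 2; j + 1 < n; j += 2) {
                        u[i * n + j] = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j]
                                             + u[i * n + j - 1] + u[i * n + j + 1]
                                             + h * h * f[i * n + j]);
                    }
                }
            }
        }
    }

Building with -fopenmp enables the parallel sweeps; without it the code runs serially but is otherwise unchanged.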

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2006, p. 37
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 176
Keywords [en]
data locality, temporal locality, spatial locality, prefetching, cache, cache behavior, cache coherence, snooping protocols, partial differential equation, shared-memory multiprocessor, chip multiprocessor, simulation
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:uu:diva-6837
ISBN: 91-554-6555-2 (print)
OAI: oai:DiVA.org:uu-6837
DiVA id: diva2:168291
Public defence
2006-05-24, Room 2446, Polacksbacken, Lägerhyddsvägen 2D, Uppsala, 13:15 (English)
Available from: 2006-04-28. Created: 2006-04-28. Last updated: 2022-03-11. Bibliographically approved.
List of papers
1. Miss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors
2003 In: Proceedings of the International Parallel and Distributed Processing Symposium. Article in journal (Refereed) Published
Identifiers
urn:nbn:se:uu:diva-94442 (URN)
Available from: 2006-04-28. Created: 2006-04-28. Bibliographically approved.
2. Bundling: Reducing the Overhead of Multiprocessor Prefetchers
2004 In: Proceedings of the International Parallel and Distributed Processing Symposium. Article in journal (Refereed) Published
Identifiers
urn:nbn:se:uu:diva-94443 (URN)
Available from: 2006-04-28. Created: 2006-04-28. Bibliographically approved.
3. Cache memory behavior of advanced PDE solvers
2004 (English) In: Parallel Computing: Software Technology, Algorithms, Architectures and Applications, Amsterdam, The Netherlands: Elsevier, 2004, p. 475-482. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Amsterdam, The Netherlands: Elsevier, 2004
Series
Advances in Parallel Computing ; 13
National Category
Computer Sciences Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-67857 (URN), 0-444-51689-1 (ISBN)
Available from: 2006-05-17. Created: 2006-05-17. Last updated: 2018-01-10. Bibliographically approved.
4. Performance of PDE solvers on a self-optimizing NUMA architecture
2002 (English) In: Parallel Algorithms and Applications, ISSN 1063-7192, E-ISSN 1029-032X, Vol. 17, p. 285-299. Article in journal (Refereed) Published
National Category
Computer Sciences Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-66909 (URN), 10.1080/01495730208941445 (DOI)
Available from: 2006-05-22. Created: 2006-05-22. Last updated: 2018-01-10. Bibliographically approved.
5. Multigrid and Gauss-Seidel smoothers revisited: Parallelization on chip multiprocessors
2006 (English) In: Proc. 20th ACM International Conference on Supercomputing, New York: ACM Press, 2006, p. 145-155. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2006
National Category
Computer Sciences Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-19810 (URN), 10.1145/1183401.1183423 (DOI), 1-59593-282-8 (ISBN)
Available from: 2008-02-08. Created: 2008-02-08. Last updated: 2018-01-12. Bibliographically approved.
6. Vasa: A Simulator Infrastructure with Adjustable Fidelity
2005 In: Proceedings of the International Conference on Parallel and Distributed Computing and Systems. Article in journal (Refereed) Published
Identifiers
urn:nbn:se:uu:diva-94447 (URN)
Available from: 2006-04-28. Created: 2006-04-28. Bibliographically approved.

Open Access in DiVA

File information
File name: FULLTEXT01.pdf. File size: 256 kB. Type: fulltext. Mimetype: application/pdf.
