uu.se: Publications from Uppsala University
Wallin, Dan
Publications (7 of 7)
Johansson, H., Wallin, D. & Holmgren, S. (2006). Analyzing advanced PDE solvers through simulation. In: Applied Parallel Computing: State of the Art in Scientific Computing (pp. 893-900). Berlin: Springer-Verlag
Analyzing advanced PDE solvers through simulation
2006 (English). In: Applied Parallel Computing: State of the Art in Scientific Computing, Berlin: Springer-Verlag, 2006, pp. 893-900. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Berlin: Springer-Verlag, 2006
Series
Lecture Notes in Computer Science; 3732
National subject category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-80673 (URN), 10.1007/11558958_108 (DOI), 000237003200108
Available from: 2008-03-07. Created: 2008-03-07. Last updated: 2018-01-13. Bibliographically approved
Wallin, D., Löf, H., Hagersten, E. & Holmgren, S. (2006). Multigrid and Gauss-Seidel smoothers revisited: Parallelization on chip multiprocessors. In: Proc. 20th ACM International Conference on Supercomputing (pp. 145-155). New York: ACM Press
Multigrid and Gauss-Seidel smoothers revisited: Parallelization on chip multiprocessors
2006 (English). In: Proc. 20th ACM International Conference on Supercomputing, New York: ACM Press, 2006, pp. 145-155. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2006
National subject category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-19810 (URN), 10.1145/1183401.1183423 (DOI), 1-59593-282-8 (ISBN)
Available from: 2008-02-08. Created: 2008-02-08. Last updated: 2018-01-12. Bibliographically approved
Wallin, D. & Hagersten, E. (2004). Bundling: Reducing the Overhead of Multiprocessor Prefetchers. In: 18th International Parallel and Distributed Processing Symposium (IPDPS 2004).
Bundling: Reducing the Overhead of Multiprocessor Prefetchers
2004 (English). In: 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004. Conference paper, Published paper (Refereed)
Abstract

Prefetching has proven to be a useful technique for reducing cache misses in multiprocessors, at the cost of increased coherence traffic. This is especially troublesome for snoop-based systems, where the available coherence bandwidth is often the scalability bottleneck. The bundling technique presented in this paper reduces the overhead caused by prefetching in two ways: by piggybacking prefetches on normal requests, and by requiring only one device to perform the snoop lookup for each prefetch transaction. This can reduce both the address bandwidth and the number of snoop lookups compared with a non-prefetching system. We describe bundling implementations for two important transaction types: reads and upgrades. While bundling could reduce the overhead of most existing prefetch schemes, the evaluation in this paper is limited to two of them: sequential prefetching and Dahlgren's adaptive sequential prefetching. Both schemes have their snoop bandwidth halved for all commercial and scientific benchmarks in the study. Combining bundling with these prefetch schemes lowers the cache miss rate, the address bandwidth and the snoop bandwidth, compared with a system with no prefetching, for all applications. Bundling does not reduce the data bandwidth introduced by a prefetch scheme. However, we argue that data bandwidth is more easily scaled than snoop bandwidth in snoop-based coherence systems.
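The two savings described in the abstract can be illustrated with a simplified traffic-counting model. This is a sketch under stated assumptions, not the simulator or protocol evaluated in the paper: a single cache with unlimited capacity issues read misses on a bus shared with num_nodes - 1 other caches, a sequential prefetcher fetches the next `degree` lines after each demand miss, and in bundled mode the prefetches travel in the demand request while one device performs the lookup for each prefetched line. The names `simulate` and `Traffic` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Traffic:
    misses: int = 0
    address_transactions: int = 0
    snoop_lookups: int = 0
    lines_transferred: int = 0


def simulate(accesses, num_nodes, degree, bundled):
    """Count bus traffic for one cache with unlimited capacity (toy model).

    accesses: sequence of cache-line numbers read by this cache.
    degree:   how many following lines the sequential prefetcher fetches.
    bundled:  if True (assumed model), prefetches ride on the demand request
              and each prefetched line is looked up by a single device only.
    """
    cached = set()
    t = Traffic()
    snoopers = num_nodes - 1  # other caches that snoop a broadcast request
    for line in accesses:
        if line in cached:
            continue  # hit: no bus traffic
        t.misses += 1
        cached.add(line)
        # Sequential prefetching: the next `degree` lines not already cached.
        prefetches = [line + i for i in range(1, degree + 1)
                      if line + i not in cached]
        cached.update(prefetches)
        # Data traffic is the same with or without bundling.
        t.lines_transferred += 1 + len(prefetches)
        if bundled:
            # One transaction; only the demand line is snooped by everyone.
            t.address_transactions += 1
            t.snoop_lookups += snoopers + len(prefetches)
        else:
            # Every prefetch is its own broadcast transaction.
            t.address_transactions += 1 + len(prefetches)
            t.snoop_lookups += snoopers * (1 + len(prefetches))
    return t


if __name__ == "__main__":
    scan = range(16)  # purely sequential reads: a best case for prefetching
    print(simulate(scan, num_nodes=8, degree=0, bundled=False))
    print(simulate(scan, num_nodes=8, degree=3, bundled=False))
    print(simulate(scan, num_nodes=8, degree=3, bundled=True))
```

For this favorable sequential scan of 16 lines on 8 nodes, prefetching three lines ahead cuts the misses from 16 to 4, but unbundled it still issues 16 address transactions and 112 snoop lookups. In bundled mode the same prefetcher needs 4 transactions and 40 lookups, while 16 lines are transferred in every configuration, which mirrors the abstract's point that bundling leaves data bandwidth unchanged.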

Identifiers
urn:nbn:se:uu:diva-72530 (URN)
Available from: 2005-05-25. Created: 2005-05-25
Wallin, D., Johansson, H. & Holmgren, S. (2004). Cache memory behavior of advanced PDE solvers. In: Parallel Computing: Software Technology, Algorithms, Architectures and Applications (pp. 475-482). Amsterdam, The Netherlands: Elsevier
Cache memory behavior of advanced PDE solvers
2004 (English). In: Parallel Computing: Software Technology, Algorithms, Architectures and Applications, Amsterdam, The Netherlands: Elsevier, 2004, pp. 475-482. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Amsterdam, The Netherlands: Elsevier, 2004
Series
Advances in Parallel Computing; 13
National subject category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-67857 (URN), 0-444-51689-1 (ISBN)
Available from: 2006-05-17. Created: 2006-05-17. Last updated: 2018-01-10. Bibliographically approved
Wallin, D. (2003). Exploiting data locality in adaptive architectures. (Licentiate dissertation). Uppsala University
Exploiting data locality in adaptive architectures
2003 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract

Processor speed improves much faster than memory access time, which makes memory accesses expensive. To address this problem, cache hierarchies are introduced to serve the processor with data. However, the effectiveness of caches depends on the amount of locality in the application's memory access pattern. Programs differ greatly in their cache miss characteristics, access patterns and communication intensity. A computer built for many different computational tasks can therefore benefit from dynamically adapting to the varying needs of its applications.

This thesis shows that a cc-NUMA multiprocessor with data migration and replication optimizations efficiently exploits the temporal locality of algorithms. The performance of the self-optimizing system is similar to that of a system with perfect initial thread and data placement.

Data locality optimizations do not come for free. Coherence protocols with large cache lines exploit spatial locality but increase false sharing misses for many applications. Prefetching techniques that reduce cache misses often increase address and data traffic. Several techniques introduced in this thesis efficiently avoid these drawbacks. The bundling technique reduces the coherence traffic of multiprocessor prefetchers. This is especially important in snoop-based systems, where coherence bandwidth is a scarce resource. Bundled prefetchers reduce both the cache miss rate and the coherence traffic compared with non-prefetching protocols. Averaged over all examined applications, the most efficient bundled prefetching protocol studied lowers cache misses by 27 percent and address snoops by 24 percent relative to a non-prefetching protocol. Another proposed technique, capacity prefetching, avoids false sharing misses by distinguishing, at run time, cache lines involved in communication from non-communicating cache lines.
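The run-time distinction behind capacity prefetching can be made concrete with a small sketch. The mechanism here is an assumption for illustration, not a description of the protocol in the thesis: the cache keeps the tags of invalidated lines, treats a miss on such a tag as a communication (coherence) miss and fetches only that line, and treats any other miss as non-communicating and also fetches the following lines. This gives the effect of a large cache line only where no communication has been observed. The class `CapacityPrefetchCache` and its methods are hypothetical.

```python
VALID = "V"
INVALID = "I"


class CapacityPrefetchCache:
    """Toy cache with unlimited capacity that keeps tags of invalidated lines.

    Hypothetical model: a miss on an invalid tag is classified as a
    coherence (communication) miss; any other miss fetches neighbours too.
    """

    def __init__(self, degree):
        self.degree = degree  # extra sequential lines fetched on a non-coherence miss
        self.state = {}       # line number -> VALID or INVALID
        self.lines_fetched = 0

    def invalidate(self, line):
        # Another processor wrote the line: keep the tag but mark it invalid.
        if line in self.state:
            self.state[line] = INVALID

    def access(self, line):
        status = self.state.get(line)
        if status == VALID:
            return "hit"
        if status == INVALID:
            # Tag present but invalid: the line takes part in communication,
            # so fetch only this line to avoid false sharing on its neighbours.
            self.state[line] = VALID
            self.lines_fetched += 1
            return "coherence miss"
        # No tag: treat as a non-communicating miss and fetch neighbours too.
        for neighbour in range(line, line + self.degree + 1):
            if self.state.get(neighbour) != VALID:
                self.state[neighbour] = VALID
                self.lines_fetched += 1
        return "non-coherence miss"


if __name__ == "__main__":
    cache = CapacityPrefetchCache(degree=3)
    print(cache.access(0))   # fetches lines 0-3
    print(cache.access(2))
    cache.invalidate(2)      # line 2 is written by another processor
    print(cache.access(2))   # refetches line 2 alone
    print(cache.lines_fetched)
```

Running the example prints "non-coherence miss", "hit", "coherence miss" and 5: the first miss fetches four lines, and the later miss on the invalidated line fetches just one.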

Place, publisher, year, edition, pages
Uppsala University, 2003
Series
IT licentiate theses / Uppsala University, Department of Information Technology, ISSN 1404-5117; 2003-010
National subject category
Computer Engineering
Research subject
Computer Engineering
Identifiers
urn:nbn:se:uu:diva-86160 (URN)
Available from: 2003-09-26. Created: 2006-12-27. Last updated: 2018-01-13. Bibliographically approved
Holmgren, S., Nordén, M., Rantakokko, J. & Wallin, D. (2002). Performance of PDE solvers on a self-optimizing NUMA architecture. Parallel Algorithms and Applications, 17, 285-299
Performance of PDE solvers on a self-optimizing NUMA architecture
2002 (English). In: Parallel Algorithms and Applications, ISSN 1063-7192, E-ISSN 1029-032X, Vol. 17, pp. 285-299. Article in journal (Refereed), Published
National subject category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-66909 (URN), 10.1080/01495730208941445 (DOI)
Available from: 2006-05-22. Created: 2006-05-22. Last updated: 2018-01-10. Bibliographically approved
Holmgren, S. & Wallin, D. (2001). Performance of high-accuracy PDE solvers on a self-optimizing NUMA architecture. In: Euro-Par 2001: Parallel Processing (pp. 602-610). Berlin: Springer-Verlag
Performance of high-accuracy PDE solvers on a self-optimizing NUMA architecture
2001 (English). In: Euro-Par 2001: Parallel Processing, Berlin: Springer-Verlag, 2001, pp. 602-610. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Berlin: Springer-Verlag, 2001
Series
Lecture Notes in Computer Science; 2150
National subject category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:uu:diva-40590 (URN), 10.1007/3-540-44681-8_86 (DOI)
Available from: 2008-03-13. Created: 2008-03-13. Last updated: 2018-01-11. Bibliographically approved