Publications from Uppsala University (uu.se)
Wallin, Dan
Publications (7 of 7)
Johansson, H., Wallin, D. & Holmgren, S. (2006). Analyzing advanced PDE solvers through simulation. In: Applied Parallel Computing: State of the Art in Scientific Computing (pp. 893-900). Berlin: Springer-Verlag
2006 (English). In: Applied Parallel Computing: State of the Art in Scientific Computing, Berlin: Springer-Verlag, 2006, pp. 893-900. Conference paper, published paper (peer-reviewed)
Place, publisher, year, edition, pages
Berlin: Springer-Verlag, 2006
Series
Lecture Notes in Computer Science ; 3732
Identifiers
urn:nbn:se:uu:diva-80673 (URN), 10.1007/11558958_108 (DOI), 000237003200108 ()
Available from: 2008-03-07. Created: 2008-03-07. Last updated: 2018-01-13. Bibliographically approved.
Wallin, D., Löf, H., Hagersten, E. & Holmgren, S. (2006). Multigrid and Gauss-Seidel smoothers revisited: Parallelization on chip multiprocessors. In: Proc. 20th ACM International Conference on Supercomputing (pp. 145-155). New York: ACM Press
2006 (English). In: Proc. 20th ACM International Conference on Supercomputing, New York: ACM Press, 2006, pp. 145-155. Conference paper, published paper (peer-reviewed)
Place, publisher, year, edition, pages
New York: ACM Press, 2006
Identifiers
urn:nbn:se:uu:diva-19810 (URN), 10.1145/1183401.1183423 (DOI), 1-59593-282-8 (ISBN)
Available from: 2008-02-08. Created: 2008-02-08. Last updated: 2018-01-12. Bibliographically approved.
Wallin, D. & Hagersten, E. (2004). Bundling: Reducing the Overhead of Multiprocessor Prefetchers. In: 18th International Parallel and Distributed Processing Symposium: (IPDPS 2004).
2004 (English). In: 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004. Conference paper, published paper (peer-reviewed)
Abstract [en]

Prefetching has proven to be a useful technique for reducing cache misses in multiprocessors, at the cost of increased coherence traffic. This is especially troublesome for snoop-based systems, where the available coherence bandwidth is often the scalability bottleneck. The bundling technique presented in this paper reduces the overhead caused by prefetching in two ways: piggybacking prefetches on normal requests, and requiring only one device to perform the snoop lookup for each prefetch transaction. This can reduce both the address bandwidth and the number of snoop lookups compared with a non-prefetching system. We describe bundling implementations for two important transaction types: reads and upgrades. While bundling could reduce the overhead of most existing prefetch schemes, the evaluation of bundling performed in this paper has been limited to two of them: sequential prefetching and Dahlgren's adaptive sequential prefetching. Both schemes have their snoop bandwidth halved for all commercial and scientific benchmarks in the study. The combined effect of bundling applied to these prefetch schemes lowers the cache miss rate, the address bandwidth and the snoop bandwidth, compared with a system with no prefetching, for all applications. Bundling will not reduce the data bandwidth introduced by a prefetch scheme. However, we argue that the data bandwidth is more easily scaled than the snoop bandwidth for snoop-based coherence systems.
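
The two savings named in the abstract can be illustrated with a toy counting model. The sketch below is not the paper's protocol or its simulator: the function name `simulate`, the unbounded cache, the fixed prefetch degree, and the four-device default are all assumptions made for illustration. It assumes that an unbundled prefetch costs its own address transaction snooped by every other device, while a bundled prefetch rides on the demand request and is looked up by a single device.

```python
def simulate(accesses, degree, bundled, devices=4):
    """Toy model (hypothetical): returns (misses, address_transactions, snoop_lookups).

    accesses: iterable of cache-line numbers touched by one processor.
    degree:   lines fetched ahead on each miss (0 disables prefetching).
    bundled:  if True, prefetches piggyback on the demand request.
    """
    cache = set()              # unbounded cache, a simplifying assumption
    misses = transactions = snoops = 0
    others = devices - 1       # devices that snoop a normal transaction
    for line in accesses:
        if line in cache:
            continue
        misses += 1
        cache.add(line)
        transactions += 1      # the demand request itself
        snoops += others
        # Sequential prefetching: fetch the next `degree` lines not yet cached.
        prefetches = [line + i for i in range(1, degree + 1)
                      if line + i not in cache]
        cache.update(prefetches)
        if bundled:
            # No extra address transaction; one device looks up each prefetch.
            snoops += len(prefetches)
        else:
            # Each prefetch is a separate transaction snooped by all others.
            transactions += len(prefetches)
            snoops += len(prefetches) * others
    return misses, transactions, snoops


if __name__ == "__main__":
    stream = range(8)  # a purely sequential access stream
    print("no prefetch:", simulate(stream, 0, False))
    print("unbundled:  ", simulate(stream, 1, False))
    print("bundled:    ", simulate(stream, 1, True))
```

On this idealized sequential stream, unbundled prefetching halves the misses but leaves the transaction and snoop counts unchanged, whereas the bundled variant lowers all three counts relative to no prefetching. Real workloads and the actual protocol details would give different numbers.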

Available as PDF (693 kB)

Identifiers
urn:nbn:se:uu:diva-72530 (URN)
Available from: 2005-05-25. Created: 2005-05-25.
Wallin, D., Johansson, H. & Holmgren, S. (2004). Cache memory behavior of advanced PDE solvers. In: Parallel Computing: Software Technology, Algorithms, Architectures and Applications (pp. 475-482). Amsterdam, The Netherlands: Elsevier
2004 (English). In: Parallel Computing: Software Technology, Algorithms, Architectures and Applications, Amsterdam, The Netherlands: Elsevier, 2004, pp. 475-482. Conference paper, published paper (peer-reviewed)
Place, publisher, year, edition, pages
Amsterdam, The Netherlands: Elsevier, 2004
Series
Advances in Parallel Computing ; 13
Identifiers
urn:nbn:se:uu:diva-67857 (URN), 0-444-51689-1 (ISBN)
Available from: 2006-05-17. Created: 2006-05-17. Last updated: 2018-01-10. Bibliographically approved.
Wallin, D. (2003). Exploiting data locality in adaptive architectures. (Licentiate dissertation). Uppsala University
2003 (English). Licentiate thesis, with articles (other academic)
Abstract [en]

Processor speed increases much faster than memory access time improves, which makes memory accesses expensive. To address this problem, cache hierarchies are introduced to serve the processor with data. However, the effectiveness of caches depends on the amount of locality in the application's memory access pattern. The behavior of various programs differs greatly in terms of cache miss characteristics, access patterns and communication intensity. Therefore, a computer built for many different computational tasks potentially benefits from dynamically adapting to the varying needs of the applications.

This thesis shows that a cc-NUMA multiprocessor with data migration and replication optimizations efficiently exploits the temporal locality of algorithms. The performance of the self-optimizing system is similar to a system with a perfect initial thread and data placement.

Data locality optimizations do not come for free. Large cache line coherence protocols improve spatial locality but yield increases in false sharing misses for many applications. Prefetching techniques that reduce the cache misses often lead to increased address and data traffic. Several techniques introduced in this thesis efficiently avoid these drawbacks. The bundling technique reduces the coherence traffic in multiprocessor prefetchers. This is especially important in snoop-based systems where the coherence bandwidth is a scarce resource. Bundled prefetchers manage to reduce both the cache miss rate and the coherence traffic compared with non-prefetching protocols. The most efficient bundled prefetching protocol studied lowers the cache misses by 27 percent and the address snoops by 24 percent relative to a non-prefetching protocol, on average over all examined applications. Another proposed technique, capacity prefetching, avoids false sharing misses by distinguishing cache lines involved in communication from non-communicating cache lines at run-time.

Place, publisher, year, edition, pages
Uppsala University, 2003
Series
IT licentiate theses / Uppsala University, Department of Information Technology, ISSN 1404-5117 ; 2003-010
Research program
Computer Engineering (Datorteknik)
Identifiers
urn:nbn:se:uu:diva-86160 (URN)
Available from: 2003-09-26. Created: 2006-12-27. Last updated: 2018-01-13. Bibliographically approved.
Holmgren, S., Nordén, M., Rantakokko, J. & Wallin, D. (2002). Performance of PDE solvers on a self-optimizing NUMA architecture. Parallel Algorithms and Applications, 17, 285-299
2002 (English). In: Parallel Algorithms and Applications, ISSN 1063-7192, E-ISSN 1029-032X, Vol. 17, pp. 285-299. Journal article (peer-reviewed). Published
Identifiers
urn:nbn:se:uu:diva-66909 (URN), 10.1080/01495730208941445 (DOI)
Available from: 2006-05-22. Created: 2006-05-22. Last updated: 2018-01-10. Bibliographically approved.
Holmgren, S. & Wallin, D. (2001). Performance of high-accuracy PDE solvers on a self-optimizing NUMA architecture. In: Euro-Par 2001: Parallel Processing (pp. 602-610). Berlin: Springer-Verlag
2001 (English). In: Euro-Par 2001: Parallel Processing, Berlin: Springer-Verlag, 2001, pp. 602-610. Conference paper, published paper (peer-reviewed)
Place, publisher, year, edition, pages
Berlin: Springer-Verlag, 2001
Series
Lecture Notes in Computer Science ; 2150
Identifiers
urn:nbn:se:uu:diva-40590 (URN), 10.1007/3-540-44681-8_86 (DOI)
Available from: 2008-03-13. Created: 2008-03-13. Last updated: 2018-01-11. Bibliographically approved.