uu.se Uppsala University Publications
Hagersten, Erik
Publications (10 of 136)
Nikoleris, N., Hagersten, E. & Carlson, T. E. (2018). Delorean: Virtualized Directed Profiling for Cache Modeling in Sampled Simulation.
2018 (English) Report (Other academic)
Abstract [en]

Current practice for accurate and efficient simulation (e.g., SMARTS and Simpoint) makes use of sampling to significantly reduce the time needed to evaluate new research ideas. By evaluating a small but representative portion of the original application, sampling can allow for both fast and accurate performance analysis. However, as cache sizes of modern architectures grow, simulation time is dominated by warming microarchitectural state and not by detailed simulation, reducing overall simulation efficiency. While checkpoints can significantly reduce cache warming, improving efficiency, they limit the flexibility of the system under evaluation, requiring new checkpoints for software updates (such as changes to the compiler and compiler flags) and many types of hardware modifications. An ideal solution would allow for accurate cache modeling for each simulation run without the need to generate rigid checkpointing data a priori.

Enabling this new direction for fast and flexible simulation requires a combination of (1) a methodology that allows for hardware and software flexibility and (2) the ability to quickly and accurately model arbitrarily-sized caches. Current approaches that rely on checkpointing or statistical cache modeling require rigid, up-front state to be collected which needs to be amortized over a large number of simulation runs. These earlier methodologies are insufficient for our goals for improved flexibility. In contrast, our proposed methodology, Delorean, outlines a unique solution to this problem. The Delorean simulation methodology enables both flexibility and accuracy by quickly generating a targeted cache model for the next detailed region on the fly without the need for up-front simulation or modeling. More specifically, we propose a new, more accurate statistical cache modeling method that takes advantage of hardware virtualization to precisely determine the memory regions accessed and to minimize the time needed for data collection while maintaining accuracy.

Delorean uses a multi-pass approach to understand the memory regions accessed by the next, upcoming detailed region. Our methodology collects the entire set of key memory accesses and, through fast virtualization techniques, progressively scans larger, earlier regions to learn more about these key accesses in an efficient way. Using these techniques, we demonstrate that Delorean allows for the fast evaluation of systems and their software through the generation of accurate cache models on the fly. Delorean outperforms previous proposals by an order of magnitude, with a simulation speed of 150 MIPS and a similar average CPI error (below 4%).
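
The multi-pass idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes an in-memory list of byte addresses stands in for the executed memory accesses, a 64-byte cache line, and a scan window that doubles on each pass. The names `key_lines`, `scan_back`, `LINE_BYTES` and `first_window` are hypothetical. The sketch collects the cache lines touched in the upcoming detailed region, then walks backwards through progressively larger earlier windows until it has found the most recent earlier access to each of those lines.

```python
from typing import Dict, List, Optional, Set

LINE_BYTES = 64  # assumed cache-line size, not taken from the report


def key_lines(trace: List[int], start: int, end: int) -> Set[int]:
    """Cache lines touched inside the detailed region trace[start:end]."""
    return {addr // LINE_BYTES for addr in trace[start:end]}


def scan_back(trace: List[int], start: int, end: int,
              first_window: int = 4) -> Dict[int, Optional[int]]:
    """Find the most recent access before `start` for each key line.

    Earlier parts of the trace are scanned in windows that double in
    size on every pass; scanning stops once every key line is resolved
    or the beginning of the trace is reached.
    """
    pending = key_lines(trace, start, end)
    last_seen: Dict[int, Optional[int]] = {line: None for line in pending}
    hi = start              # exclusive upper bound of the next window
    window = first_window
    while pending and hi > 0:
        lo = max(0, hi - window)
        for i in range(hi - 1, lo - 1, -1):   # walk the window backwards
            line = trace[i] // LINE_BYTES
            if line in pending:
                last_seen[line] = i
                pending.discard(line)
                if not pending:
                    break
        hi = lo
        window *= 2         # the next pass covers a larger, earlier region
    return last_seen
```

A line whose entry stays `None` was never accessed before the detailed region in this toy trace. How the real system turns such information into a statistical cache model is described in the report itself and is not reproduced here.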

Publisher
p. 12
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-369320 (URN)
Available from: 2018-12-12 Created: 2018-12-12 Last updated: 2019-01-08 Bibliographically approved
Ceballos, G., Hagersten, E. & Black-Schaffer, D. (2018). Tail-PASS: Resource-based Cache Management for Tiled Graphics Rendering Hardware. In: Proc. 16th International Conference on Parallel and Distributed Processing with Applications. Paper presented at ISPA 2018, December 11–13, Melbourne, Australia. IEEE
2018 (English) In: Proc. 16th International Conference on Parallel and Distributed Processing with Applications, IEEE, 2018. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE, 2018
National Category
Computer Systems; Computer Sciences
Identifiers
urn:nbn:se:uu:diva-363920 (URN)
Conference
ISPA 2018, December 11–13, Melbourne, Australia
Funder
EU, European Research Council, 715283
Available from: 2018-10-21 Created: 2018-10-21 Last updated: 2018-11-16 Bibliographically approved
Sembrant, A., Carlson, T. E., Hagersten, E. & Black-Schaffer, D. (2017). A graphics tracing framework for exploring CPU+GPU memory systems. In: Proc. 20th International Symposium on Workload Characterization. Paper presented at IISWC 2017, October 1–3, Seattle, WA (pp. 54-65). IEEE
2017 (English) In: Proc. 20th International Symposium on Workload Characterization, IEEE, 2017, p. 54-65. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE, 2017
National Category
Computer Engineering
Identifiers
urn:nbn:se:uu:diva-357055 (URN); 10.1109/IISWC.2017.8167756 (DOI); 000428206700006 (); 978-1-5386-1233-0 (ISBN)
Conference
IISWC 2017, October 1–3, Seattle, WA
Available from: 2017-12-07 Created: 2018-08-17 Last updated: 2018-09-24 Bibliographically approved
Sembrant, A., Hagersten, E. & Black-Schaffer, D. (2017). A split cache hierarchy for enabling data-oriented optimizations. In: Proc. 23rd International Symposium on High Performance Computer Architecture. Paper presented at HPCA 2017, February 4–8, Austin, TX (pp. 133-144). IEEE Computer Society
2017 (English) In: Proc. 23rd International Symposium on High Performance Computer Architecture, IEEE Computer Society, 2017, p. 133-144. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE Computer Society, 2017
National Category
Computer Engineering
Identifiers
urn:nbn:se:uu:diva-306368 (URN); 10.1109/HPCA.2017.25 (DOI); 000403330300012 (); 978-1-5090-4985-1 (ISBN)
Conference
HPCA 2017, February 4–8, Austin, TX
Available from: 2017-05-08 Created: 2016-10-27 Last updated: 2018-01-14 Bibliographically approved
Ceballos, G., Hugo, A., Hagersten, E. & Black-Schaffer, D. (2017). Exploring scheduling effects on task performance with TaskInsight. Supercomputing frontiers and innovations, 4(3), 91-98
2017 (English) In: Supercomputing frontiers and innovations, ISSN 2214-3270, E-ISSN 2313-8734, Vol. 4, no 3, p. 91-98. Article in journal (Refereed) Published
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-335528 (URN); 10.14529/jsfi170306 (DOI)
Projects
UPMARC
Funder
Swedish Foundation for Strategic Research, FFL12-0051
Available from: 2017-12-06 Created: 2017-12-06 Last updated: 2018-11-16 Bibliographically approved
Sembrant, A., Carlson, T. E., Hagersten, E. & Black-Schaffer, D. (2017). POSTER: Putting the G back into GPU/CPU Systems Research. In: 2017 26TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT). Paper presented at 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), SEP 09-13, 2017, Portland, OR, USA (pp. 130-131).
2017 (English) In: 2017 26TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT), 2017, p. 130-131. Conference paper, Published paper (Refereed)
Abstract [en]

Modern SoCs contain several CPU cores and many GPU cores to execute both general purpose and highly-parallel graphics workloads. In many SoCs, more area is dedicated to graphics than to general purpose compute. Despite this, the micro-architecture research community primarily focuses on GPGPU and CPU-only research, and not on graphics (the primary workload for many SoCs). The main reason for this is the lack of efficient tools and simulators for modern graphics applications. This work focuses on the GPU's memory traffic generated by graphics. We describe a new graphics tracing framework and use it to study both graphics applications' memory behavior and how CPUs and GPUs affect system performance. Our results show that graphics applications exhibit a wide range of memory behavior between applications and across time, and slow down co-running SPEC applications by 59% on average.

Series
International Conference on Parallel Architectures and Compilation Techniques, ISSN 1089-795X
National Category
Computer Systems; Computer Engineering
Identifiers
urn:nbn:se:uu:diva-347752 (URN); 10.1109/PACT.2017.60 (DOI); 000417411300011 (); 978-1-5090-6764-0 (ISBN)
Conference
26th International Conference on Parallel Architectures and Compilation Techniques (PACT), SEP 09-13, 2017, Portland, OR, USA.
Available from: 2018-04-17 Created: 2018-04-17 Last updated: 2018-04-17 Bibliographically approved
Davari, M., Hagersten, E. & Kaxiras, S. (2017). Scope-Aware Classification: Taking the hierarchical private/shared data classification to the next level.
2017 (English) Report (Other academic)
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2017-008
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-320324 (URN)
Available from: 2017-04-27 Created: 2017-04-19 Last updated: 2017-07-03 Bibliographically approved
Davari, M., Hagersten, E. & Kaxiras, S. (2017). The best of both works: A hybrid data-race-free cache coherence scheme.
2017 (English) Report (Other academic)
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-320320 (URN)
Available from: 2017-04-19 Created: 2017-04-19 Last updated: 2017-11-15
Ceballos, G., Hagersten, E. & Black-Schaffer, D. (2017). Understanding the interplay between task scheduling, memory and performance. In: Proc. Companion 8th ACM International Conference on Systems, Programming, Languages, and Applications: Software for Humanity. Paper presented at SPLASH 2017, October 22–27, Vancouver, Canada (pp. 21-23). New York: ACM Press
2017 (English) In: Proc. Companion 8th ACM International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, New York: ACM Press, 2017, p. 21-23. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2017
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-335556 (URN); 10.1145/3135932.3135942 (DOI); 978-1-4503-5514-8 (ISBN)
Conference
SPLASH 2017, October 22–27, Vancouver, Canada
Projects
UPMARC
Funder
Swedish Foundation for Strategic Research, FFL12-0051
Available from: 2017-10-22 Created: 2017-12-06 Last updated: 2018-11-16 Bibliographically approved
Spiliopoulos, V., Sembrant, A., Keramidas, G., Hagersten, E. & Kaxiras, S. (2016). A unified DVFS-cache resizing framework.
2016 (English) Report (Other academic)
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2016-014
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-300840 (URN)
Available from: 2016-08-15 Created: 2016-08-15 Last updated: 2018-01-10 Bibliographically approved