A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. ORCID iD: 0000-0002-8250-8574
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. ORCID iD: 0000-0001-5375-4058
2021 (English). In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 18, no 2, article id 24. Article in journal (Refereed), Published
Abstract [en]

The SPEC CPU Benchmarks are used extensively for evaluating and comparing improvements to computer systems. This ubiquity makes characterization critical for researchers to understand the bottlenecks the benchmarks do and do not expose and where new designs should and should not be expected to show impact. However, in characterization there is a tradeoff between accuracy and reusability: the more precisely we characterize a benchmark's performance on a given system, the less usable it is across different micro-architectures and varying memory configurations. For SPEC, most existing characterizations include system-specific effects (e.g., via performance counters) and/or only look at aggregate behavior (e.g., averages over the full application execution). While such approaches simplify characterization, they make it difficult to separate the applications' intrinsic behavior from the system-specific effects and/or lose the diverse phase-based behaviors. In this work we focus on characterizing the applications' intrinsic memory behavior by isolating them from micro-architectural configuration specifics. We do this by providing a simplified generic system model that evaluates the applications' memory behavior across multiple cache sizes, with and without prefetching, and over time. The resulting characterization can be reused across a range of systems to understand application behavior and allow us to see how frequently different behaviors occur. We use this approach to compare the SPEC 2006 and 2017 suites, providing insight into their memory system behavior beyond previous system-specific and/or aggregate results. We demonstrate the ability to use this characterization in different contexts by showing a portion of the SPEC 2017 benchmark suite that could benefit from giga-scale caches, despite aggregate results indicating otherwise.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021. Vol. 18, no 2, article id 24
Keywords [en]
Memory systems, cache sensitivity, prefetcher sensitivity, benchmark characterization, workload characterization, memory system characterization
National Category
Computer Systems; Computer Engineering
Identifiers
URN: urn:nbn:se:uu:diva-442105; DOI: 10.1145/3446200; ISI: 000631098200008; OAI: oai:DiVA.org:uu-442105; DiVA, id: diva2:1553717
Funder
EU, European Research Council, 715283; Knut and Alice Wallenberg Foundation, 2015.0153
Available from: 2021-05-10 Created: 2021-05-10 Last updated: 2024-04-02 Bibliographically approved
In thesis
1. Enhancing Processor Performance: Approaches for Memory Characterization, Efficient Dynamic Instruction Prefetching, and Optimized Instruction Caching
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Low-latency access to both data and instructions is paramount for processor performance. However, memory speed has been trailing behind processor speed and is now a dominant bottleneck in execution. While both data and instruction misses cause performance losses, data misses can be overlapped with other useful work, whereas instruction misses stall the front-end of the processor, leading to greater performance loss than data misses.

Memory access characterization is important for designing memory hierarchies. While many works have characterized the SPEC benchmarks' memory behaviour, the results have either been tied to a specific micro-architecture or ignored the time-based behaviour of the benchmarks. In this thesis, we remove a majority of the micro-architectural features to characterize the intrinsic memory behaviour of the SPEC benchmarks and use this to understand how the workloads behave with various cache sizes and prefetching. In order to simplify the analysis of complex time-based results, we propose the use of MPKI bins, which divide the execution into distinct MPKI ranges. Using MPKI bins, we demonstrate that short memory-bound phases cause a significant percentage of the overall cache misses.

For instructions, the growing instruction footprints of server workloads are causing significant performance losses due to front-end stalls that cannot be overlapped or hidden by out-of-order execution. The second part of this thesis develops a technique to enable dedicated instruction prefetchers without the area cost of separate metadata storage structures. We propose to re-purpose the branch target buffer (BTB) to store prefetcher metadata based on the insight that benchmarks that require a dedicated instruction prefetcher can tolerate increased BTB misses. Going further, we propose L2 instruction bypassing based on the insight that decreased L2 data misses deliver more benefit than the slight instruction latency reduction of having instructions in the L2. We show that L2 instruction bypass delivers more performance than a dedicated instruction prefetcher and instruction-focused replacement policies.
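
As a rough illustration of the MPKI bins mentioned in the abstract above, the sketch below groups execution intervals by their misses per kilo-instruction (MPKI) and reports, for each range, its share of executed instructions and its share of all misses. The bin edges, the function name `mpki_bins`, and the sample trace are assumptions made for this example; the record does not state the ranges or the interval length used in the thesis.

```python
from bisect import bisect_right

# Hypothetical bin edges in MPKI; the ranges used in the thesis are not given here.
BIN_EDGES = [1.0, 10.0, 50.0]


def mpki_bins(misses, instructions, edges=BIN_EDGES):
    """Group execution intervals by MPKI range.

    misses[i] and instructions[i] are the cache misses and executed
    instructions in interval i. Returns one dict per bin holding the share
    of instructions and the share of misses that fall into that bin.
    """
    if len(misses) != len(instructions):
        raise ValueError("misses and instructions must have equal length")
    n_bins = len(edges) + 1
    instr_in_bin = [0] * n_bins
    miss_in_bin = [0] * n_bins
    for m, n in zip(misses, instructions):
        if n <= 0:
            continue  # skip empty intervals
        mpki = 1000.0 * m / n
        b = bisect_right(edges, mpki)  # index of the range containing mpki
        instr_in_bin[b] += n
        miss_in_bin[b] += m
    total_instr = sum(instr_in_bin) or 1  # avoid division by zero
    total_miss = sum(miss_in_bin) or 1
    return [
        {
            "instr_share": instr_in_bin[b] / total_instr,
            "miss_share": miss_in_bin[b] / total_miss,
        }
        for b in range(n_bins)
    ]


# Synthetic trace (not measured data): one short memory-bound interval
# among three intervals with very few misses.
sample_misses = [10, 20, 6000, 15]
sample_instructions = [100_000] * 4
for index, shares in enumerate(mpki_bins(sample_misses, sample_instructions)):
    print(index, shares)
```

In this synthetic trace the highest bin covers a quarter of the instructions but holds almost all of the misses, which is the kind of short memory-bound phase that an average over the whole execution would hide.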

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2024. p. 54
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2387
Keywords
Computer Architecture, Memory Systems, Server Design, Caches
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-525875 (URN); 978-91-513-2096-0 (ISBN)
Public defence
2024-05-31, 80127, Ångströmslaboratoriet, Lägerhyddsvägen 1, Uppsala, 13:00 (English)
Available from: 2024-05-06 Created: 2024-04-02 Last updated: 2024-05-06

Open Access in DiVA

No full text in DiVA

Authority records

Hassan, Muhammad; Park, Chang Hyun; Black-Schaffer, David