Protean: Resource-efficient Instruction Prefetching
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. ORCID iD: 0000-0002-8250-8574
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. ORCID iD: 0000-0001-5375-4058
2023 (English). In: The International Symposium on Memory Systems (MEMSYS '23), Association for Computing Machinery (ACM), 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Increases in code footprint and control flow complexity have made low-latency instruction fetch challenging. Dedicated Instruction Prefetchers (DIPs) can provide performance gains (up to 5%) for a subset of applications that are poorly served by today’s ubiquitous Fetch-Directed Instruction Prefetching (FDIP). However, DIPs incur the significant overhead of in-core metadata storage (for all workloads) and energy and performance loss from excess prefetches (for many workloads), leading to 11% of workloads actually losing performance. This work addresses how to provide the benefits of a DIP without its costs when the DIP cannot provide a benefit.

Our key insight is that workloads that benefit from DIPs can tolerate increased Branch Target Buffer (BTB) misses. This allows us to dynamically re-purpose the existing BTB storage between the BTB and the DIP. We train a simple performance-counter-based decision tree to select the optimal configuration at runtime, which allows us to achieve different energy/performance optimization goals. As a result, we pay essentially no area overhead when a DIP is needed, and can use the larger BTB when it is beneficial, or even power it off when not needed.
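The runtime selection described above can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not list which performance counters, thresholds, or tree shape are used, so the counter names (`l1i_misses`, `btb_misses`), the threshold values, and the three configuration labels are hypothetical stand-ins, and the hand-written tree stands in for a trained one.

```python
from dataclasses import dataclass
from enum import Enum


class Config(Enum):
    # Hypothetical labels for the three options the abstract mentions.
    FULL_BTB = "all storage used as BTB, no DIP"
    SHARED_BTB_DIP = "storage split between BTB and DIP metadata"
    REDUCED_BTB = "smaller BTB, unused storage powered off"


@dataclass
class Counters:
    # Hypothetical counters sampled over one decision interval.
    instructions: int
    l1i_misses: int
    btb_misses: int


def mpki(events: int, instructions: int) -> float:
    """Events per thousand instructions; 0.0 for an empty interval."""
    return 1000.0 * events / instructions if instructions else 0.0


def select_config(c: Counters,
                  l1i_threshold: float = 10.0,
                  btb_threshold: float = 2.0) -> Config:
    """Two-level decision tree with illustrative (not trained) thresholds."""
    if mpki(c.l1i_misses, c.instructions) > l1i_threshold:
        # Many instruction-cache misses: give part of the BTB to the DIP.
        return Config.SHARED_BTB_DIP
    if mpki(c.btb_misses, c.instructions) > btb_threshold:
        # Front end is BTB-bound: keep the whole structure as BTB.
        return Config.FULL_BTB
    # Neither structure is under pressure: save energy.
    return Config.REDUCED_BTB
```

In a real design the thresholds would come from training against the chosen goal (performance or energy), so the same tree structure could yield different decisions for different goals.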

We look at our impact on two groups of benchmarks: those where the right configuration choice can improve performance or energy and those where the wrong choice could hurt them. For the benchmarks with improvement potential, when optimizing for performance, we are able to obtain 86% of the oracle potential, and when optimizing for energy, 98% of the potential, both while avoiding essentially all performance and energy losses on the remaining benchmarks. This demonstrates that our technique is able to dynamically adapt to different performance/energy goals and obtain essentially all of the potential gains of DIPs without the overheads they incur today.
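One common way to read a figure such as "86% of the oracle potential" is as the share of the oracle's gain over a baseline that the runtime policy recovers. The abstract does not define the metric, so the formula below is an assumed interpretation, and the function and parameter names are hypothetical.

```python
def fraction_of_oracle(baseline: float, achieved: float, oracle: float) -> float:
    """Share of the oracle's improvement over the baseline that was achieved.

    Assumed definition: (achieved - baseline) / (oracle - baseline).
    Works for any metric as long as all three values use the same unit.
    """
    potential = oracle - baseline
    if potential == 0:
        # No improvement is possible, so there is nothing to recover.
        return 0.0
    return (achieved - baseline) / potential
```

Under this reading, a value of 1.0 means the policy matches the oracle, and a negative value means it did worse than the baseline.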

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023.
National Category
Computer Systems
Research subject
Computer Science; Computer Systems Sciences
Identifiers
URN: urn:nbn:se:uu:diva-515499
DOI: 10.1145/3631882.3631904
OAI: oai:DiVA.org:uu-515499
DiVA, id: diva2:1809540
Conference
The International Symposium on Memory Systems (MEMSYS '23), October 2–5, 2023, Alexandria, VA, USA
Funder
Knut and Alice Wallenberg Foundation, 2015.0153
EU, Horizon 2020, 715283
Swedish Research Council, 2019-02429
Note
Funder: Electronics and Telecommunications Research Institute (ETRI)
Grant number: 23ZS1300

Available from: 2023-11-03 Created: 2023-11-03 Last updated: 2024-04-02
In thesis
1. Enhancing Processor Performance: Approaches for Memory Characterization, Efficient Dynamic Instruction Prefetching, and Optimized Instruction Caching
2024 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Low-latency access to both data and instructions is paramount for processor performance. However, memory speed has trailed processor speed and is now a dominant bottleneck in execution. While both data and instruction misses cause performance losses, data misses can be overlapped with other useful work, whereas instruction misses stall the front-end of the processor, leading to greater performance loss than data misses.

Memory access characterization is important for designing memory hierarchies. While many works have characterized the SPEC benchmarks' memory behaviour, the results have been either tied to a specific micro-architecture or ignored the time-based behaviour of the benchmarks. In this thesis, we remove a majority of the micro-architectural features to characterize the intrinsic memory behaviour of the SPEC benchmarks and use this to understand how the workloads behave with various cache sizes and prefetching. In order to simplify the analysis of complex time-based results, we propose the use of MPKI (misses per kilo-instruction) bins, which divide the execution into distinct MPKI ranges. Using MPKI bins, we demonstrate that short memory-bound phases cause a significant percentage of the overall cache misses.
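The binning idea can be sketched as follows. This is an illustration under stated assumptions, not the thesis's tooling: the abstract gives neither the interval length nor the bin boundaries, so the `(instructions, misses)` input format and the default `edges` are hypothetical. Each interval is assigned to a bin by its MPKI, and each bin then reports its share of executed instructions and its share of all misses.

```python
from bisect import bisect_right


def mpki_bins(intervals, edges=(1.0, 10.0, 50.0)):
    """Summarize per-interval miss data by MPKI range.

    intervals: iterable of (instructions, misses) pairs, one per window.
    edges: ascending bin boundaries (illustrative values only).
    Returns one (share_of_instructions, share_of_misses) pair per bin,
    where bin 0 holds MPKI below edges[0] and the last bin holds
    MPKI at or above edges[-1].
    """
    n_bins = len(edges) + 1
    instr = [0] * n_bins
    miss = [0] * n_bins
    for instructions, misses in intervals:
        if instructions == 0:
            continue  # an empty window has no defined MPKI
        value = 1000.0 * misses / instructions
        b = bisect_right(edges, value)
        instr[b] += instructions
        miss[b] += misses
    total_instr = sum(instr) or 1  # avoid dividing by zero on empty input
    total_miss = sum(miss) or 1
    return [(instr[b] / total_instr, miss[b] / total_miss)
            for b in range(n_bins)]
```

With such a summary, a bin that covers a small share of instructions but a large share of misses corresponds to the short memory-bound phases the abstract refers to.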

For instructions, the growing instruction footprints of server workloads are causing significant performance losses due to front-end stalls that cannot be overlapped or hidden by out-of-order execution. The second part of this thesis develops a technique to enable dedicated instruction prefetchers without the area cost of separate metadata storage structures. We propose to re-purpose the branch target buffer (BTB) to store prefetcher metadata based on the insight that benchmarks that require a dedicated instruction prefetcher can tolerate increased BTB misses. Going further, we propose L2 instruction bypassing based on the insight that decreased L2 data misses deliver more benefit than the slight instruction latency reduction of having instructions in the L2. We show that L2 instruction bypass delivers more performance than a dedicated instruction prefetcher and instruction-focused replacement policies.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2024. p. 54
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2387
Keywords
Computer Architecture, Memory Systems, Server Design, Caches
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-525875 (URN)
978-91-513-2096-0 (ISBN)
Public defence
2024-05-31, 80127, Ångströmslaboratoriet, Lägerhyddsvägen 1, Uppsala, 13:00 (English)
Available from: 2024-05-06 Created: 2024-04-02 Last updated: 2024-05-06

Authority records

Hassan, Muhammad; Park, Chang Hyun; Black-Schaffer, David
