Addressing energy challenges in filter caches
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication (UART). ORCID iD: 0000-0002-6259-7821
ARM Res, Lund, Sweden.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)
2017 (English). In: Proc. 29th International Symposium on Computer Architecture and High Performance Computing, IEEE Computer Society, 2017, p. 49–56. Conference paper, published paper (Refereed).
Abstract [en]

Filter caches and way-predictors are common approaches to improving the efficiency and/or performance of first-level caches. Filter caches use a small L0 to provide more efficient and faster access to a small subset of the data, and work well for programs with high locality. Way-predictors improve efficiency by accessing only the predicted way, which avoids reading all ways in parallel without increasing latency, but hurts performance on mispredictions.

In this work we examine how SRAM layout constraints (h-trees and data mapping inside the cache) affect way-predictors and filter caches. We show that accessing the smaller L0 array can be significantly more energy efficient than attempting to read fewer ways from a larger L1 cache, and that the main source of energy inefficiency in filter caches comes from L0 and L1 misses. We propose a filter cache optimization that shares the tag array between the L0 and the L1. It incurs the overhead of reading the larger tag array on every access, but in return allows us to directly access the correct L1 way on each L0 miss. This optimization adds no extra latency and, counter-intuitively, improves the filter cache's overall energy efficiency beyond that of the way-predictor.

By combining the low-power benefit of a physically smaller L0 with the reduction in miss energy from reading the L1 tags up front, in parallel with the L0 data, we show that the optimized filter cache reduces dynamic cache energy by 26% compared to a traditional filter cache while providing the same performance advantage. Compared to a way-predictor, the optimized cache improves performance by 6% and energy by 2%.
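
The following is a minimal behavioral sketch, in Python, of the shared-tag lookup described above. It is illustrative only: the class and method names (SharedTagFilterCache, access, fill_l1) and the cache geometry are assumptions, not the paper's implementation. The key point it encodes is that the shared L1 tags are read in parallel with the L0 data, so an L0 miss can read exactly one L1 way instead of all of them.

class SharedTagFilterCache:
    """Sketch of a filter cache whose tag array is shared between L0 and L1."""

    def __init__(self, num_sets=64, num_ways=4):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.l0_data = {}                          # line address -> data
        self.l1_tags = {s: [None] * num_ways for s in range(num_sets)}
        self.l1_data = {}                          # (set index, way) -> data

    def fill_l1(self, line_addr, data, way):
        set_idx, tag = line_addr % self.num_sets, line_addr // self.num_sets
        self.l1_tags[set_idx][way] = tag
        self.l1_data[(set_idx, way)] = data

    def access(self, line_addr):
        set_idx, tag = line_addr % self.num_sets, line_addr // self.num_sets
        # In hardware the L0 data array and the shared L1 tag array are read
        # in parallel, so on an L0 miss the matching way is already known.
        ways = self.l1_tags[set_idx]
        if line_addr in self.l0_data:
            return self.l0_data[line_addr]         # fast, low-energy L0 hit
        if tag in ways:
            way = ways.index(tag)
            data = self.l1_data[(set_idx, way)]    # read exactly ONE L1 way
            self.l0_data[line_addr] = data         # fill L0 for future reuse
            return data
        return None                                # L1 miss: go to next level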

Place, publisher, year, edition, pages
IEEE Computer Society, 2017. p. 49-56
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-334221
DOI: 10.1109/SBAC-PAD.2017.14
ISI: 000426895600007
ISBN: 978-1-5090-1233-6 (electronic)
OAI: oai:DiVA.org:uu-334221
DiVA id: diva2:1159077
Conference
29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2017), October 17–20, Campinas, Brazil.
Available from: 2017-11-09. Created: 2017-11-21. Last updated: 2019-05-22. Bibliographically approved.
In thesis
1. Leveraging Existing Microarchitectural Structures to Improve First-Level Caching Efficiency
2019 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Low-latency data access is essential for performance. To achieve this, processors use fast first-level caches combined with out-of-order execution, to decrease and hide memory access latency respectively. While these approaches are effective for performance, they cost significant energy, which has led to many techniques that require designers to trade off performance against efficiency.

Way-prediction and filter caches are two of the most common strategies for improving first-level cache energy efficiency while still minimizing latency. Both involve compromises: way-prediction trades some latency for better energy efficiency, while filter caches trade some energy efficiency for lower latency. However, these strategies are not mutually exclusive. By borrowing elements from both, and taking into account SRAM memory layout limitations, we propose a novel MRU-L0 cache that mitigates many of their shortcomings while preserving their benefits. Moreover, while first-level caches are tightly integrated into the CPU pipeline, existing work on these techniques largely ignores their impact on instruction scheduling. We show that the variable hit latency introduced by way-mispredictions causes instruction replays of load-dependent instruction chains, which hurts performance and efficiency. We study this effect and propose a variable-latency cache-hit instruction scheduler that identifies potential mis-schedulings, reduces instruction replays and their negative performance impact, and further improves cache energy efficiency.
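
To make the scheduling issue concrete, here is a small Python sketch of the trade-off. The latencies and names (FAST_HIT, Load, wakeup_cycle, needs_replay) are assumptions for illustration, not the scheduler proposed in the thesis: dependents woken at the fast-hit latency gain performance but must replay when the prediction is wrong, while conservative wakeup avoids replays at the cost of latency.

from dataclasses import dataclass

FAST_HIT, SLOW_HIT = 1, 3        # assumed hit latencies in cycles

@dataclass
class Load:
    issue_cycle: int
    predicted_fast: bool         # scheduler's hit-latency prediction
    actual_fast: bool            # known only once the cache responds

def wakeup_cycle(load):
    # Dependents are woken assuming the predicted latency; predicting
    # "fast" buys back latency but risks a replay.
    return load.issue_cycle + (FAST_HIT if load.predicted_fast else SLOW_HIT)

def needs_replay(load):
    # A predicted fast hit that resolves slow means the dependents issued
    # too early and must replay -- the cost the scheduler tries to avoid.
    return load.predicted_fast and not load.actual_fast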

Modern pipelines also employ sophisticated execution strategies to hide memory latency and improve performance. While their primary purpose is performance and correctness, they require intermediate storage that can double as a cache. In this work we demonstrate how the store buffer, paired with the memory-dependence predictor, can be used to efficiently cache dirty data, and how the physical register file, paired with a value predictor, can be used to efficiently cache clean data. These strategies improve both performance and energy, and do so with no additional storage and minimal additional complexity, since they recycle existing CPU structures to detect reuse, memory-ordering violations, and mis-speculations.
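
As a rough illustration of the first idea (the register-file case is analogous), the Python sketch below uses assumed names (StoreBufferCache, predict_forwarding) and is not the thesis design. A dependence predictor steers each load either to the store buffer alone, so dirty data is served without an L1 read, or to the L1; the store-buffer search that memory ordering already requires doubles as the misprediction check.

class StoreBufferCache:
    """Sketch: the store buffer reused as a cache for dirty data."""

    def __init__(self):
        self.entries = {}                        # address -> youngest store value

    def store(self, addr, value):
        self.entries[addr] = value

    def load(self, addr, predict_forwarding, l1_read):
        """Return (value, replay_needed)."""
        if predict_forwarding(addr) and addr in self.entries:
            return self.entries[addr], False     # dirty data; L1 never read
        value = l1_read(addr)
        # The store-buffer search needed anyway for memory ordering also
        # catches a wrong "no forwarding" prediction, forcing a replay.
        return value, addr in self.entries

For example, a load correctly predicted to forward from an in-flight store never pays the L1 read energy, while a wrongly predicted load is caught by the same search and replayed.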

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019. p. 42
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1821
Keywords
Energy Efficient Caching, Memory Architecture, Single Thread Performance, First-Level Caching, Out-of-Order Pipelines, Instruction Scheduling, Filter-Cache, Way-Prediction, Value-Prediction, Register-Sharing.
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-383811
ISBN: 978-91-513-0681-0
Public defence
2019-08-26, Sal VIII, Universitetshuset, Biskopsgatan 3, Uppsala, 09:00 (English)
Available from: 2019-06-11. Created: 2019-05-22. Last updated: 2019-08-23.

Open Access in DiVA

No full text in DiVA

Authority records

Alves, Ricardo; Kaxiras, Stefanos; Black-Schaffer, David
