Efficient temporal and spatial load to load forwarding
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. ORCID iD: 0000-0002-6259-7821
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
2020 (English). In: Proc. 26th International Symposium on High-Performance Computer Architecture, IEEE Computer Society, 2020. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE Computer Society, 2020.
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-383477
OAI: oai:DiVA.org:uu-383477
DiVA, id: diva2:1316131
Conference
HPCA 2020, February 22–26, San Diego, CA
Note

to appear

Available from: 2019-08-21 Created: 2019-05-16 Last updated: 2019-08-21. Bibliographically approved
In thesis
1. Leveraging Existing Microarchitectural Structures to Improve First-Level Caching Efficiency
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Low-latency data access is essential for performance. To achieve this, processors combine fast first-level caches with out-of-order execution, to decrease and hide memory access latency, respectively. While these approaches are effective for performance, they cost significant energy, which has led to the development of many techniques that require designers to trade off performance and efficiency.

Way-prediction and filter caches are two of the most common strategies for improving first-level cache energy efficiency while still minimizing latency. Both involve compromises: way-prediction trades off some latency for better energy efficiency, while filter caches trade off some energy efficiency for lower latency. However, these strategies are not mutually exclusive. By borrowing elements from both, and taking into account SRAM memory layout limitations, we propose a novel MRU-L0 cache that mitigates many of their shortcomings while preserving their benefits. Moreover, while first-level caches are tightly integrated into the CPU pipeline, existing work on these techniques largely ignores the impact they have on instruction scheduling. We show that the variable hit latency introduced by way-mispredictions causes instruction replays of load-dependent instruction chains, which hurts performance and efficiency. We study this effect and propose a variable-latency cache-hit instruction scheduler that identifies potential mis-schedulings, reduces instruction replays, reduces the negative performance impact, and further improves cache energy efficiency.
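As a rough illustration of why way-prediction produces a variable hit latency, the following is a minimal sketch of a generic MRU way-predicted cache. It is not the MRU-L0 design proposed in the thesis; the class name `WayPredictedCache`, its default geometry, the replacement choice, and the outcome labels `fast_hit`, `slow_hit`, and `miss` are all hypothetical and chosen only for this example.

```python
class WayPredictedCache:
    """Toy set-associative cache with MRU way-prediction (illustrative only)."""

    def __init__(self, num_sets=4, num_ways=2, line_bytes=64):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_bytes = line_bytes
        # tags[s][w] holds the tag cached in way w of set s, or None if empty.
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # mru[s] is the predicted (most recently used) way of set s.
        self.mru = [0] * num_sets

    def access(self, addr):
        line = addr // self.line_bytes
        s = line % self.num_sets
        tag = line // self.num_sets
        predicted = self.mru[s]

        # First probe: only the predicted way is read (low energy, low latency).
        if self.tags[s][predicted] == tag:
            return "fast_hit"

        # Way-misprediction: the other ways are probed, adding latency.
        for way in range(self.num_ways):
            if self.tags[s][way] == tag:
                self.mru[s] = way
                return "slow_hit"

        # Miss: fill the way next to the MRU way (assumed simple replacement).
        victim = (predicted + 1) % self.num_ways
        self.tags[s][victim] = tag
        self.mru[s] = victim
        return "miss"


cache = WayPredictedCache()
outcomes = [cache.access(a) for a in (0, 0, 256, 0, 0)]
print(outcomes)
```

In this toy model addresses 0 and 256 map to the same set, so alternating between them makes the predictor guess wrong and turns a hit into a `slow_hit`. A scheduler that assumes every hit is a `fast_hit` would have to replay the instructions that depend on such a load.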

Modern pipelines also employ sophisticated execution strategies to hide memory latency and improve performance. While their primary use is for performance and correctness, they require intermediate storage that can be used as a cache as well. In this work we demonstrate how the store buffer, paired with the memory dependency predictor, can be used to efficiently cache dirty data, and how the physical register file, paired with a value predictor, can be used to efficiently cache clean data. These strategies not only improve both performance and energy efficiency, but do so with no additional storage and minimal additional complexity, since they recycle existing CPU structures to detect reuse, memory ordering violations, and mis-speculations.
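The idea of a store buffer serving dirty data can be pictured with conventional store-to-load forwarding, sketched below. This is a simplified textbook model, not the mechanism developed in the thesis: it has no memory dependency predictor and matches only exact addresses. The names `StoreBuffer`, `drain_oldest`, and the `forwarded`/`memory` labels are assumptions made for this example.

```python
class StoreBuffer:
    """Toy store buffer with store-to-load forwarding (illustrative only)."""

    def __init__(self, memory):
        # memory is a dict of address -> value standing in for the cache.
        self.memory = memory
        # Pending (address, value) stores, oldest first.
        self.pending = []

    def store(self, addr, value):
        # Stores wait in the buffer until they are drained to memory.
        self.pending.append((addr, value))

    def load(self, addr):
        # Search youngest-first so the load sees the latest matching store.
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return value, "forwarded"
        # No pending store to this address: fall back to memory.
        return self.memory.get(addr, 0), "memory"

    def drain_oldest(self):
        # Commit the oldest pending store to memory, if there is one.
        if self.pending:
            addr, value = self.pending.pop(0)
            self.memory[addr] = value


mem = {0x10: 1}
sb = StoreBuffer(mem)
sb.store(0x10, 7)
sb.store(0x10, 9)
print(sb.load(0x10))
```

A load that matches a pending store gets its value without accessing the cache, which is the sense in which the buffer already behaves like a small cache for dirty data.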

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019. p. 42
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1821
Keywords
Energy Efficient Caching, Memory Architecture, Single Thread Performance, First-Level Caching, Out-of-Order Pipelines, Instruction Scheduling, Filter-Cache, Way-Prediction, Value-Prediction, Register-Sharing.
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-383811 (URN)
978-91-513-0681-0 (ISBN)
Public defence
2019-08-26, Sal VIII, Universitetshuset, Biskopsgatan 3, Uppsala, 09:00 (English)
Available from: 2019-06-11 Created: 2019-05-22 Last updated: 2019-08-23

Authors

Alves, Ricardo; Kaxiras, Stefanos; Black-Schaffer, David
