uu.seUppsala University Publications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies
Univ Murcia, Dept Comp Engn, E-30001 Murcia, Spain. (UART)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)
2015 (English)In: Proc. 21st International Symposium on High Performance Computer Architecture, IEEE Computer Society Digital Library, 2015, 186-197 p.Conference paper, Published paper (Refereed)
Abstract [en]

Hierarchical clustered cache designs are becoming an appealing alternative for multicores. Grouping cores and their caches in clusters reduces network congestion by localizing traffic among several hierarchical levels, potentially enabling much higher scalability. While such architectures can be formed recursively by replicating a base design pattern, keeping the whole hierarchy coherent requires more effort and consideration. The reason is that, in hierarchical coherence, even basic operations must be recursive. As a consequence, intermediate-level caches behave both as directories and as leaf caches. This leads to an explosion of states, protocol-races, and protocol complexity. While there have been previous efforts to extend directory-based coherence to hierarchical designs their increased complexity and verification cost is a serious impediment to their adoption. We aim to address these concerns by encapsulating all hierarchical complexity in a simple function: that of determining when a data block is shared entirely within a cluster (sub-tree of the hierarchy) and is private from the outside. This allows us to eliminate complex recursive operations that span the hierarchy and instead employ simple coherence mechanisms such as self-invalidation and write-through-now restricted to operate within the cluster where a data block is shared. We examine two inclusivity options and discuss the relation of our approach to the recently proposed Hierarchical-Race-Free (HRF) memory models. Finally, comparisons to a hierarchical directory-based MOESI, VIPS-M, and TokenCMP protocols show that, despite its simplicity our approach results in competitive performance and decreased network traffic.

Place, publisher, year, edition, pages
IEEE Computer Society Digital Library, 2015. 186-197 p.
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:uu:diva-265651DOI: 10.1109/HPCA.2015.7056032ISI: 000380564900016ISBN: 9781479989300 (print)OAI: oai:DiVA.org:uu-265651DiVA: diva2:866343
Conference
HPCA 2015, February 7–11, Burlingame, CA
Available from: 2015-02-11 Created: 2015-11-02 Last updated: 2017-04-22Bibliographically approved
In thesis
1. Advances Towards Data-Race-Free Cache Coherence Through Data Classification
Open this publication in new window or tab >>Advances Towards Data-Race-Free Cache Coherence Through Data Classification
2017 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Providing a consistent view of the shared memory based on precise and well-defined semantics—memory consistency model—has been an enabling factor in the widespread acceptance and commercial success of shared-memory architectures. Moreover, cache coherence protocols have been employed by the hardware to remove from the programmers the burden of dealing with the memory inconsistency that emerges in the presence of the private caches. The principle behind all such cache coherence protocols is to guarantee that consistent values are read from the private caches at all times.

In its most stringent form, a cache coherence protocol eagerly enforces two invariants before each data modification: i) no other core has a copy of the data in its private caches, and ii) all other cores know where to receive the consistent data should they need the data later. Nevertheless, by partly transferring the responsibility for maintaining those invariants to the programmers, commercial multicores have adopted weaker memory consistency models, namely the Total Store Order (TSO), in order to optimize the performance for more common cases.

Moreover, memory models with more relaxed invariants have been proposed based on the observation that more and more software is written in compliance with the Data-Race-Free (DRF) semantics. The semantics of DRF software can be leveraged by the hardware to infer when data in the private caches might be inconsistent. As a result, hardware ignores the inconsistent data and retrieves the consistent data from the shared memory. DRF semantics therefore removes from the hardware the burden of eagerly enforcing the strong consistency invariants before each data modification. Instead, consistency is guaranteed only when needed. This results in manifold optimizations, such as reducing the energy consumption and improving the performance and scalability. The efficiency of detecting and discarding the inconsistent data is an important factor affecting the efficiency of such coherence protocols. For instance, discarding the consistent data does not affect the correctness, but results in performance loss and increased energy consumption.

In this thesis we show how data classification can be leveraged as an effective tool to simplify the cache coherence based on the DRF semantics. In particular, we introduce simple but efficient hardware-based private/shared data classification techniques that can be used to efficiently detect the inconsistent data, thus enabling low-overhead and scalable cache coherence solutions based on the DRF semantics.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2017. 64 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1521
Keyword
Shared Memory Architectures, Multicore, Memory Hierarchy, Cache Coherence, Data Classification
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-320595 (URN)978-91-554-9925-9 (ISBN)
Public defence
2017-06-08, 2446, Lägerhyddsvägen 2, Hus 2, Uppsala, 13:15 (English)
Opponent
Supervisors
Available from: 2017-05-15 Created: 2017-04-22 Last updated: 2017-05-17

Open Access in DiVA

No full text

Other links

Publisher's full text

Authority records BETA

Ros, AlbertoDavari, MahdadKaxiras, Stefanos

Search in DiVA

By author/editor
Ros, AlbertoDavari, MahdadKaxiras, Stefanos
By organisation
Computer Architecture and Computer Communication
Computer Systems

Search outside of DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 416 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf