The Superfluous Load Queue
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
2018 (English). In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, 2018, p. 95-107. Conference paper, Published paper (Refereed)
Abstract [en]

In an out-of-order core, the load queue (LQ), the store queue (SQ), and the store buffer (SB) are responsible for ensuring: i) correct forwarding of stores to loads and ii) correct ordering among loads (with respect to external stores). The first requirement safeguards the sequential semantics of program execution and applies to both serial and parallel code; the second requirement safeguards the semantics of coherence and consistency (e.g., TSO). In particular, loads search the SQ/SB for the latest value that may have been produced by a store, and stores and invalidations search the LQ to find speculative loads in case they violate uniprocessor or multiprocessor ordering. To meet timing constraints, the LQ and SQ/SB system is composed of CAM structures that are frequently searched. The result is high complexity, high cost, and significant difficulty in scaling, yet this is the current state of the art. Prior research demonstrated the feasibility of a non-associative LQ by replaying loads at commit. There is a steep cost, however: a significant increase in L1 accesses and contention for L1 ports. This is because prior work assumes Sequential Consistency and completely ignores the existence of an SB in the system. In contrast, we intentionally delay stores in the SB to achieve a total management of stores and loads in a core, while still supporting TSO. Our main result is that we eliminate the LQ without burdening the L1 with extra accesses. Store forwarding is achieved by delaying our own stores until speculatively issued loads are validated on commit, entirely in-core; TSO load→load ordering is preserved by delaying remote external stores in their SB until our own speculatively reordered loads commit. While the latter is inspired by recent work on non-speculative load reordering, our contribution here is to show that this can be accomplished without having a load queue. Eliminating the LQ results in both energy savings and performance improvement from the elimination of LQ-induced stalls.
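
As a rough illustration of the conventional store-to-load forwarding search that the abstract describes as background (a load looking in the SQ/SB for the latest matching older store), the sketch below models the search in software. It is not the paper's LQ-free mechanism. The names `StoreEntry`, `seq`, and `forward_from_stores` are hypothetical, and real hardware performs this match associatively in a CAM rather than with a loop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    seq: int    # program-order sequence number (hypothetical field)
    addr: int   # address written by the store
    value: int  # data written by the store


def forward_from_stores(stores: List[StoreEntry], load_seq: int, addr: int) -> Optional[int]:
    """Return the value of the youngest store that is older than the load
    and writes the same address, or None if the load must read the cache."""
    best: Optional[StoreEntry] = None
    for s in stores:
        # Only stores earlier in program order than the load may forward.
        if s.seq < load_seq and s.addr == addr:
            if best is None or s.seq > best.seq:
                best = s
    return best.value if best is not None else None


# Assumed example contents of a combined SQ/SB.
stores = [
    StoreEntry(seq=1, addr=0x10, value=5),
    StoreEntry(seq=3, addr=0x10, value=7),
    StoreEntry(seq=5, addr=0x10, value=9),
    StoreEntry(seq=2, addr=0x20, value=1),
]
hit = forward_from_stores(stores, load_seq=4, addr=0x10)
miss = forward_from_stores(stores, load_seq=4, addr=0x30)
```

Here `hit` takes the value of the store with `seq=3`, because the store with `seq=5` is younger than the load and must not forward to it; `miss` is `None`, meaning the load would be served by the L1.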

Place, publisher, year, edition, pages
IEEE, 2018. p. 95-107
National Category
Computer Systems; Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-361411
DOI: 10.1109/MICRO.2018.00017
ISI: 000455869300008
ISBN: 978-1-5386-6240-3 (electronic)
OAI: oai:DiVA.org:uu-361411
DiVA, id: diva2:1250476
Conference
51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Fukuoka City, Japan, October 20–24, 2018
Funder
Swedish Research Council, 621-2012-5332
EU, European Research Council, TIN2015-66972-C5-3-R
Available from: 2018-09-24 Created: 2018-09-24 Last updated: 2019-02-01 Bibliographically approved

Open Access in DiVA

fulltext (633 kB), 159 downloads
File information
File name: FULLTEXT01.pdf
File size: 633 kB
Checksum SHA-512: ee1d47a30e5152f33219687f9427d5cd21c826e6151c6c8a008239ade4de862cecea5cd436ef608f6edfd076463a1744989f2c985ce06e49aa99d38a8b29a5ab
Type: fulltext
Mimetype: application/pdf

Authority records BETA

Ros, Alberto; Kaxiras, Stefanos
