uu.seUppsala universitets publikationer
Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Static instruction scheduling for high performance on limited hardware
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Datorarkitektur och datorkommunikation.
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Datorarkitektur och datorkommunikation. Norwegian Univ Sci & Technol, S-75236 Uppsala, Sweden.ORCID-id: 0000-0003-4232-6976
Visa övriga samt affilieringar
2018 (Engelska)Ingår i: IEEE Transactions on Computers, ISSN 0018-9340, Vol. 67, nr 4, s. 513-527Artikel i tidskrift (Refereegranskat) Published
Abstract [en]

Complex out-of-order (OoO) processors have been designed to overcome the restrictions of outstanding long-latency misses at the cost of increased energy consumption. Simple, limited OoO processors are a compromise in terms of energy consumption and performance, as they have fewer hardware resources to tolerate the penalties of long-latency loads. In worst case, these loads may stall the processor entirely. We present Clairvoyance, a compiler based technique that generates code able to hide memory latency and better utilize simple OoO processors. By clustering loads found across basic block boundaries, Clairvoyance overlaps the outstanding latencies to increases memory-level parallelism. We show that these simple OoO processors, equipped with the appropriate compiler support, can effectively hide long-latency loads and achieve performance improvements for memory-bound applications. To this end, Clairvoyance tackles (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure. Clairvoyance achieves a geomean execution time improvement of 14 percent for memory-bound applications, on top of standard O3 optimizations, while maintaining compute-bound applications' high-performance.

Ort, förlag, år, upplaga, sidor
2018. Vol. 67, nr 4, s. 513-527
Nationell ämneskategori
Datavetenskap (datalogi)
Identifikatorer
URN: urn:nbn:se:uu:diva-334011DOI: 10.1109/TC.2017.2769641ISI: 000427420800005OAI: oai:DiVA.org:uu-334011DiVA, id: diva2:1158485
Projekt
UPMARC
Forskningsfinansiär
Vetenskapsrådet, 2016-05086Tillgänglig från: 2017-11-03 Skapad: 2017-11-20 Senast uppdaterad: 2020-01-17Bibliografiskt granskad
Ingår i avhandling
1. Static instruction scheduling for high performance on energy-efficient processors
Öppna denna publikation i ny flik eller fönster >>Static instruction scheduling for high performance on energy-efficient processors
2018 (Engelska)Licentiatavhandling, sammanläggning (Övrigt vetenskapligt)
Abstract [en]

New trends such as the internet-of-things and smart homes push the demands for energy-efficiency. Choosing energy-efficient hardware, however, often comes as a trade-off to high-performance. In order to strike a good balance between the two, we propose software solutions to tackle the performance bottlenecks of small and energy-efficient processors.

One of the main performance bottlenecks of processors is the discrepancy between processor and memory speed, known as the memory wall. While the processor executes instructions at a high pace, the memory is too slow to provide data in a timely manner, if data has not been cached in advance. Load instructions that require an access to memory are thereby referred to as long-latency or delinquent loads. Long latencies caused by delinquent loads are putting a strain on small processors, which have few or no resources to effectively hide the latencies. As a result, the processor may stall.

In this thesis we propose compile-time transformation techniques to mitigate the penalties of delinquent loads on small out-of-order processors, with the ultimate goal to avoid processor stalls as much as possible. Our code transformation is applicable for general-purpose code, including unknown memory dependencies, complex control flow and pointers. We further propose a software-hardware co-design that combines the code transformation technique with lightweight hardware support to hide latencies on a stall-on-use in-order processor.

Ort, förlag, år, upplaga, sidor
Uppsala University, 2018
Serie
IT licentiate theses / Uppsala University, Department of Information Technology, ISSN 1404-5117 ; 2018-001
Nationell ämneskategori
Datorteknik
Forskningsämne
Datavetenskap
Identifikatorer
urn:nbn:se:uu:diva-349420 (URN)
Handledare
Projekt
UPMARC
Tillgänglig från: 2017-12-18 Skapad: 2018-04-26 Senast uppdaterad: 2019-02-25Bibliografiskt granskad
2. Finding and Exploiting Memory-Level-Parallelism in Constrained Speculative Architectures
Öppna denna publikation i ny flik eller fönster >>Finding and Exploiting Memory-Level-Parallelism in Constrained Speculative Architectures
2020 (Engelska)Doktorsavhandling, sammanläggning (Övrigt vetenskapligt)
Abstract [en]

One of the main performance bottlenecks of processors today is the discrepancy between processor and memory speed, known as the memory wall. While the processor executes instructions at a high pace, the memory is too slow to provide data in a timely manner. Load instructions that require an access to memory are referred to as long-latency or delinquent loads. To prevent the processor from stalling, independent instruction past the load may execute, including independent loads. Overlapping load operations and thus their latency is referred to as memory-level parallelism. Memory-level parallelism (MLP) can significantly improve performance. Today's out-of-order processors are therefore equipped with complex hardware that allows them to look into the future and to select independent loads that can be overlapped. However, the ability to choose future instructions and speculatively execute them in advance introduces complexity, increased power consumption and potential security risks. In this thesis we look at constrained speculative architectures that struggle to hide memory latencies as they are constrained by design, by their resources, or by security. We investigate ways for the compiler to help them in finding MLP, with the ultimate goal to avoid processor stalls as much as possible. This includes small energy-efficient processors that lack the ability to look-ahead far enough to find independent loads, but also large processors that are disallowed to speculatively execute independent loads due to enforced security measures to circumvent side-channel attacks. We identify the reason for their limitation and propose software transformations and hardware extensions to overcome their restrictions.

Ort, förlag, år, upplaga, sidor
Uppsala: Acta Universitatis Upsaliensis, 2020. s. 50
Serie
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1897
Nyckelord
Memory-level-parallelism, Energy-efficiency, Performance, Compiler, Instruction Scheduling, SW/HW Co-Design
Nationell ämneskategori
Datavetenskap (datalogi)
Forskningsämne
Datavetenskap
Identifikatorer
urn:nbn:se:uu:diva-402642 (URN)978-91-513-0860-9 (ISBN)
Opponent
Handledare
Tillgänglig från: 2020-02-19 Skapad: 2020-01-17 Senast uppdaterad: 2020-03-13

Open Access i DiVA

Fulltext saknas i DiVA

Övriga länkar

Förlagets fulltext

Personposter BETA

Tran, Kim-AnhSjälander, MagnusSpiliopoulos, VasileiosKaxiras, StefanosJimborean, Alexandra

Sök vidare i DiVA

Av författaren/redaktören
Tran, Kim-AnhSjälander, MagnusSpiliopoulos, VasileiosKaxiras, StefanosJimborean, Alexandra
Av organisationen
Datorarkitektur och datorkommunikation
Datavetenskap (datalogi)

Sök vidare utanför DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetricpoäng

doi
urn-nbn
Totalt: 171 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf