Logo: to the web site of Uppsala University

uu.sePublications from Uppsala University
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Static instruction scheduling for high performance on energy-efficient processors
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
2018 (English)Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

New trends such as the internet-of-things and smart homes push the demands for energy-efficiency. Choosing energy-efficient hardware, however, often comes as a trade-off to high-performance. In order to strike a good balance between the two, we propose software solutions to tackle the performance bottlenecks of small and energy-efficient processors.

One of the main performance bottlenecks of processors is the discrepancy between processor and memory speed, known as the memory wall. While the processor executes instructions at a high pace, the memory is too slow to provide data in a timely manner, if data has not been cached in advance. Load instructions that require an access to memory are thereby referred to as long-latency or delinquent loads. Long latencies caused by delinquent loads are putting a strain on small processors, which have few or no resources to effectively hide the latencies. As a result, the processor may stall.

In this thesis we propose compile-time transformation techniques to mitigate the penalties of delinquent loads on small out-of-order processors, with the ultimate goal to avoid processor stalls as much as possible. Our code transformation is applicable for general-purpose code, including unknown memory dependencies, complex control flow and pointers. We further propose a software-hardware co-design that combines the code transformation technique with lightweight hardware support to hide latencies on a stall-on-use in-order processor.

Place, publisher, year, edition, pages
Uppsala University, 2018.
Series
Information technology licentiate theses: Licentiate theses from the Department of Information Technology, ISSN 1404-5117 ; 2018-001
National Category
Computer Engineering
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-349420OAI: oai:DiVA.org:uu-349420DiVA, id: diva2:1201906
Supervisors
Projects
UPMARCAvailable from: 2017-12-18 Created: 2018-04-26 Last updated: 2019-02-25Bibliographically approved
List of papers
1. Clairvoyance: Look-ahead compile-time scheduling
Open this publication in new window or tab >>Clairvoyance: Look-ahead compile-time scheduling
Show others...
2017 (English)In: Proc. 15th International Symposium on Code Generation and Optimization, Piscataway, NJ: IEEE Press, 2017, p. 171-184Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Press, 2017
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-316480 (URN)000402548700015 ()978-1-5090-4931-8 (ISBN)
Conference
CGO 2017, February 4–8, Austin, TX
Projects
UPMARC
Funder
Swedish Research Council, 2010-4741
Available from: 2017-02-04 Created: 2017-03-01 Last updated: 2020-01-17Bibliographically approved
2. Static instruction scheduling for high performance on limited hardware
Open this publication in new window or tab >>Static instruction scheduling for high performance on limited hardware
Show others...
2018 (English)In: IEEE Transactions on Computers, ISSN 0018-9340, E-ISSN 1557-9956, Vol. 67, no 4, p. 513-527Article in journal (Refereed) Published
Abstract [en]

Complex out-of-order (OoO) processors have been designed to overcome the restrictions of outstanding long-latency misses at the cost of increased energy consumption. Simple, limited OoO processors are a compromise in terms of energy consumption and performance, as they have fewer hardware resources to tolerate the penalties of long-latency loads. In worst case, these loads may stall the processor entirely. We present Clairvoyance, a compiler based technique that generates code able to hide memory latency and better utilize simple OoO processors. By clustering loads found across basic block boundaries, Clairvoyance overlaps the outstanding latencies to increases memory-level parallelism. We show that these simple OoO processors, equipped with the appropriate compiler support, can effectively hide long-latency loads and achieve performance improvements for memory-bound applications. To this end, Clairvoyance tackles (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure. Clairvoyance achieves a geomean execution time improvement of 14 percent for memory-bound applications, on top of standard O3 optimizations, while maintaining compute-bound applications' high-performance.

National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-334011 (URN)10.1109/TC.2017.2769641 (DOI)000427420800005 ()
Projects
UPMARC
Funder
Swedish Research Council, 2016-05086
Available from: 2017-11-03 Created: 2017-11-20 Last updated: 2023-03-28Bibliographically approved
3. Software Out-of-Order Execution for In-Order Architectures
Open this publication in new window or tab >>Software Out-of-Order Execution for In-Order Architectures
2016 (English)In: Proc. 25th International Conference on Parallel Architectures and Compilation Techniques, New York: ACM Press, 2016, p. 458-458Conference paper, Poster (with or without abstract) (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2016
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-309768 (URN)10.1145/2967938.2971466 (DOI)000392249100055 ()978-1-4503-4121-9 (ISBN)
Conference
PACT 2016, September 11–15, Haifa, Israel
Projects
UPMARC
Available from: 2016-09-11 Created: 2016-12-07 Last updated: 2018-04-26Bibliographically approved
4. Transcending hardware limits with software out-of-order processing
Open this publication in new window or tab >>Transcending hardware limits with software out-of-order processing
Show others...
2017 (English)In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 16, no 2, p. 162-165Article in journal (Refereed) Published
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-334012 (URN)10.1109/LCA.2017.2672559 (DOI)000418870500018 ()
Projects
UPMARC
Available from: 2017-02-22 Created: 2017-11-20 Last updated: 2018-04-26Bibliographically approved

Open Access in DiVA

fulltext(3270 kB)759 downloads
File information
File name FULLTEXT01.pdfFile size 3270 kBChecksum SHA-512
9a3c66119da8240ffe7be556dedebc4e7439f5c55966bd7cd7a46902ccca73bba66bc517f649f0dc20a332c17f3ff38071d3a79de2dbf717ceb6adb3e17588be
Type fulltextMimetype application/pdf

Authority records

Tran, Kim-Anh

Search in DiVA

By author/editor
Tran, Kim-Anh
By organisation
Division of Computer SystemsComputer Architecture and Computer Communication
Computer Engineering

Search outside of DiVA

GoogleGoogle Scholar
Total: 759 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 288 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf