Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology. ORCID iD: 0000-0002-8722-751X
Switzerland Univ Svizzera Italiana, Lugano, Switzerland.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)
2016 (English). In: Proc. 25th International Conference on Compiler Construction, New York: ACM Press, 2016, p. 121-131. Conference paper, Published paper (Refereed)
Resource type
Text
Abstract [en]

Computer architecture design faces an era of great challenges in the attempt to simultaneously improve performance and energy efficiency. Previous hardware techniques for energy management are becoming severely limited, and thus compilers play an essential role in matching the software to the more restricted hardware capabilities. One promising approach is software decoupled access-execute (DAE), in which the compiler transforms the code into coarse-grain phases that are well matched to the Dynamic Voltage and Frequency Scaling (DVFS) capabilities of the hardware. While this method has proved efficient for statically analyzable codes, general-purpose applications pose significant challenges due to pointer aliasing, complex control flow and unknown runtime events. We propose a universal compile-time method to decouple general-purpose applications, using simple but efficient heuristics. Our solutions overcome the challenges of complex code and show that automatic decoupled execution significantly reduces the energy expenditure of irregular or memory-bound applications and even yields slight performance boosts. Overall, our technique achieves average energy-delay-product (EDP) improvements of over 20% (energy over 15% and performance over 5%) across 14 benchmarks from the SPEC CPU 2006 and Parboil benchmark suites, with peak EDP improvements surpassing 70%.

Place, publisher, year, edition, pages
New York: ACM Press, 2016. p. 121-131
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-283200
DOI: 10.1145/2892208.2892209
ISI: 000389808800012
ISBN: 9781450342414 (print)
OAI: oai:DiVA.org:uu-283200
DiVA, id: diva2:918766
Conference
CC 2016, March 17–18, Barcelona, Spain
Projects
UPMARC
Available from: 2016-03-17 Created: 2016-04-11 Last updated: 2021-06-24 Bibliographically approved
In thesis
1. Efficient Execution Paradigms for Parallel Heterogeneous Architectures
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis proposes novel, efficient execution paradigms for parallel heterogeneous architectures. The end of Dennard scaling is threatening the effectiveness of DVFS in future nodes; therefore, new execution paradigms are required to exploit the non-linear relationship between performance and energy efficiency of memory-bound application regions. To attack this problem, we propose the decoupled access-execute (DAE) paradigm. DAE transforms regions of interest (at program level) into two coarse-grain phases, the access phase and the execute phase, to which DVFS can be applied independently. The access phase is intended to prefetch the data into the cache and is therefore expected to be predominantly memory-bound, while the execute phase runs immediately after the access phase (which has warmed up the cache) and is therefore expected to be compute-bound.
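
The access/execute split described above can be illustrated with a minimal sketch in C. This is not the authors' compiler output; it is a hand-written illustration under stated assumptions: the chunk size `CHUNK`, the function names, and the `set_frequency_hint` hook are all hypothetical, and the prefetch uses the GCC/Clang builtin `__builtin_prefetch`. The access phase only touches the data the execute phase will need, and the execute phase then runs the original computation on a (hopefully) warm cache.

```c
#include <stddef.h>

/* Hypothetical chunk size: number of elements covered by one
   access/execute pair. A real compiler would size this to the cache. */
#define CHUNK 256

/* Hypothetical DVFS hook; a no-op placeholder in this sketch. */
static void set_frequency_hint(int high) { (void)high; }

/* Access phase: prefetch the data, perform no computation.
   Expected to be memory-bound, so it could run at a low frequency. */
static void access_phase(const double *values, const size_t *index,
                         size_t begin, size_t end)
{
    for (size_t i = begin; i < end; ++i) {
        /* GCC/Clang builtin: read access (0), high temporal locality (3). */
        __builtin_prefetch(&values[index[i]], 0, 3);
    }
}

/* Execute phase: the original computation, now expected to hit in cache. */
static double execute_phase(const double *values, const size_t *index,
                            size_t begin, size_t end)
{
    double sum = 0.0;
    for (size_t i = begin; i < end; ++i) {
        sum += values[index[i]] * values[index[i]];
    }
    return sum;
}

/* Sum of squares over an indirectly indexed array, processed in chunks
   as alternating access and execute phases. */
double dae_sum_of_squares(const double *values, const size_t *index, size_t n)
{
    double total = 0.0;
    for (size_t begin = 0; begin < n; begin += CHUNK) {
        size_t end = (n - begin < CHUNK) ? n : begin + CHUNK;
        set_frequency_hint(0);                 /* low frequency: access  */
        access_phase(values, index, begin, end);
        set_frequency_hint(1);                 /* high frequency: execute */
        total += execute_phase(values, index, begin, end);
    }
    return total;
}
```

Calling `dae_sum_of_squares(values, index, n)` returns the same result as the undecoupled loop; the prefetches affect only timing, not semantics. Whether the split saves energy depends on the hardware's DVFS behaviour, which this sketch does not model.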

DAE achieves good energy savings (on average 25% lower EDP) without performance degradation, as opposed to other DVFS techniques. Furthermore, DAE increases the memory-level parallelism (MLP) of memory-bound regions, which results in performance improvements for memory-bound applications. To transform application regions to DAE, we propose compiler techniques that automatically generate and incorporate the access phase(s) in the application. Our work targets affine, non-affine, and even complex, general-purpose codes. Furthermore, we explore the benefits of software multi-versioning to optimize DAE in dynamic environments and to handle codes with statically unknown access-phase overheads. In general, applications automatically transformed to DAE by our compiler maintain (or even exceed in some cases) the good performance and energy efficiency of manually optimized DAE codes.

Finally, to ease the programming environment of heterogeneous systems (with integrated GPUs), we propose a novel system architecture that provides unified virtual memory with low overhead. The underlying insight behind our work is that existing data-parallel programming models are a good fit for relaxed memory consistency models (e.g., the heterogeneous race-free model). This allows us to simplify the coherency protocol between the CPU and the GPU, as well as the GPU memory management unit. On average, we achieve 45% speedup and 45% lower EDP over the corresponding SC implementation.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. p. 54
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1405
Keywords
Decoupled Execution, Performance, Energy, DVFS, Compiler Optimizations, Heterogeneous Coherence
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-300831 (URN)
978-91-554-9654-8 (ISBN)
Public defence
2016-09-30, ITC/1111, Lägerhyddsvägen 2, Uppsala, 13:00 (English)
Opponent
Supervisors
Projects
UPMARC
Funder
EU, FP7, Seventh Framework Programme, FP7-ICT-288653
Swedish Research Council
Available from: 2016-09-07 Created: 2016-08-15 Last updated: 2019-02-25

Open Access in DiVA

fulltext (1571 kB), 1068 downloads
File information
File name: FULLTEXT02.pdf
File size: 1571 kB
Checksum (SHA-512): 7b8bbe9fde66b6caaf51b9e36720ed42661d57fcff53c2ed86502b648cb9ac427974df92ca1f0720c1ced90a21d1a942c46c0ea44946fa368d86130a8549ebf2
Type: fulltext
Mimetype: application/pdf

Authority records

Koukos, Konstantinos; Ekemark, Per; Spiliopoulos, Vasileios; Kaxiras, Stefanos; Jimborean, Alexandra
