Logo: to the web site of Uppsala University

uu.sePublications from Uppsala University
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)ORCID iD: 0000-0002-9460-1290
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)
2016 (English)In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 13, no 1, article id 1Article in journal (Refereed) Published
Abstract [en]

This work proposes a novel scheme to facilitate heterogeneous systems with unified virtual memory. Research proposals implement coherence protocols for sequential consistency (SC) between central processing unit (CPU) cores and between devices. Such mechanisms introduce severe bottlenecks in the system; therefore, we adopt the heterogeneous-race-free (HRF) memory model. The use of HRF simplifies the coherency protocol and the graphics processing unit (GPU) memory management unit (MMU). Our protocol optimizes CPU and GPU demands separately, with the GPU part being simpler while the CPU is more elaborate and latency aware. We achieve an average 45% speedup and 45% energy-delay product reduction (20% energy) over the corresponding SC implementation.

Place, publisher, year, edition, pages
2016. Vol. 13, no 1, article id 1
Keywords [en]
Multicore; heterogeneous coherence; GPU MMU design; virtual coherence protocol; directory-less protocol
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:uu:diva-295765DOI: 10.1145/2889488ISI: 000373904600001OAI: oai:DiVA.org:uu-295765DiVA, id: diva2:934693
Projects
UPMARC
Funder
EU, FP7, Seventh Framework Programme, FP7-ICT-288653EU, European Research Council, TIN2012-38341-C04-03Available from: 2016-04-05 Created: 2016-06-09 Last updated: 2017-11-30Bibliographically approved
In thesis
1. Efficient Execution Paradigms for Parallel Heterogeneous Architectures
Open this publication in new window or tab >>Efficient Execution Paradigms for Parallel Heterogeneous Architectures
2016 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis proposes novel, efficient execution-paradigms for parallel heterogeneous architectures. The end of Dennard scaling is threatening the effectiveness of DVFS in future nodes; therefore, new execution paradigms are required to exploit the non-linear relationship between performance and energy efficiency of memory-bound application-regions. To attack this problem, we propose the decoupled access-execute (DAE) paradigm. DAE transforms regions of interest (at program-level) in two coarse-grain phases: the access-phase and the execute-phase, which we can independently DVFS. The access-phase is intended to prefetch the data in the cache, and is therefore expected to be predominantly memory-bound, while the execute-phase runs immediately after the access-phase (that has warmed-up the cache) and is therefore expected to be compute-bound.

DAE, achieves good energy savings (on average 25% lower EDP) without performance degradation, as opposed to other DVFS techniques. Furthermore, DAE increases the memory level parallelism (MLP) of memory-bound regions, which results in performance improvements of memory-bound applications. To automatically transform application-regions to DAE, we propose compiler techniques to automatically generate and incorporate the access-phase(s) in the application. Our work targets affine, non-affine, and even complex, general-purpose codes. Furthermore, we explore the benefits of software multi-versioning to optimize DAE in dynamic environments, and handle codes with statically unknown access-phase overheads. In general, applications automatically-transformed to DAE by our compiler, maintain (or even exceed in some cases) the good performance and energy efficiency of manually-optimized DAE codes.

Finally, to ease the programming environment of heterogeneous systems (with integrated GPUs), we propose a novel system-architecture that provides unified virtual memory with low overhead. The underlying insight behind our work is that existing data-parallel programming models are a good fit for relaxed memory consistency models (e.g., the heterogeneous race-free model). This allows us to simplify the coherency protocol between the CPU – GPU, as well as the GPU memory management unit. On average, we achieve 45% speedup and 45% lower EDP over the corresponding SC implementation.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. p. 54
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1405
Keywords
Decoupled Execution, Performance, Energy, DVFS, Compiler Optimizations, Heterogeneous Coherence
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-300831 (URN)978-91-554-9654-8 (ISBN)
Public defence
2016-09-30, ITC/1111, Lägerhyddsvägen 2, Uppsala, 13:00 (English)
Opponent
Supervisors
Projects
UPMARC
Funder
EU, FP7, Seventh Framework Programme, FP7-ICT-288653Swedish Research Council
Available from: 2016-09-07 Created: 2016-08-15 Last updated: 2019-02-25

Open Access in DiVA

fulltext(594 kB)838 downloads
File information
File name FULLTEXT01.pdfFile size 594 kBChecksum SHA-512
03dba4cc79d0348aaa7d338e90f0b8a3aa734e0969bbb97ad82e7bc07097e07610614f2f9d38a1a73790a602b9b482806789a857aa72a1a7d3406aa2834d559a
Type fulltextMimetype application/pdf

Other links

Publisher's full text

Authority records

Koukos, KonstantinosHagersten, ErikKaxiras, Stefanos

Search in DiVA

By author/editor
Koukos, KonstantinosHagersten, ErikKaxiras, Stefanos
By organisation
Computer Architecture and Computer Communication
In the same journal
ACM Transactions on Architecture and Code Optimization (TACO)
Computer Systems

Search outside of DiVA

GoogleGoogle Scholar
Total: 838 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 645 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf