Logo: to the web site of Uppsala University

uu.sePublications from Uppsala University
Change search
Refine search result
1 - 10 of 10
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1.
    Carlson, Trevor E.
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Tran, Kim-Anh
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Transcending hardware limits with software out-of-order processing2017In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 16, no 2, p. 162-165Article in journal (Refereed)
  • 2.
    Jimborean, Alexandra
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling2014In: Proc. 12th International Symposium on Code Generation and Optimization, New York: ACM Press, 2014, p. 262-272Conference paper (Refereed)
    Download full text (pdf)
    fulltext
  • 3.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Efficient Execution Paradigms for Parallel Heterogeneous Architectures2016Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    This thesis proposes novel, efficient execution-paradigms for parallel heterogeneous architectures. The end of Dennard scaling is threatening the effectiveness of DVFS in future nodes; therefore, new execution paradigms are required to exploit the non-linear relationship between performance and energy efficiency of memory-bound application-regions. To attack this problem, we propose the decoupled access-execute (DAE) paradigm. DAE transforms regions of interest (at program-level) in two coarse-grain phases: the access-phase and the execute-phase, which we can independently DVFS. The access-phase is intended to prefetch the data in the cache, and is therefore expected to be predominantly memory-bound, while the execute-phase runs immediately after the access-phase (that has warmed-up the cache) and is therefore expected to be compute-bound.

    DAE, achieves good energy savings (on average 25% lower EDP) without performance degradation, as opposed to other DVFS techniques. Furthermore, DAE increases the memory level parallelism (MLP) of memory-bound regions, which results in performance improvements of memory-bound applications. To automatically transform application-regions to DAE, we propose compiler techniques to automatically generate and incorporate the access-phase(s) in the application. Our work targets affine, non-affine, and even complex, general-purpose codes. Furthermore, we explore the benefits of software multi-versioning to optimize DAE in dynamic environments, and handle codes with statically unknown access-phase overheads. In general, applications automatically-transformed to DAE by our compiler, maintain (or even exceed in some cases) the good performance and energy efficiency of manually-optimized DAE codes.

    Finally, to ease the programming environment of heterogeneous systems (with integrated GPUs), we propose a novel system-architecture that provides unified virtual memory with low overhead. The underlying insight behind our work is that existing data-parallel programming models are a good fit for relaxed memory consistency models (e.g., the heterogeneous race-free model). This allows us to simplify the coherency protocol between the CPU – GPU, as well as the GPU memory management unit. On average, we achieve 45% speedup and 45% lower EDP over the corresponding SC implementation.

    List of papers
    1. Towards more efficient execution: a decoupled access-execute approach
    Open this publication in new window or tab >>Towards more efficient execution: a decoupled access-execute approach
    2013 (English)In: Proc. 27th ACM International Conference on Supercomputing, New York: ACM Press, 2013, p. 253-262Conference paper, Published paper (Refereed)
    Abstract [en]

    The end of Dennard scaling is expected to shrink the range of DVFS in future nodes, limiting the energy savings of this technique. This paper evaluates how much we can increase the effectiveness of DVFS by using a software decoupled access-execute approach. Decoupling the data access from execution allows us to apply optimal voltage-frequency selection for each phase and therefore improve energy efficiency over standard coupled execution.

    The underlying insight of our work is that by decoupling access and execute we can take advantage of the memory-bound nature of the access phase and the compute-bound nature of the execute phase to optimize power efficiency, while maintaining good performance. To demonstrate this we built a task based parallel execution infrastructure consisting of: (1) a runtime system to orchestrate the execution, (2) power models to predict optimal voltage-frequency selection at runtime, (3) a modeling infrastructure based on hardware measurements to simulate zero-latency, per-core DVFS, and (4) a hardware measurement infrastructure to verify our model's accuracy.

    Based on real hardware measurements we project that the combination of decoupled access-execute and DVFS has the potential to improve EDP by 25% without hurting performance. On memory-bound applications we significantly improve performance due to increased MLP in the access phase and ILP in the execute phase. Furthermore we demonstrate that our method can achieve high performance both in presence or absence of a hardware prefetcher.

    Place, publisher, year, edition, pages
    New York: ACM Press, 2013
    Keywords
    Task-Based Execution, Decoupled Execution, Performance, Energy, DVFS
    National Category
    Computer Systems
    Research subject
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-203239 (URN)10.1145/2464996.2465012 (DOI)978-1-4503-2130-3 (ISBN)
    Conference
    ICS 2013, June 10-14, Eugene, OR
    Projects
    LPGPU FP7-ICT-288653UPMARC
    Funder
    EU, FP7, Seventh Framework Programme, ICT-288653Swedish Research Council
    Available from: 2013-07-06 Created: 2013-07-05 Last updated: 2016-09-02Bibliographically approved
    2. Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling
    Open this publication in new window or tab >>Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling
    Show others...
    2014 (English)In: Proc. 12th International Symposium on Code Generation and Optimization, New York: ACM Press, 2014, p. 262-272Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    New York: ACM Press, 2014
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-212778 (URN)978-1-4503-2670-4 (ISBN)
    Conference
    CGO 2014, February 15-19, Orlando, FL
    Projects
    UPMARC
    Available from: 2014-02-19 Created: 2013-12-13 Last updated: 2018-01-11Bibliographically approved
    3. Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs
    Open this publication in new window or tab >>Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs
    Show others...
    2016 (English)In: Proc. 25th International Conference on Compiler Construction, New York: ACM Press, 2016, p. 121-131Conference paper, Published paper (Refereed)
    Abstract [en]

    Computer architecture design faces an era of great challenges in an attempt to simultaneously improve performance and energy efficiency. Previous hardware techniques for energy management become severely limited, and thus, compilers play an essential role in matching the software to the more restricted hardware capabilities. One promising approach is software decoupled access-execute (DAE), in which the compiler transforms the code into coarse-grain phases that are well-matched to the Dynamic Voltage and Frequency Scaling (DVFS) capabilities of the hardware. While this method is proved efficient for statically analyzable codes, general purpose applications pose significant challenges due to pointer aliasing, complex control flow and unknown runtime events. We propose a universal compile-time method to decouple general-purpose applications, using simple but efficient heuristics. Our solutions overcome the challenges of complex code and show that automatic decoupled execution significantly reduces the energy expenditure of irregular or memory-bound applications and even yields slight performance boosts. Overall, our technique achieves over 20% on average energy-delay-product (EDP) improvements (energy over 15% and performance over 5%) across 14 bench-marks from SPEC CPU 2006 and Parboil benchmark suites, with peak EDP improvements surpassing 70%.

    Place, publisher, year, edition, pages
    New York: ACM Press, 2016
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-283200 (URN)10.1145/2892208.2892209 (DOI)000389808800012 ()9781450342414 (ISBN)
    Conference
    CC 2016, March 17–18, Barcelona, Spain
    Projects
    UPMARC
    Available from: 2016-03-17 Created: 2016-04-11 Last updated: 2021-06-24Bibliographically approved
    4. Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead
    Open this publication in new window or tab >>Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead
    2016 (English)In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 13, no 1, article id 1Article in journal (Refereed) Published
    Abstract [en]

    This work proposes a novel scheme to facilitate heterogeneous systems with unified virtual memory. Research proposals implement coherence protocols for sequential consistency (SC) between central processing unit (CPU) cores and between devices. Such mechanisms introduce severe bottlenecks in the system; therefore, we adopt the heterogeneous-race-free (HRF) memory model. The use of HRF simplifies the coherency protocol and the graphics processing unit (GPU) memory management unit (MMU). Our protocol optimizes CPU and GPU demands separately, with the GPU part being simpler while the CPU is more elaborate and latency aware. We achieve an average 45% speedup and 45% energy-delay product reduction (20% energy) over the corresponding SC implementation.

    Keywords
    Multicore; heterogeneous coherence; GPU MMU design; virtual coherence protocol; directory-less protocol
    National Category
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-295765 (URN)10.1145/2889488 (DOI)000373904600001 ()
    Projects
    UPMARC
    Funder
    EU, FP7, Seventh Framework Programme, FP7-ICT-288653EU, European Research Council, TIN2012-38341-C04-03
    Available from: 2016-04-05 Created: 2016-06-09 Last updated: 2017-11-30Bibliographically approved
    Download full text (pdf)
    fulltext
    Download (jpg)
    preview image
  • 4.
    Koukos, Konstantinos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Towards more efficient execution: a decoupled access-execute approach2013In: Proc. 27th ACM International Conference on Supercomputing, New York: ACM Press, 2013, p. 253-262Conference paper (Refereed)
    Abstract [en]

    The end of Dennard scaling is expected to shrink the range of DVFS in future nodes, limiting the energy savings of this technique. This paper evaluates how much we can increase the effectiveness of DVFS by using a software decoupled access-execute approach. Decoupling the data access from execution allows us to apply optimal voltage-frequency selection for each phase and therefore improve energy efficiency over standard coupled execution.

    The underlying insight of our work is that by decoupling access and execute we can take advantage of the memory-bound nature of the access phase and the compute-bound nature of the execute phase to optimize power efficiency, while maintaining good performance. To demonstrate this we built a task based parallel execution infrastructure consisting of: (1) a runtime system to orchestrate the execution, (2) power models to predict optimal voltage-frequency selection at runtime, (3) a modeling infrastructure based on hardware measurements to simulate zero-latency, per-core DVFS, and (4) a hardware measurement infrastructure to verify our model's accuracy.

    Based on real hardware measurements we project that the combination of decoupled access-execute and DVFS has the potential to improve EDP by 25% without hurting performance. On memory-bound applications we significantly improve performance due to increased MLP in the access phase and ILP in the execute phase. Furthermore we demonstrate that our method can achieve high performance both in presence or absence of a hardware prefetcher.

    Download full text (pdf)
    fulltext
  • 5.
    Koukos, Konstantinos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models2013In: PARMA 2013, 4th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2013Conference paper (Refereed)
    Abstract [en]

    This work demonstrates the potential of hardware and software optimization to improve theeffectiveness of dynamic voltage and frequency scaling (DVFS). For software, we decouple data prefetch (access) and computation (execute) to enable optimal DVFS selectionfor each phase. For hardware, we use measurements from state-of-the-art multicore processors to accurately model the potential of per-core, zero-latency DVFS. We demonstrate that the combinationof decoupled access-execute and precise DVFS has the potential to decrease EDP by 25-30% without reducing performance.

    The underlying insight in this work is that by decoupling access and execute we can take advantageof the memory-bound nature of the access phase and the compute-bound nature of the execute phase to optimize power efficiency. For the memory-bound access phase, where we prefetch data into the cachefrom main memory, we can run at a reduced frequency and voltage without hurting performance. Thereafter, the execute phase can run much faster, thanks to the prefetching of the access phase, and achieve higher performance. This decoupled program behavior allows us to achieve more effective use of DVFS than standard coupled executions which mix data access and compute.

    To understand the potential of this approach, we measure application performance and power consumption on a modern multicore system across a range of frequencies and voltages. From this data we build a model that allows us to analyze the effects of per-core, zero-latency DVFS. The results of this work demonstrate the significant potential for finer-grain DVFS in combination with DVFS-optimized software.

    Download full text (pdf)
    fulltext
  • 6.
    Koukos, Konstantinos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Ekemark, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Zacharopoulos, Georgios
    Switzerland Univ Svizzera Italiana, Lugano, Switzerland.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs2016In: Proc. 25th International Conference on Compiler Construction, New York: ACM Press, 2016, p. 121-131Conference paper (Refereed)
    Abstract [en]

    Computer architecture design faces an era of great challenges in an attempt to simultaneously improve performance and energy efficiency. Previous hardware techniques for energy management become severely limited, and thus, compilers play an essential role in matching the software to the more restricted hardware capabilities. One promising approach is software decoupled access-execute (DAE), in which the compiler transforms the code into coarse-grain phases that are well-matched to the Dynamic Voltage and Frequency Scaling (DVFS) capabilities of the hardware. While this method is proved efficient for statically analyzable codes, general purpose applications pose significant challenges due to pointer aliasing, complex control flow and unknown runtime events. We propose a universal compile-time method to decouple general-purpose applications, using simple but efficient heuristics. Our solutions overcome the challenges of complex code and show that automatic decoupled execution significantly reduces the energy expenditure of irregular or memory-bound applications and even yields slight performance boosts. Overall, our technique achieves over 20% on average energy-delay-product (EDP) improvements (energy over 15% and performance over 5%) across 14 bench-marks from SPEC CPU 2006 and Parboil benchmark suites, with peak EDP improvements surpassing 70%.

    Download full text (pdf)
    fulltext
  • 7.
    Koukos, Konstantinos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Ros, Alberto
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead2016In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 13, no 1, article id 1Article in journal (Refereed)
    Abstract [en]

    This work proposes a novel scheme to facilitate heterogeneous systems with unified virtual memory. Research proposals implement coherence protocols for sequential consistency (SC) between central processing unit (CPU) cores and between devices. Such mechanisms introduce severe bottlenecks in the system; therefore, we adopt the heterogeneous-race-free (HRF) memory model. The use of HRF simplifies the coherency protocol and the graphics processing unit (GPU) memory management unit (MMU). Our protocol optimizes CPU and GPU demands separately, with the GPU part being simpler while the CPU is more elaborate and latency aware. We achieve an average 45% speedup and 45% energy-delay product reduction (20% energy) over the corresponding SC implementation.

    Download full text (pdf)
    fulltext
  • 8.
    Tran, Kim-Anh
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Carlson, Trevor E.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Clairvoyance: Look-ahead compile-time scheduling2017In: Proc. 15th International Symposium on Code Generation and Optimization, Piscataway, NJ: IEEE Press, 2017, p. 171-184Conference paper (Refereed)
    Download full text (pdf)
    fulltext
  • 9.
    Tran, Kim-Anh
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Carlson, Trevor E.
    National University of Singapore, Singapore.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    NTNU, Norway.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores2018In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, Association for Computing Machinery (ACM), 2018, p. 328-343Conference paper (Refereed)
    Abstract [en]

    Increasing demands for energy efficiency constrain emerging hardware. These new hardware trends challenge the established assumptions in code generation and force us to rethink existing software optimization techniques. We propose a cross-layer redesign of the way compilers and the underlying microarchitecture are built and interact, to achieve both performance and high energy efficiency.

    In this paper, we address one of the main performance bottlenecks — last-level cache misses — through a software-hardware co-design. Our approach is able to hide memory latency and attain increased memory and instruction level parallelism by orchestrating a non-speculative, execute-ahead paradigm in software (SWOOP). While out-of-order (OoO) architectures attempt to hide memory latency by dynamically reordering instructions, they do so through expensive, power-hungry, speculative mechanisms.We aim to shift this complexity into software, and we build upon compilation techniques inherited from VLIW, software pipelining, modulo scheduling, decoupled access-execution, and software prefetching. In contrast to previous approaches we do not rely on either software or hardware speculation that can be detrimental to efficiency. Our SWOOP compiler is enhanced with lightweight architectural support, thus being able to transform applications that include highly complex control-flow and indirect memory accesses.

    Download full text (pdf)
    fulltext
  • 10. Waern, Jonatan
    et al.
    Ekemark, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Profiling-Assisted Decoupled Access-Execute2016In: Proc. 4th International Workshop on High Performance Energy Efficient Embedded Systems, 2016Conference paper (Refereed)
1 - 10 of 10
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf