uu.seUppsala University Publications
Change search
Refine search result
1 - 21 of 21
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1.
    Carlson, Trevor E.
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Tran, Kim-Anh
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Transcending hardware limits with software out-of-order processing2017In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 16, no 2, p. 162-165Article in journal (Refereed)
  • 2. Cebrián, Juan M.
    et al.
    Fernández-Pascual, Ricardo
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Acacio, Manuel E.
    Ros, Alberto
    A dedicated private-shared cache design for scalable multiprocessors2017In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 29, no 2, article id e3871Article in journal (Refereed)
  • 3. Clauss, Philippe
    et al.
    Fassi, Imen
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Software-controlled processor stalls for time and energy efficient data locality optimization2014In: Proc. International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), Piscataway, NJ: IEEE , 2014, p. 199-206Conference paper (Refereed)
  • 4.
    Jimborean, Alexandra
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Clauss, Philippe
    Dollinger, Jean-François
    Loechner, Vincent
    Martinez Caamaño, Juan Manuel
    Dynamic and speculative polyhedral parallelization of loop nests using binary code patterns2013In: ICCS 2013, 2013, p. 2575-2578Conference paper (Refereed)
    Abstract [en]

    Speculative parallelization is a classic strategy for automatically parallelizing codes that cannot be handled at compile-time due to the use ofdynamic data and control structures. Another motivation of being speculative is to adapt the code to the current execution context, by selecting at run-time an efficient parallel schedule. However, since this parallelization scheme requires on-the-fly semantics verification, it is in general difficult to perform advanced transformations for optimization and parallelism extraction. We propose a framework dedicated tospeculative parallelization of scientific nested loop kernels, able to transform thecode at runtime by re-scheduling the iterations to exhibit parallelism and data locality. The run-time process includes a transformation selection guided by profiling phases on short samples, using an instrumented version of the code. During this phase, the accessed memory addresses are interpolated to build a predictor of the forthcoming accesses. The collected addresses are also used to compute on-the-fly dependence distance vectors by tracking accesses to common addresses. Interpolating functions and distance vectors are then employed in dynamicdependence analysis and in selecting a parallelizing transformation that, if the prediction is correct, does not induce any rollback during execution. In order to ensure that the rollback time overhead stays low, the code is executed in successive slices of the outermost original loop of the nest. Each slice can be either a parallelized version, a sequential original version, or an instrumented version. Moreover, such slicing of the execution provides the opportunity of transforming differently the code to adapt to the observed execution phases. Parallel codegeneration is achieved almost at no cost by using binary code patterns that are generated at compile-time and that are simply patched at run-time to result in the transformed code. The framework has been implemented with extensions of the LLVM compiler and an x86-64 runtime system. Significant speed-ups are shown on a set of benchmarks that could not have been handled efficiently by a compiler.

  • 5.
    Jimborean, Alexandra
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Clauss, Philippe
    Dollinger, Jean-François
    Loechner, Vincent
    Martinez Caamaño, Juan Manuel
    Dynamic and speculative polyhedral parallelization using compiler-generated skeletons2014In: International journal of parallel programming, ISSN 0885-7458, E-ISSN 1573-7640, Vol. 42, no 4, p. 529-545Article in journal (Refereed)
  • 6.
    Jimborean, Alexandra
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Clauss, Philippe
    Martinez Caamaño, Juan Manuel
    Sukumaran-Rajam, Aravind
    Online dynamic dependence analysis for speculative polyhedral parallelization2013In: Euro-Par 2013 Parallel Processing, Springer Berlin/Heidelberg, 2013, p. 191-202Conference paper (Refereed)
    Abstract [en]

    We present a dynamic dependence analyzer whose goal is to compute dependencies from instrumented execution samples of loop nests. The resulting information serves as a prediction of the execution behavior during the remaining iterations and can be used to select and apply a speculatively optimizing and parallelizing polyhedral transformation of the target sequential loop nest. Thus, a parallel lock-free version can be generated which should not induce any rollback if the prediction is correct. The dependence analyzer computes distance vectors and linear functions interpolating the memory addresses accessed by each memory instruction, and the values of some scalars. Phases showing a changing memory behavior are detected thanks to a dynamic adjustment of the instrumentation frequency.

    The dependence analyzer takes part of a whole framework dedicated to speculative parallelization of loop nests which has been implemented with extensions of the LLVM compiler and an x86-64 runtime system.

  • 7.
    Jimborean, Alexandra
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Ekemark, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Waern, Jonatan
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Ros, Alberto
    Univ Murcia, Comp Engn Dept, E-30100 Murcia, Spain..
    Automatic Detection of Large Extended Data-Race-Free Regions with Conflict Isolation2018In: IEEE Transactions on Parallel and Distributed Systems, ISSN 1045-9219, E-ISSN 1558-2183, Vol. 29, no 3, p. 527-541Article in journal (Refereed)
    Abstract [en]

    Data-race-free (DRF) parallel programming becomes a standard as newly adopted memory models of mainstream programming languages such as C++ or Java impose data-race-freedom as a requirement. We propose compiler techniques that automatically delineate extended data-race-free (xDRF) regions, namely regions of code that provide the same guarantees as the synchronization-free regions (in the context of DRF codes). xDRF regions stretch across synchronization boundaries, function calls and loop back-edges and preserve the data-race-free semantics, thus increasing the optimization opportunities exposed to the compiler and to the underlying architecture. We further enlarge xDRF regions with a conflict isolation (CI) technique, delineating what we call xDRF-CI regions while preserving the same properties as xDRF regions. Our compiler (1) precisely analyzes the threads' memory accessing behavior and data sharing in shared-memory, general-purpose parallel applications, (2) isolates data-sharing and (3) marks the limits of xDRF-CI code regions. The contribution of this work consists in a simple but effective method to alleviate the drawbacks of the compiler's conservative nature in order to be competitive with (and even surpass) an expert in delineating xDRF regions manually. We evaluate the potential of our technique by employing xDRF and xDRF-CI region classification in a state-of-the-art, dual-mode cache coherence protocol. We show that xDRF regions reduce the coherence bookkeeping and enable optimizations for performance (6.4 percent) and energy efficiency (12.2 percent) compared to a standard directory-based coherence protocol. Enhancing the xDRF analysis with the conflict isolation technique improves performance by 7.1 percent and energy efficiency by 15.9 percent.

  • 8.
    Jimborean, Alexandra
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling2014In: Proc. 12th International Symposium on Code Generation and Optimization, New York: ACM Press, 2014, p. 262-272Conference paper (Refereed)
  • 9.
    Jimborean, Alexandra
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Waern, Jonatan
    Ekemark, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Ros, Alberto
    Automatic detection of extended data-race-free regions2017In: Proc. 15th International Symposium on Code Generation and Optimization, Piscataway, NJ: IEEE Press, 2017, p. 14-26Conference paper (Refereed)
  • 10.
    Koukos, Konstantinos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Ekemark, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Zacharopoulos, Georgios
    Switzerland Univ Svizzera Italiana, Lugano, Switzerland.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs2016In: Proc. 25th International Conference on Compiler Construction, New York: ACM Press, 2016, p. 121-131Conference paper (Refereed)
    Abstract [en]

    Computer architecture design faces an era of great challenges in an attempt to simultaneously improve performance and energy efficiency. Previous hardware techniques for energy management become severely limited, and thus, compilers play an essential role in matching the software to the more restricted hardware capabilities. One promising approach is software decoupled access-execute (DAE), in which the compiler transforms the code into coarse-grain phases that are well-matched to the Dynamic Voltage and Frequency Scaling (DVFS) capabilities of the hardware. While this method is proved efficient for statically analyzable codes, general purpose applications pose significant challenges due to pointer aliasing, complex control flow and unknown runtime events. We propose a universal compile-time method to decouple general-purpose applications, using simple but efficient heuristics. Our solutions overcome the challenges of complex code and show that automatic decoupled execution significantly reduces the energy expenditure of irregular or memory-bound applications and even yields slight performance boosts. Overall, our technique achieves over 20% on average energy-delay-product (EDP) improvements (energy over 15% and performance over 5%) across 14 bench-marks from SPEC CPU 2006 and Parboil benchmark suites, with peak EDP improvements surpassing 70%.

  • 11.
    Ros, Alberto
    et al.
    Univ Murcia, E-30001 Murcia, Spain.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    A dual-consistency cache coherence protocol2015In: Proc. 29th International Parallel and Distributed Processing Symposium, Los Alamitos, CA: IEEE Computer Society, 2015, p. 1119-1128Conference paper (Refereed)
    Abstract [en]

    Weak memory consistency models can maximize system performance by enabling hardware and compiler optimizations, but increase programming complexity since they do not match programmers’ intuition. The design of an efficient system with an intuitive memory model is an open challenge.

    This paper proposes SPEL, a dual-consistency cache coherence protocol which simultaneously guarantees the strongest memory consistency model provided by the hardware and yields improvements in both performance and energy consumption. The design of the protocol exploits a compile-time identification of code regions which can be executed under a less restrictive, thus optimized protocol, without harming correctness. Outside these regions, code is executed under a more restrictive protocol which enforces sequential consistency. Compared to a standard directory protocol, we show improvements in performance of 24% and reductions in energy consumption of 32%, on average, for a 64-core chip multiprocessor.

  • 12.
    Ros, Alberto
    et al.
    Univ Murcia, Dept Comp Engn, E-30100 Murcia, Spain.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    A hybrid static–dynamic classification for dual-consistency cache coherence2016In: IEEE Transactions on Parallel and Distributed Systems, ISSN 1045-9219, E-ISSN 1558-2183, Vol. 27, no 11, p. 3101-3115Article in journal (Refereed)
    Abstract [en]

    Traditional cache coherence protocols manage all memory accesses equally and ensure the strongest memory model, namely, sequential consistency. Recent cache coherence protocols based on self-invalidation advocate for the model sequential consistency for data-race-free, which enables powerful optimizations for race-free code. However, for racy code these cache coherence protocols provide sub-optimal performance compared to traditional protocols. This paper proposes SPEL++, a dual-consistency cache coherence protocol that supports two execution modes: a traditional sequential-consistent protocol and a protocol that provides weak consistency (or sequential consistency for data-race-free). SPEL++ exploits a static-dynamic hybrid classification of memory accesses based on (i) a compile-time identification of extended data-race-free code regions for OpenMP applications and (ii) a runtime classification of accesses based on the operating system's memory page management. By executing racy code under the sequential-consistent protocol and race-free code under the cache coherence protocol that provides sequential consistency for data-race-free, the end result is an efficient execution of the applications while still providing sequential consistency. Compared to a traditional protocol, we show improvements in performance from 19 to 38 percent and reductions in energy consumption from 47 to 53 percent, on average for different benchmark suites, on a 64-core chip multiprocessor.

  • 13.
    Sakalis, Christos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Alipour, Mehdi
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Ros, Alberto
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    Norwegian University of Science and Technology, Trondheim, Norway.
    Ghost Loads: What is the cost of invisible speculation?2019In: Proceedings of the 16th ACM International Conference on Computing Frontiers, New York: ACM Press, 2019, p. 153-163Conference paper (Refereed)
    Abstract [en]

    Speculative execution is necessary for achieving high performance on modern general-purpose CPUs but, starting with Spectre and Meltdown, it has also been proven to cause severe security flaws. In case of a misspeculation, the architectural state is restored to assure functional correctness but a multitude of microarchitectural changes (e.g., cache updates), caused by the speculatively executed instructions, are commonly left in the system.  These changes can be used to leak sensitive information, which has led to a frantic search for solutions that can eliminate such security flaws. The contribution of this work is an evaluation of the cost of hiding speculative side-effects in the cache hierarchy, making them visible only after the speculation has been resolved. For this, we compare (for the first time) two broad approaches: i) waiting for loads to become non-speculative before issuing them to the memory system, and ii) eliminating the side-effects of speculation, a solution consisting of invisible loads (Ghost loads) and performance optimizations (Ghost Buffer and Materialization). While previous work, InvisiSpec, has proposed a similar solution to our latter approach, it has done so with only a minimal evaluation and at a significant performance cost. The detailed evaluation of our solutions shows that: i) waiting for loads to become non-speculative is no more costly than the previously proposed InvisiSpec solution, albeit much simpler, non-invasive in the memory system, and stronger security-wise; ii) hiding speculation with Ghost loads (in the context of a relaxed memory model) can be achieved at the cost of 12% performance degradation and 9% energy increase, which is significantly better that the previous state-of-the-art solution.

  • 14.
    Sakalis, Christos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Ros, Alberto
    University of Murcia, Murcia, Spain.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    Norwegian University of Science and Technology, Trondheim, Norway.
    Efficient invisible speculative execution through selective delay and value prediction2019In: Proc. 46th International Symposium on Computer Architecture, New York: ACM Press, 2019, p. 723-735Conference paper (Refereed)
    Abstract [en]

    Speculative execution, the base on which modern high-performance general-purpose CPUs are built on, has recently been shown to enable a slew of security attacks.  All these attacks are centered around a common set of behaviors: During speculative execution, the architectural state of the system is kept unmodified, until the speculation can be verified.  In the event that a misspeculation occurs, then anything that can affect the architectural state is reverted (squashed) and re-executed correctly.  However, the same is not true for the microarchitectural state.  Normally invisible to the user, changes to the microarchitectural state can be observed through various side-channels, with timing differences caused by the memory hierarchy being one of the most common and easy to exploit.  The speculative side-channels can then be exploited to perform attacks that can bypass software and hardware checks in order to leak information.  These attacks, out of which the most infamous are perhaps Spectre and Meltdown, have led to a frantic search for solutions.In this work, we present our own solution for reducing the microarchitectural state-changes caused by speculative execution in the memory hierarchy.  It is based on the observation that if we only allow accesses that hit in the L1 data cache to proceed, then we can easily hide any microarchitectural changes until after the speculation has been verified.  At the same time, we propose to prevent stalls by value predicting the loads that miss in the L1.  Value prediction, though speculative, constitutes an invisible form of speculation, not seen outside the core.  We evaluate our solution and show that we can prevent observable microarchitectural changes in the memory hierarchy while keeping the performance and energy costs at 11% and 7%, respectively.  In comparison, the current state of the art solution, InvisiSpec, incurs a 46% performance loss and a 51% energy increase.

  • 15. Sukumaran-Rajam, Aravind
    et al.
    Martinez Caamaño, Juan Manuel
    Wolff, Willy
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Clauss, Philippe
    Speculative program parallelization with scalable and decentralized runtime verification2014In: Runtime Verification, Springer Berlin/Heidelberg, 2014, p. 124-139Conference paper (Refereed)
    Abstract [en]

    Thread Level Speculation (TLS) is a dynamic code parallelization technique proposed to keep the software in pace with the advances in hardware,in particular, to automatically parallelize programs to take advantage of the multi-core processors. Being speculative, frameworks of this type unavoidably rely on verification systems that are similar to software transactional memory, and that require voluminous inter-thread communications or centralized registering of the performed memory accesses. The high degree of communication is against the basic principles of high performance parallel computing, does not scale with an increasing number of processor cores, and yields weak performance. Moreover, TLS systems often apply one unique parallelization strategy consisting in slicing a loop into several parallel speculative threads. Such a strategy is also against the basic principles since loops in the original serial code are not necessarily parallel and also, it is well-known that the parallel schedule must promote data locality which is crucial in obtaining good performance. This situation appeals to scalable and decentralized verification systems and new strategies to dynamically generate efficient parallel code resulting from advanced optimizing parallelizing transformations. Such transformations require a more complex verification system that allows intra-thread iterations to be reordered. In this paper, we propose a verification system of this kind, based on a model built at runtime and predicting a linear memory behavior. This strategy is part of the Apollo speculative code parallelizer which is based on an adaptation for dynamic usage of the polyhedral model.

  • 16.
    Tran, Kim-Anh
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Carlson, Trevor E.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Clairvoyance: Look-ahead compile-time scheduling2017In: Proc. 15th International Symposium on Code Generation and Optimization, Piscataway, NJ: IEEE Press, 2017, p. 171-184Conference paper (Refereed)
  • 17.
    Tran, Kim-Anh
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Carlson, Trevor E.
    Koukos, Konstantinos
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Norwegian Univ Sci & Technol, S-75236 Uppsala, Sweden.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Static instruction scheduling for high performance on limited hardware2018In: IEEE Transactions on Computers, ISSN 0018-9340, Vol. 67, no 4, p. 513-527Article in journal (Refereed)
    Abstract [en]

    Complex out-of-order (OoO) processors have been designed to overcome the restrictions of outstanding long-latency misses at the cost of increased energy consumption. Simple, limited OoO processors are a compromise in terms of energy consumption and performance, as they have fewer hardware resources to tolerate the penalties of long-latency loads. In worst case, these loads may stall the processor entirely. We present Clairvoyance, a compiler based technique that generates code able to hide memory latency and better utilize simple OoO processors. By clustering loads found across basic block boundaries, Clairvoyance overlaps the outstanding latencies to increases memory-level parallelism. We show that these simple OoO processors, equipped with the appropriate compiler support, can effectively hide long-latency loads and achieve performance improvements for memory-bound applications. To this end, Clairvoyance tackles (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure. Clairvoyance achieves a geomean execution time improvement of 14 percent for memory-bound applications, on top of standard O3 optimizations, while maintaining compute-bound applications' high-performance.

  • 18.
    Tran, Kim-Anh
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Carlson, Trevor E.
    National University of Singapore, Singapore.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    NTNU, Norway.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores2018In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, Association for Computing Machinery (ACM), 2018, p. 328-343Conference paper (Refereed)
    Abstract [en]

    Increasing demands for energy efficiency constrain emerging hardware. These new hardware trends challenge the established assumptions in code generation and force us to rethink existing software optimization techniques. We propose a cross-layer redesign of the way compilers and the underlying microarchitecture are built and interact, to achieve both performance and high energy efficiency.

    In this paper, we address one of the main performance bottlenecks — last-level cache misses — through a software-hardware co-design. Our approach is able to hide memory latency and attain increased memory and instruction level parallelism by orchestrating a non-speculative, execute-ahead paradigm in software (SWOOP). While out-of-order (OoO) architectures attempt to hide memory latency by dynamically reordering instructions, they do so through expensive, power-hungry, speculative mechanisms.We aim to shift this complexity into software, and we build upon compilation techniques inherited from VLIW, software pipelining, modulo scheduling, decoupled access-execution, and software prefetching. In contrast to previous approaches we do not rely on either software or hardware speculation that can be detrimental to efficiency. Our SWOOP compiler is enhanced with lightweight architectural support, thus being able to transform applications that include highly complex control-flow and indirect memory accesses.

  • 19.
    Voigt, Thiemo
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Hermans, Frederik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Gunningberg, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Approximation: A New Paradigm also for Wireless Sensing2016Conference paper (Refereed)
  • 20. Waern, Jonatan
    et al.
    Ekemark, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Profiling-Assisted Decoupled Access-Execute2016In: Proc. 4th International Workshop on High Performance Energy Efficient Embedded Systems, 2016Conference paper (Refereed)
  • 21. Weber, Anton
    et al.
    Tran, Kim-Anh
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Decoupled Access-Execute on ARM big.LITTLE2017In: Proc. 5th Workshop on High Performance Energy Efficient Embedded Systems, 2017Conference paper (Refereed)
1 - 21 of 21
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf