uu.se: Uppsala University publications
Jimborean, Alexandra
Publications (10 of 21)
Sakalis, C., Kaxiras, S., Ros, A., Jimborean, A. & Själander, M. (2019). Efficient invisible speculative execution through selective delay and value prediction. In: Proc. 46th International Symposium on Computer Architecture. Paper presented at ISCA 2019, June 22–26, Phoenix, AZ, USA (pp. 723-735). New York: ACM Press
2019 (English). In: Proc. 46th International Symposium on Computer Architecture, New York: ACM Press, 2019, pp. 723-735. Conference paper, published paper (refereed)
Abstract [en]

Speculative execution, the base on which modern high-performance general-purpose CPUs are built, has recently been shown to enable a slew of security attacks. All these attacks are centered around a common set of behaviors: during speculative execution, the architectural state of the system is kept unmodified until the speculation can be verified. In the event of a misspeculation, anything that can affect the architectural state is reverted (squashed) and re-executed correctly. However, the same is not true for the microarchitectural state. Normally invisible to the user, changes to the microarchitectural state can be observed through various side-channels, with timing differences caused by the memory hierarchy being one of the most common and easy to exploit. The speculative side-channels can then be exploited to perform attacks that can bypass software and hardware checks in order to leak information. These attacks, out of which the most infamous are perhaps Spectre and Meltdown, have led to a frantic search for solutions.

In this work, we present our own solution for reducing the microarchitectural state-changes caused by speculative execution in the memory hierarchy. It is based on the observation that if we only allow accesses that hit in the L1 data cache to proceed, then we can easily hide any microarchitectural changes until after the speculation has been verified. At the same time, we propose to prevent stalls by value predicting the loads that miss in the L1. Value prediction, though speculative, constitutes an invisible form of speculation, not seen outside the core. We evaluate our solution and show that we can prevent observable microarchitectural changes in the memory hierarchy while keeping the performance and energy costs at 11% and 7%, respectively. In comparison, the current state-of-the-art solution, InvisiSpec, incurs a 46% performance loss and a 51% energy increase.

Place, publisher, year, edition, pages
New York: ACM Press, 2019
Keywords
caches, side-channel attacks, speculative execution
National category
Identifiers
urn:nbn:se:uu:diva-387329 (URN); 10.1145/3307650.3322216 (DOI); 978-1-4503-6669-4 (ISBN)
Conference
ISCA 2019, June 22–26, Phoenix, AZ, USA
Funder
Swedish Research Council, 2015-05159; Swedish Foundation for Strategic Research, SM17-0064
Note

Available from: 2019-06-22 Created: 2019-06-21 Last updated: 2019-08-27 Bibliographically approved
Sakalis, C., Alipour, M., Ros, A., Jimborean, A., Kaxiras, S. & Själander, M. (2019). Ghost Loads: What is the cost of invisible speculation? In: Proceedings of the 16th ACM International Conference on Computing Frontiers. Paper presented at CF 2019, April 30 – May 2, Alghero, Sardinia, Italy (pp. 153-163). New York: ACM Press
2019 (English). In: Proceedings of the 16th ACM International Conference on Computing Frontiers, New York: ACM Press, 2019, pp. 153-163. Conference paper, published paper (refereed)
Abstract [en]

Speculative execution is necessary for achieving high performance on modern general-purpose CPUs but, starting with Spectre and Meltdown, it has also been proven to cause severe security flaws. In the case of a misspeculation, the architectural state is restored to ensure functional correctness, but a multitude of microarchitectural changes (e.g., cache updates), caused by the speculatively executed instructions, are commonly left in the system. These changes can be used to leak sensitive information, which has led to a frantic search for solutions that can eliminate such security flaws. The contribution of this work is an evaluation of the cost of hiding speculative side-effects in the cache hierarchy, making them visible only after the speculation has been resolved. For this, we compare (for the first time) two broad approaches: i) waiting for loads to become non-speculative before issuing them to the memory system, and ii) eliminating the side-effects of speculation, a solution consisting of invisible loads (Ghost loads) and performance optimizations (Ghost Buffer and Materialization). While previous work, InvisiSpec, has proposed a similar solution to our latter approach, it has done so with only a minimal evaluation and at a significant performance cost. The detailed evaluation of our solutions shows that: i) waiting for loads to become non-speculative is no more costly than the previously proposed InvisiSpec solution, albeit much simpler, non-invasive in the memory system, and stronger security-wise; ii) hiding speculation with Ghost loads (in the context of a relaxed memory model) can be achieved at the cost of 12% performance degradation and 9% energy increase, which is significantly better than the previous state-of-the-art solution.

Place, publisher, year, edition, pages
New York: ACM Press, 2019
Keywords
speculation, security, side-channel attacks, caches
National category
Identifiers
urn:nbn:se:uu:diva-383173 (URN); 10.1145/3310273.3321558 (DOI); 000474686400019 (); 978-1-4503-6685-4 (ISBN)
Conference
CF 2019, April 30 – May 2, Alghero, Sardinia, Italy
Funder
Swedish Research Council, 2015-05159; Swedish National Infrastructure for Computing (SNIC)
Note

Available from: 2019-05-10 Created: 2019-05-10 Last updated: 2019-08-23 Bibliographically approved
Jimborean, A., Ekemark, P., Waern, J., Kaxiras, S. & Ros, A. (2018). Automatic Detection of Large Extended Data-Race-Free Regions with Conflict Isolation. IEEE Transactions on Parallel and Distributed Systems, 29(3), 527-541
2018 (English). In: IEEE Transactions on Parallel and Distributed Systems, ISSN 1045-9219, E-ISSN 1558-2183, Vol. 29, no. 3, pp. 527-541. Article in journal (refereed), published
Abstract [en]

Data-race-free (DRF) parallel programming is becoming a standard as newly adopted memory models of mainstream programming languages such as C++ or Java impose data-race-freedom as a requirement. We propose compiler techniques that automatically delineate extended data-race-free (xDRF) regions, namely regions of code that provide the same guarantees as the synchronization-free regions (in the context of DRF codes). xDRF regions stretch across synchronization boundaries, function calls and loop back-edges and preserve the data-race-free semantics, thus increasing the optimization opportunities exposed to the compiler and to the underlying architecture. We further enlarge xDRF regions with a conflict isolation (CI) technique, delineating what we call xDRF-CI regions while preserving the same properties as xDRF regions. Our compiler (1) precisely analyzes the threads' memory accessing behavior and data sharing in shared-memory, general-purpose parallel applications, (2) isolates data-sharing and (3) marks the limits of xDRF-CI code regions. The contribution of this work is a simple but effective method to alleviate the drawbacks of the compiler's conservative nature in order to be competitive with (and even surpass) an expert in delineating xDRF regions manually. We evaluate the potential of our technique by employing xDRF and xDRF-CI region classification in a state-of-the-art, dual-mode cache coherence protocol. We show that xDRF regions reduce the coherence bookkeeping and enable optimizations for performance (6.4 percent) and energy efficiency (12.2 percent) compared to a standard directory-based coherence protocol. Enhancing the xDRF analysis with the conflict isolation technique improves performance by 7.1 percent and energy efficiency by 15.9 percent.

Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2018
Keywords
Compile-time analysis, inter-procedural analysis, inter-thread analysis, data sharing, data races, cache coherence
National category
Identifiers
urn:nbn:se:uu:diva-348845 (URN); 10.1109/TPDS.2017.2771509 (DOI); 000425173200004 ()
Funder
Swedish Research Council, 2016-05086
Available from: 2018-04-25 Created: 2018-04-25 Last updated: 2018-12-03 Bibliographically approved
Tran, K.-A., Carlson, T. E., Koukos, K., Själander, M., Spiliopoulos, V., Kaxiras, S. & Jimborean, A. (2018). Static instruction scheduling for high performance on limited hardware. IEEE Transactions on Computers, 67(4), 513-527
2018 (English). In: IEEE Transactions on Computers, ISSN 0018-9340, Vol. 67, no. 4, pp. 513-527. Article in journal (refereed), published
Abstract [en]

Complex out-of-order (OoO) processors have been designed to overcome the restrictions of outstanding long-latency misses at the cost of increased energy consumption. Simple, limited OoO processors are a compromise in terms of energy consumption and performance, as they have fewer hardware resources to tolerate the penalties of long-latency loads. In the worst case, these loads may stall the processor entirely. We present Clairvoyance, a compiler-based technique that generates code able to hide memory latency and better utilize simple OoO processors. By clustering loads found across basic block boundaries, Clairvoyance overlaps the outstanding latencies to increase memory-level parallelism. We show that these simple OoO processors, equipped with the appropriate compiler support, can effectively hide long-latency loads and achieve performance improvements for memory-bound applications. To this end, Clairvoyance tackles (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure. Clairvoyance achieves a geomean execution time improvement of 14 percent for memory-bound applications, on top of standard O3 optimizations, while maintaining the high performance of compute-bound applications.

National category
Identifiers
urn:nbn:se:uu:diva-334011 (URN); 10.1109/TC.2017.2769641 (DOI); 000427420800005 ()
Projects
UPMARC
Funder
Swedish Research Council, 2016-05086
Available from: 2017-11-03 Created: 2017-11-20 Last updated: 2018-05-17 Bibliographically approved
Tran, K.-A., Jimborean, A., Carlson, T. E., Koukos, K., Själander, M. & Kaxiras, S. (2018). SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. Paper presented at PLDI 2018 the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 18-22 2018, Philadelphia, USA (pp. 328-343). Association for Computing Machinery (ACM)
2018 (English). In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, Association for Computing Machinery (ACM), 2018, pp. 328-343. Conference paper, published paper (refereed)
Abstract [en]

Increasing demands for energy efficiency constrain emerging hardware. These new hardware trends challenge the established assumptions in code generation and force us to rethink existing software optimization techniques. We propose a cross-layer redesign of the way compilers and the underlying microarchitecture are built and interact, to achieve both performance and high energy efficiency.

In this paper, we address one of the main performance bottlenecks, last-level cache misses, through a software-hardware co-design. Our approach is able to hide memory latency and attain increased memory and instruction level parallelism by orchestrating a non-speculative, execute-ahead paradigm in software (SWOOP). While out-of-order (OoO) architectures attempt to hide memory latency by dynamically reordering instructions, they do so through expensive, power-hungry, speculative mechanisms. We aim to shift this complexity into software, and we build upon compilation techniques inherited from VLIW, software pipelining, modulo scheduling, decoupled access-execution, and software prefetching. In contrast to previous approaches, we do not rely on either software or hardware speculation that can be detrimental to efficiency. Our SWOOP compiler is enhanced with lightweight architectural support, thus being able to transform applications that include highly complex control-flow and indirect memory accesses.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2018
National category
Identifiers
urn:nbn:se:uu:diva-361359 (URN); 10.1145/3192366.3192393 (DOI); 000452469600023 (); 978-1-4503-5698-5 (ISBN)
Conference
PLDI 2018 the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 18-22 2018, Philadelphia, USA
Projects
UPMARC
Funder
Swedish Research Council, 2016-05086
Available from: 2018-09-23 Created: 2018-09-23 Last updated: 2019-02-01 Bibliographically approved
Cebrián, J. M., Fernández-Pascual, R., Jimborean, A., Acacio, M. E. & Ros, A. (2017). A dedicated private-shared cache design for scalable multiprocessors. Concurrency and Computation, 29(2), Article ID e3871.
2017 (English). In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 29, no. 2, article id e3871. Article in journal (refereed), published
National category
Identifiers
urn:nbn:se:uu:diva-315834 (URN); 10.1002/cpe.3871 (DOI); 000391940500009 ()
Projects
UPMARC
Available from: 2016-05-12 Created: 2017-02-21 Last updated: 2018-01-13 Bibliographically approved
Jimborean, A., Waern, J., Ekemark, P., Kaxiras, S. & Ros, A. (2017). Automatic detection of extended data-race-free regions. In: Proc. 15th International Symposium on Code Generation and Optimization. Paper presented at CGO 2017, February 4–8, Austin, TX (pp. 14-26). Piscataway, NJ: IEEE Press
2017 (English). In: Proc. 15th International Symposium on Code Generation and Optimization, Piscataway, NJ: IEEE Press, 2017, pp. 14-26. Conference paper, published paper (refereed)
Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Press, 2017
National category
Identifiers
urn:nbn:se:uu:diva-316826 (URN); 000402548700002 (); 978-1-5090-4931-8 (ISBN)
Conference
CGO 2017, February 4–8, Austin, TX
Projects
UPMARC
Available from: 2017-02-04 Created: 2017-03-07 Last updated: 2018-01-13 Bibliographically approved
Tran, K.-A., Carlson, T. E., Koukos, K., Själander, M., Spiliopoulos, V., Kaxiras, S. & Jimborean, A. (2017). Clairvoyance: Look-ahead compile-time scheduling. In: Proc. 15th International Symposium on Code Generation and Optimization. Paper presented at CGO 2017, February 4–8, Austin, TX (pp. 171-184). Piscataway, NJ: IEEE Press
2017 (English). In: Proc. 15th International Symposium on Code Generation and Optimization, Piscataway, NJ: IEEE Press, 2017, pp. 171-184. Conference paper, published paper (refereed)
Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Press, 2017
National category
Identifiers
urn:nbn:se:uu:diva-316480 (URN); 000402548700015 (); 978-1-5090-4931-8 (ISBN)
Conference
CGO 2017, February 4–8, Austin, TX
Projects
UPMARC
Funder
Swedish Research Council, 2010-4741
Available from: 2017-02-04 Created: 2017-03-01 Last updated: 2018-04-26 Bibliographically approved
Weber, A., Tran, K.-A., Kaxiras, S. & Jimborean, A. (2017). Decoupled Access-Execute on ARM big.LITTLE. In: Proc. 5th Workshop on High Performance Energy Efficient Embedded Systems. Paper presented at HIP3ES 2017, January 25, Stockholm, Sweden.
2017 (English). In: Proc. 5th Workshop on High Performance Energy Efficient Embedded Systems, 2017. Conference paper, published paper (refereed)
National category
Identifiers
urn:nbn:se:uu:diva-316482 (URN)
Conference
HIP3ES 2017, January 25, Stockholm, Sweden
Projects
UPMARC
Funder
Swedish Research Council, 2010-4741
Available from: 2017-01-13 Created: 2017-03-01 Last updated: 2018-01-13
Carlson, T. E., Tran, K.-A., Jimborean, A., Koukos, K., Själander, M. & Kaxiras, S. (2017). Transcending hardware limits with software out-of-order processing. IEEE Computer Architecture Letters, 16(2), 162-165
2017 (English). In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 16, no. 2, pp. 162-165. Article in journal (refereed), published
National category
Identifiers
urn:nbn:se:uu:diva-334012 (URN); 10.1109/LCA.2017.2672559 (DOI); 000418870500018 ()
Projects
UPMARC
Available from: 2017-02-22 Created: 2017-11-20 Last updated: 2018-04-26 Bibliographically approved
Organisations