uu.se: Publications from Uppsala University
Publications (10 of 23)
Sakalis, C., Chowdhury, Z. I., Wadle, S., Akturk, I., Ros, A., Själander, M., . . . Karpuzcu, U. R. (2021). Do Not Predict – Recompute!: How Value Recomputation Can Truly Boost the Performance of Invisible Speculation. In: 2021 International Symposium on Secure and Private Execution Environment Design (SEED). Paper presented at 2021 International Symposium on Secure and Private Execution Environment Design (SEED), Online, September 20-21, 2021 (pp. 89-100). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: 2021 International Symposium on Secure and Private Execution Environment Design (SEED), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 89-100. Conference paper, Published paper (Refereed)
Abstract [en]

Recent architectural approaches that address speculative side-channel attacks aim to prevent software from exposing the microarchitectural state changes of transient execution. The Delay-on-Miss technique is one such approach: it simply delays loads that miss in the L1 cache until they become non-speculative, resulting in no transient changes in the memory hierarchy. However, this comes at a performance cost, prompting the use of value prediction (VP) to recover some of the lost performance.

The problem, however, cannot be solved by simply introducing a new kind of speculation (value prediction). Value-predicted loads have to be validated, which cannot begin until the load becomes non-speculative. Thus, value-predicted loads occupy the same amount of precious core resources (e.g., reorder buffer entries) as delayed loads do under Delay-on-Miss. The end result is that VP yields only marginal benefits over Delay-on-Miss.

In this paper, our insight is that we can achieve the same goal as VP (increasing performance by providing the value of loads that miss) without incurring its negative side effect (delaying the release of precious resources), if we can safely and non-speculatively recompute a value in isolation (without being seen from the outside), so that we do not expose any information by transferring such a value via the memory hierarchy. Value recomputation, which trades computation for data transfer, was previously proposed in an entirely different context: to reduce energy-expensive data transfers in the memory hierarchy. In this paper, we demonstrate the potential of value recomputation in relation to the Delay-on-Miss approach of hiding speculation, discuss the trade-offs, and show that we can achieve the same level of security while reaching 93% of the unsecured baseline performance (5% higher than Delay-on-Miss), exceeding by 3% what even an oracular (100% accuracy and coverage) value predictor could do.
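As a rough illustration (not taken from the paper), the Delay-on-Miss rule that the abstract builds on can be sketched as a simple filter: a speculative load that misses in the L1 is held back rather than sent to the memory hierarchy. All names and inputs below are hypothetical.

```python
def delay_on_miss(loads, l1_contents, speculative):
    """Toy model of the Delay-on-Miss rule: a speculative load that
    misses in the L1 cache is delayed (kept out of the memory
    hierarchy) until it turns non-speculative; hits and
    non-speculative loads issue normally."""
    issued, delayed = [], []
    for addr in loads:
        if speculative.get(addr, False) and addr not in l1_contents:
            delayed.append(addr)   # no transient state change leaks
        else:
            issued.append(addr)
    return issued, delayed
```

Value recomputation, in this framing, would supply the delayed loads' values by recomputing them in isolation instead of waiting.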

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
National Category
Computer Engineering
Identifiers
urn:nbn:se:uu:diva-453758 (URN); 10.1109/SEED51797.2021.00021 (DOI); 000799181700013 (ISI); 978-1-6654-2025-9 (ISBN)
Conference
2021 International Symposium on Secure and Private Execution Environment Design (SEED), Online, September 20-21, 2021
Funder
Swedish Research Council, 2015-05159; Swedish Research Council, 2018-05254
Available from: 2021-09-22. Created: 2021-09-22. Last updated: 2022-06-28. Bibliographically approved
Sakalis, C., Själander, M. & Kaxiras, S. (2021). Seeds of SEED: Preventing Priority Inversion in Instruction Scheduling to Disrupt Speculative Interference. In: 2021 International Symposium on Secure and Private Execution Environment Design (SEED). Paper presented at 2021 International Symposium on Secure and Private Execution Environment Design (SEED), Online, September 20-21, 2021 (pp. 101-107). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: 2021 International Symposium on Secure and Private Execution Environment Design (SEED), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 101-107. Conference paper, Published paper (Refereed)
Abstract [en]

Speculative side-channel attacks consist of two parts: the speculative instructions that abuse speculative execution to gain illegal access to sensitive data, and the side-channel instructions that leak the sensitive data. Typically, the side-channel instructions are assumed to follow the speculative instructions and to depend on them. Speculative side-channel defenses have taken advantage of these facts to construct solutions where speculative execution is limited only in the presence of these conditions, in an effort to limit the performance overheads introduced by the defense mechanisms.

Unfortunately, it turns out that only focusing on dependent instructions enables a new set of attacks, referred to as "speculative interference attacks". These are a new variant of speculative side-channel attacks, where the side-channel instructions are placed before the point of misspeculation and hence before any illegal speculative instructions. As this breaks the previous assumptions on how speculative side-channel attacks work, this new attack variant can be used to bypass many of the existing defenses. 

We argue that the root cause of speculative interference is a priority inversion in the scheduling of older (bound-to-be-committed) and younger (bound-to-be-squashed) instructions, which affects the execution order of the former. This priority inversion can be caused by affecting either the readiness of a not-yet-ready older instruction or the issuing priority of an older instruction after it becomes ready. We disrupt the opportunity for speculative interference by ensuring that current defenses adequately prevent younger instructions from interfering with the availability of operands to older instructions, and by proposing an instruction scheduling policy that preserves the priority of ready instructions. As a proof of concept, we demonstrate how preventing scheduling-priority inversion can safeguard a specific defense, Delay-on-Miss, from speculative interference attacks: we discuss why it is susceptible to such attacks and how this can be corrected with simple instruction scheduling rules, without introducing any additional performance costs or hardware complexity.
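A minimal sketch of the scheduling policy the abstract argues for, modelled in software rather than hardware: among ready instructions, always issue the oldest first, so a younger (possibly bound-to-be-squashed) instruction can never displace an older one. The `(seq_no, name)` tuple encoding is an assumption of this illustration, not the paper's design.

```python
import heapq

def issue_oldest_first(ready_pool, issue_width):
    """Select up to issue_width instructions from the ready pool,
    always preferring the oldest (smallest sequence number), which
    prevents the scheduling-priority inversion that speculative
    interference attacks exploit."""
    heapq.heapify(ready_pool)            # min-heap on sequence number
    issued = []
    for _ in range(min(issue_width, len(ready_pool))):
        issued.append(heapq.heappop(ready_pool))
    return issued
```

With an issue width of 2 and a ready pool containing instructions 7, 2, 9, and 4, the two oldest (2 and 4) are issued first regardless of arrival order.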

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
National Category
Computer Engineering
Identifiers
urn:nbn:se:uu:diva-453755 (URN); 10.1109/SEED51797.2021.00022 (DOI); 000799181700014 (ISI); 978-1-6654-2025-9 (ISBN)
Conference
2021 International Symposium on Secure and Private Execution Environment Design (SEED), Online, September 20-21, 2021
Funder
Swedish Research Council, 2015-05159; Swedish Research Council, 2018-05254
Available from: 2021-09-22. Created: 2021-09-22. Last updated: 2022-06-28. Bibliographically approved
Tran, K.-A., Sakalis, C., Själander, M., Ros, A., Kaxiras, S. & Jimborean, A. (2020). Clearing the Shadows: Recovering Lost Performance for Invisible Speculative Execution through HW/SW Co-Design. In: PACT ’20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques. Paper presented at PACT '20: International Conference on Parallel Architectures and Compilation Techniques, Virtual Event, GA, USA, October 3-7, 2020 (pp. 241-254). Association for Computing Machinery (ACM)
2020 (English). In: PACT ’20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Association for Computing Machinery (ACM), 2020, p. 241-254. Conference paper, Published paper (Refereed)
Abstract [en]

Out-of-order processors rely heavily on speculation to achieve high performance, allowing instructions to bypass other, slower instructions in order to fully utilize the processor's resources. Speculatively executed instructions do not affect the correctness of the application, as they never change the architectural state, but they do affect the micro-architectural behavior of the system. Until recently, these changes were considered safe, but with the discovery of new security attacks that misuse speculative execution to leak secret information through observable micro-architectural changes (so-called side-channels), this is no longer the case. To solve this issue, a wave of software and hardware mitigations have been proposed, the majority of which delay and/or hide speculative execution until it is deemed safe, trading performance for security. These newly enforced restrictions change how speculation is applied and where the performance bottlenecks appear, forcing us to rethink how we design and optimize both the hardware and the software.

We observe that many of the state-of-the-art hardware solutions targeting memory systems operate on a common scheme: the visible execution of loads, or of their dependents, is blocked until the loads become safe to execute. In this work we propose a generally applicable hardware-software extension that focuses on removing the causes of loads' unsafety, which typically stem from control and memory dependence speculation. As a result, we make more loads safe to execute at an early stage, which enables us to schedule more loads at a time, overlapping their delays and improving performance. We apply our techniques to the state-of-the-art Delay-on-Miss hardware defense and show that we reduce the performance gap to the unsafe baseline by 53% (on average).

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Series
International Conference on Parallel Architectures and Compilation Techniques, ISSN 1089-795X
Keywords
speculative execution, side-channel attacks, caches, compiler, instruction reordering, coherence protocol
National Category
Computer Engineering
Identifiers
urn:nbn:se:uu:diva-428516 (URN); 10.1145/3410463.3414640 (DOI); 000723645400023 (ISI); 978-1-4503-8075-1 (ISBN)
Conference
PACT '20: International Conference on Parallel Architectures and Compilation Techniques, Virtual Event, GA, USA, October 3-7, 2020
Funder
Swedish Research Council, 2015-05159; Swedish Research Council, 2016-05086; Swedish Research Council, 2018-05254; EU, Horizon 2020, 819134
Available from: 2020-12-14. Created: 2020-12-14. Last updated: 2021-12-21. Bibliographically approved
Sakalis, C., Jimborean, A., Kaxiras, S. & Själander, M. (2020). Evaluating the Potential Applications of Quaternary Logic for Approximate Computing. ACM Journal on Emerging Technologies in Computing Systems, 16(1), Article ID 5.
2020 (English). In: ACM Journal on Emerging Technologies in Computing Systems, ISSN 1550-4832, E-ISSN 1550-4840, Vol. 16, no 1, article id 5. Article in journal (Refereed). Published
Abstract [en]

There exist extensive ongoing research efforts on emerging atomic-scale technologies that have the potential to become an alternative to today’s complementary metal-oxide-semiconductor technologies. A common feature among the investigated technologies is that of multi-level devices, particularly the possibility of implementing quaternary logic gates and memory cells. However, for such multi-level devices to be used reliably, an increase in energy dissipation and operation time is required. Building on the principle of approximate computing, we present a set of combinational logic circuits and memory based on multi-level logic gates in which we can trade reliability against energy efficiency. Keeping the energy and timing constraints constant, important data are encoded in a more robust binary format while error-tolerant data are encoded in a quaternary format. We analyze the behavior of the logic circuits when exposed to transient errors caused as a side effect of this encoding. We also evaluate the potential benefit of the logic circuits and memory by embedding them in a conventional computer system on which we execute jpeg, sobel, and blackscholes approximately. We demonstrate that blackscholes is not suitable for such a system and explain why. However, we also achieve dynamic energy reductions of 10% and 13% for jpeg and sobel, respectively, and improve execution time by 38% for sobel, while maintaining adequate output quality.
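The density advantage behind the quaternary encoding can be made concrete with a small sketch (an illustration of base-4 digit packing, not the paper's circuit design): each quaternary digit stores exactly two binary bits.

```python
def to_quaternary(value, num_digits):
    """Encode an unsigned integer as base-4 digits, least significant
    first. Each quaternary digit packs two binary bits, which is why
    multi-level (4-level) cells halve the number of storage elements."""
    return [(value >> (2 * i)) & 0b11 for i in range(num_digits)]

def from_quaternary(digits):
    """Decode base-4 digits (least significant first) back to an integer."""
    return sum(d << (2 * i) for i, d in enumerate(digits))
```

A transient error in one quaternary digit can corrupt two bits at once, which is why the paper reserves the quaternary format for error-tolerant data and keeps important data in binary.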

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Keywords
approximate computing, quaternary
National Category
Computer Systems
Research subject
Computer Systems Sciences
Identifiers
urn:nbn:se:uu:diva-396028 (URN); 10.1145/3359620 (DOI); 000535717000005 (ISI)
Funder
Swedish Research Council, 2015-05159; Swedish National Infrastructure for Computing (SNIC)
Available from: 2019-10-29. Created: 2019-10-29. Last updated: 2024-02-21. Bibliographically approved
Reissmann, N., Meyer, J. C., Bahmann, H. & Själander, M. (2020). RVSDG: An Intermediate Representation for Optimizing Compilers. ACM Transactions on Embedded Computing Systems, 19(6), Article ID 49.
2020 (English). In: ACM Transactions on Embedded Computing Systems, ISSN 1539-9087, E-ISSN 1558-3465, Vol. 19, no 6, article id 49. Article in journal (Refereed). Published
Abstract [en]

Intermediate Representations (IRs) are central to optimizing compilers, as the way the program is represented may enhance or limit analyses and transformations. Suitable IRs focus on exposing the most relevant information and establish invariants that different compiler passes can rely on. While control-flow-centric IRs appear to be a natural fit for imperative programming languages, the analyses required by compilers have increasingly shifted to understanding data dependencies and working at multiple abstraction layers at the same time. This is partially evidenced in recent developments such as the Multi-Level Intermediate Representation (MLIR) proposed by Google. However, rigorous use of data-flow-centric IRs in general-purpose compilers has not been evaluated for feasibility and usability, as previous works provide no practical implementations.

We present the Regionalized Value State Dependence Graph (RVSDG) IR for optimizing compilers. The RVSDG is a data-flow-centric IR where nodes represent computations, edges represent computational dependencies, and regions capture the hierarchical structure of programs. It represents programs in demand-dependence form, implicitly supports structured control flow, and models entire programs within a single IR. We provide a complete specification of the RVSDG and its construction and destruction methods, and we exemplify its utility by presenting Dead Node and Common Node Elimination optimizations. We implemented a prototype compiler and evaluate it in terms of performance, code size, compilation time, and representational overhead. Our results indicate that the RVSDG can serve as a competitive IR in optimizing compilers while reducing complexity.
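The Dead Node Elimination optimization mentioned above has a simple shape on any demand-dependence graph, sketched below on a plain dictionary-based graph (an illustration of the idea, not the paper's region-based IR or its actual implementation):

```python
def dead_node_elimination(nodes, depends_on, results):
    """Keep only nodes whose values are (transitively) demanded by a
    graph result; all other nodes are dead and removed. In a
    demand-dependence IR like the RVSDG this is a straightforward
    reachability walk. depends_on[n] lists the nodes n consumes."""
    live, worklist = set(), list(results)
    while worklist:
        n = worklist.pop()
        if n not in live:
            live.add(n)
            worklist.extend(depends_on.get(n, []))
    return [n for n in nodes if n in live]
```

Because demand is explicit in the edges, no separate liveness analysis over control flow is needed: unreachable computations simply have no path from a result.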

Keywords
Regionalized Value State Dependence Graph, RVSDG, LLVM, Intermediate Representation
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-429778 (URN); 10.1145/3391902 (DOI); 000595600300009 (ISI)
Funder
Swedish Research Council, 2015-05159
Available from: 2021-01-04. Created: 2021-01-04. Last updated: 2021-04-26. Bibliographically approved
Nishtala, R., Petrucci, V., Carpenter, P. & Själander, M. (2020). Twig: Multi-Agent Task Management for Colocated Latency-Critical Cloud Services. Paper presented at the IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE
2020 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Many of the important services running on data centres are latency-critical, time-varying, and demand strict user satisfaction. Stringent tail-latency targets for colocated services and increasing system complexity make it challenging to reduce the power consumption of data centres. Data centres typically sacrifice server efficiency to maintain tail-latency targets resulting in an increased total cost of ownership. This paper introduces Twig, a scalable quality-of-service (QoS) aware task manager for latency-critical services co-located on a server system. Twig successfully leverages deep reinforcement learning to characterise tail latency using hardware performance counters and to drive energy-efficient task management decisions in data centres. We evaluate Twig on a typical data centre server managing four widely used latency-critical services. Our results show that Twig outperforms prior works in reducing energy usage by up to 38% while achieving up to 99% QoS guarantee for latency-critical services.

Place, publisher, year, edition, pages
IEEE, 2020
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-429781 (URN); 10.1109/HPCA47549.2020.00023 (DOI)
Conference
IEEE International Symposium on High-Performance Computer Architecture (HPCA)
Funder
Swedish Research Council, 2015-05159
Available from: 2021-01-04. Created: 2021-01-04. Last updated: 2021-01-14. Bibliographically approved
Sakalis, C., Kaxiras, S., Ros, A., Jimborean, A. & Själander, M. (2020). Understanding Selective Delay as a Method for Efficient Secure Speculative Execution. IEEE Transactions on Computers, 69(11), 1584-1595
2020 (English). In: IEEE Transactions on Computers, ISSN 0018-9340, E-ISSN 1557-9956, Vol. 69, no 11, p. 1584-1595. Article in journal (Refereed). Published
Abstract [en]

Since the introduction of Meltdown and Spectre, the research community has been tirelessly working on speculative side-channel attacks and on how to shield computer systems from them. To ensure that a system is protected not only from all the currently known attacks but also from future, yet to be discovered, attacks, the solutions developed need to be general in nature, covering a wide array of system components, while at the same time keeping the performance, energy, area, and implementation complexity costs at a minimum. One such solution is our own delay-on-miss, which efficiently protects the memory hierarchy by i) selectively delaying speculative load instructions and ii) utilizing value prediction as an invisible form of speculation. In this article we dive deeper into delay-on-miss, offering insights into why and how it affects the performance of the system. We also reevaluate value prediction as an invisible form of speculation. Specifically, we focus on the implications that delaying memory loads has in the memory level parallelism of the system and how this affects the value predictor and the overall performance of the system. We present new, updated results but more importantly, we also offer deeper insight into why delay-on-miss works so well and what this means for the future of secure speculative execution.

Keywords
Speculative execution, side-channel attacks, memory, security
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-404312 (URN); 10.1109/TC.2020.3014456 (DOI); 000576255400003 (ISI)
Funder
Swedish Research Council, 2015-05159; Swedish Foundation for Strategic Research, SM17-0064; European Regional Development Fund (ERDF), RTI2018098156-B-C53; Swedish National Infrastructure for Computing (SNIC)
Available from: 2020-02-17. Created: 2020-02-17. Last updated: 2023-03-28. Bibliographically approved
Sakalis, C., Alipour, M., Ros, A., Jimborean, A., Kaxiras, S. & Själander, M. (2019). Ghost Loads: What is the cost of invisible speculation? In: Proceedings of the 16th ACM International Conference on Computing Frontiers. Paper presented at CF 2019, April 30 – May 2, Alghero, Sardinia, Italy (pp. 153-163). New York: ACM Press
2019 (English). In: Proceedings of the 16th ACM International Conference on Computing Frontiers, New York: ACM Press, 2019, p. 153-163. Conference paper, Published paper (Refereed)
Abstract [en]

Speculative execution is necessary for achieving high performance on modern general-purpose CPUs but, starting with Spectre and Meltdown, it has also been proven to cause severe security flaws. In case of a misspeculation, the architectural state is restored to assure functional correctness, but a multitude of microarchitectural changes (e.g., cache updates) caused by the speculatively executed instructions are commonly left in the system. These changes can be used to leak sensitive information, which has led to a frantic search for solutions that can eliminate such security flaws. The contribution of this work is an evaluation of the cost of hiding speculative side-effects in the cache hierarchy, making them visible only after the speculation has been resolved. For this, we compare (for the first time) two broad approaches: i) waiting for loads to become non-speculative before issuing them to the memory system, and ii) eliminating the side-effects of speculation, a solution consisting of invisible loads (Ghost loads) and performance optimizations (Ghost Buffer and Materialization). While previous work, InvisiSpec, has proposed a solution similar to our latter approach, it has done so with only a minimal evaluation and at a significant performance cost. The detailed evaluation of our solutions shows that: i) waiting for loads to become non-speculative is no more costly than the previously proposed InvisiSpec solution, albeit much simpler, non-invasive in the memory system, and stronger security-wise; ii) hiding speculation with Ghost loads (in the context of a relaxed memory model) can be achieved at the cost of a 12% performance degradation and a 9% energy increase, which is significantly better than the previous state-of-the-art solution.

Place, publisher, year, edition, pages
New York: ACM Press, 2019
Keywords
speculation, security, side-channel attacks, caches
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-383173 (URN); 10.1145/3310273.3321558 (DOI); 000474686400019 (ISI); 978-1-4503-6685-4 (ISBN)
Conference
CF 2019, April 30 – May 2, Alghero, Sardinia, Italy
Funder
Swedish Research Council, 2015-05159; Swedish National Infrastructure for Computing (SNIC)
Available from: 2019-05-10. Created: 2019-05-10. Last updated: 2021-10-15. Bibliographically approved
Umuroglu, Y., Conficconi, D., Rasnayake, L., Preusser, T. B. & Själander, M. (2019). Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing. ACM Transactions on Reconfigurable Technology and Systems, 12(3), Article ID 15.
2019 (English). In: ACM Transactions on Reconfigurable Technology and Systems, ISSN 1936-7406, E-ISSN 1936-7414, Vol. 12, no 3, article id 15. Article in journal (Refereed). Published
Abstract [en]

Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lends itself well to high-performance implementations. Many matrix multiplication-dependent applications can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offering adequate quality of results. However, precision requirements may vary between different application phases or depend on input data, rendering constant-precision solutions ineffective. BISMO, a vectorized bit-serial matrix multiplication overlay for reconfigurable computing, previously utilized the excellent binary-operation performance of FPGAs to offer a matrix multiplication performance that scales with required precision and parallelism. We show how BISMO can be scaled up on Xilinx FPGAs using an arithmetic architecture that better utilizes six-input LUTs. The improved BISMO achieves a peak performance of 15.4 binary TOPS on the Ultra96 board with a Xilinx UltraScale+ MPSoC.
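The bit-serial arithmetic that BISMO vectorizes can be sketched in software (an illustration of the decomposition, not the FPGA overlay itself): each operand matrix is split into bit planes, the binary planes are multiplied (a 1-bit product is just an AND), and each partial result is accumulated with the appropriate power-of-two weight. Precision then scales simply with the number of bit planes processed.

```python
def bit_serial_matmul(A, B, bits_a, bits_b):
    """Multiply unsigned-integer matrices A (n x k) and B (k x m) by
    summing shifted products of their bit-plane (binary) matrices:
    A*B = sum over i,j of 2^(i+j) * (plane_i(A) * plane_j(B))."""
    n, k, m = len(A), len(B), len(B[0])
    result = [[0] * m for _ in range(n)]
    for i in range(bits_a):                 # bit plane of A
        for j in range(bits_b):             # bit plane of B
            weight = 1 << (i + j)
            for r in range(n):
                for c in range(m):
                    acc = 0
                    for t in range(k):
                        # 1-bit multiply is a binary AND
                        acc += ((A[r][t] >> i) & 1) & ((B[t][c] >> j) & 1)
                    result[r][c] += acc * weight
    return result
```

Reducing `bits_a`/`bits_b` trades precision for fewer binary-plane multiplications, which is exactly the performance-scales-with-precision property the abstract describes.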

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Keywords
Bit serial, matrix multiplication, overlay, FPGA
National Category
Embedded Systems; Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:uu:diva-398429 (URN); 10.1145/3337929 (DOI); 000496751500006 (ISI)
Funder
Swedish Research Council, 2015-05159
Available from: 2019-12-06. Created: 2019-12-06. Last updated: 2019-12-06. Bibliographically approved
Tran, K.-A., Carlson, T. E., Koukos, K., Själander, M., Spiliopoulos, V., Kaxiras, S. & Jimborean, A. (2018). Static instruction scheduling for high performance on limited hardware. IEEE Transactions on Computers, 67(4), 513-527
Open this publication in new window or tab >>Static instruction scheduling for high performance on limited hardware
Show others...
2018 (English). In: IEEE Transactions on Computers, ISSN 0018-9340, E-ISSN 1557-9956, Vol. 67, no 4, p. 513-527. Article in journal (Refereed). Published
Abstract [en]

Complex out-of-order (OoO) processors have been designed to overcome the restrictions of outstanding long-latency misses, at the cost of increased energy consumption. Simple, limited OoO processors are a compromise in terms of energy consumption and performance, as they have fewer hardware resources to tolerate the penalties of long-latency loads. In the worst case, these loads may stall the processor entirely. We present Clairvoyance, a compiler-based technique that generates code able to hide memory latency and better utilize simple OoO processors. By clustering loads found across basic-block boundaries, Clairvoyance overlaps the outstanding latencies to increase memory-level parallelism. We show that these simple OoO processors, equipped with the appropriate compiler support, can effectively hide long-latency loads and achieve performance improvements for memory-bound applications. To this end, Clairvoyance tackles (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure. Clairvoyance achieves a geomean execution-time improvement of 14 percent for memory-bound applications, on top of standard -O3 optimizations, while maintaining the high performance of compute-bound applications.
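The load-clustering idea can be shown in miniature with a toy list scheduler (instruction names and the dependence map below are hypothetical; the real transformation is a compiler pass working across basic-block boundaries): among instructions whose dependencies are already scheduled, loads are picked first, so independent long-latency loads end up adjacent and their miss latencies overlap.

```python
def cluster_loads(instructions, depends_on):
    """Greedy reordering sketch in the spirit of Clairvoyance:
    repeatedly schedule a ready instruction (all of its dependencies
    already scheduled), preferring loads over other instructions so
    that independent loads cluster early. depends_on[i] is the set of
    instructions i depends on."""
    scheduled, done = [], set()
    pending = list(instructions)
    while pending:
        ready = [i for i in pending if depends_on.get(i, set()) <= done]
        # prefer ready loads; break ties by original program order
        ready.sort(key=lambda i: (not i.startswith("load"), pending.index(i)))
        pick = ready[0]
        scheduled.append(pick)
        done.add(pick)
        pending.remove(pick)
    return scheduled
```

On a sequence load/use/load/use, both loads are hoisted ahead of both uses, so the two misses are outstanding at the same time instead of back to back.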

National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-334011 (URN); 10.1109/TC.2017.2769641 (DOI); 000427420800005 (ISI)
Projects
UPMARC
Funder
Swedish Research Council, 2016-05086
Available from: 2017-11-03. Created: 2017-11-20. Last updated: 2023-03-28. Bibliographically approved
Projects
Leveraging Multi-Value Logic to Modulate Approximation for Future Nano Devices [2015-05159_VR]; Uppsala University
Identifiers
ORCID iD: orcid.org/0000-0003-4232-6976