Publications from Uppsala University
Publications (10 of 98)
Gomez Hernandez, E. J., Cebrian, J. M., Kaxiras, S. & Ros, A. (2025). Bounding Speculative Execution of Atomic Regions to a Single Retry. ACM Digital Library
Bounding Speculative Execution of Atomic Regions to a Single Retry
2025 (English) Conference paper (Refereed)
Abstract [en]

Mutual exclusion has long served as a fundamental construct in parallel programs. Despite a long history of optimizing the lower-level lock and unlock operations used to enforce mutual exclusion, these operations still largely dictate performance in parallel programs. Speculative Lock Elision, and more generally Hardware Transactional Memory, allow atomic regions (ARs) to execute concurrently and speculatively, and ensure correctness through conflict detection. However, practical implementations of these ideas are best-effort: in case of conflicts, an AR is retried a predetermined number of times before falling back to mutual exclusion.

This work explores the opportunities of using cacheline locking to bound the number of retries of speculative solutions. Our key insight is that ARs that access exactly the same set of addresses when re-executing can learn that set in the first execution and execute non-speculatively in the next one by performing ordered cacheline locking. This way, speculative execution is bounded to a single retry.

We first establish the conditions under which ARs can re-execute in a cacheline-locked mode. Based on these conditions, we propose cleAR (cacheline-locked executed AR), a novel technique that, on the first abort, forces the re-execution to use cacheline locking. The detection of and conversion to cacheline-locking mode is transparent to software.

Using gem5 running data-structure benchmarks and the STAMP benchmark suite, we show that the average percentage of ARs that succeed on the first retry grows from 35.4% in our baseline to 64.4% with cleAR, reducing the percentage of fallback (coarse-grain mutual exclusion) execution from 37.2% to 15.4%. These improvements reduce average execution time by 35.0% over a baseline configuration and by 23.3% over more elaborate approaches such as PowerTM.
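
The key insight above (learn an AR's cacheline set during the aborted first attempt, then re-execute non-speculatively under ordered cacheline locking) can be illustrated with a small software analogy. This is a sketch under stated assumptions, not cleAR's hardware mechanism: `LineLockTable`, `cachelines`, and the 64-byte line size are hypothetical, and ordinary `threading.Lock` objects stand in for hardware cacheline locks. Locking in ascending line order is the standard way to rule out circular waits, which is why the learned set is sorted before it is locked.

```python
import threading
from typing import Callable, Dict, Iterable, List

LINE_SIZE = 64  # assumed cacheline size in bytes (hypothetical parameter)


def cachelines(addresses: Iterable[int], line_size: int = LINE_SIZE) -> List[int]:
    """Distinct cacheline indices touched by an AR, sorted ascending."""
    return sorted({addr // line_size for addr in addresses})


class LineLockTable:
    """Hypothetical software stand-in for per-cacheline hardware locks."""

    def __init__(self) -> None:
        self._locks: Dict[int, threading.Lock] = {}
        self._guard = threading.Lock()  # protects creation of lock objects

    def _lock_for(self, line: int) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(line, threading.Lock())

    def run_locked(self, lines: List[int], body: Callable[[], None]) -> List[int]:
        """Lock every line in ascending order, run body, then unlock.

        One global acquisition order means two retries with overlapping
        line sets can never wait on each other in a cycle.
        Returns the order in which the lines were locked.
        """
        acquired: List[int] = []
        try:
            for line in sorted(lines):
                self._lock_for(line).acquire()
                acquired.append(line)
            body()  # non-speculative re-execution of the AR
        finally:
            for line in reversed(acquired):
                self._lock_for(line).release()
        return acquired

    def is_locked(self, line: int) -> bool:
        return self._lock_for(line).locked()


# First (aborted) attempt: record the addresses the AR touched.
learned = cachelines([0x1040, 0x0008, 0x1044, 0x2000])  # -> [0, 65, 128]

table = LineLockTable()
shared = {"balance": 0}


def atomic_region() -> None:
    shared["balance"] += 1


print(table.run_locked(learned, atomic_region))  # -> [0, 65, 128]
```

The sketch does not check the precondition the abstract stresses, namely that the re-execution touches exactly the learned set; a real design must detect when that does not hold.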

Place, publisher, year, edition, pages
ACM Digital Library, 2025
National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-552946 (URN); 10.1145/3622781.3674176 (DOI); 001483831800002; 2-s2.0-105007040952 (Scopus ID)
Available from: 2025-03-19; Created: 2025-03-19; Last updated: 2025-06-13; Bibliographically approved
Asgharzadeh, A., Feliu, J., Acacio, M. E., Kaxiras, S. & Ros, A. (2025). No Rush in Executing Atomic Instructions. In: 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). Paper presented at 2025 International Symposium on High Performance Computer Architecture - HPCA-Annual, March 1-5, 2025, Las Vegas, NV, USA (pp. 1618-1630). Institute of Electrical and Electronics Engineers (IEEE)
No Rush in Executing Atomic Instructions
2025 (English) In: 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA), Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 1618-1630. Conference paper, Published paper (Refereed)
Abstract [en]

Hardware atomic instructions are the building blocks of synchronization algorithms. Historically, to guarantee atomicity and consistency, they were implemented using memory fences: older memory instructions were committed and the store buffer was drained before the atomic began executing. Unfortunately, such fences entail large performance penalties because they serialize execution, impeding instruction- and memory-level parallelism.

The situation, however, seems to have changed recently. Through experiments on x86 machines, we discovered that current x86 processors comply with the x86-TSO requirements while avoiding the performance overhead introduced by fences (a fence-free, or unfenced, implementation). This paves the way for new optimizations of atomic instruction execution. In particular, our simulation experiments modeling unfenced atomics reveal that executing atomic instructions as soon as their operands are ready does not always lead to optimal performance, because doing so increases the time other threads must wait to obtain the cacheline. In contended scenarios, delaying the execution of the atomic instruction to minimize the time the cacheline is locked provides superior performance.

Based on this observation, we present Rush or Wait (RoW), a hardware mechanism that decides when to execute an atomic instruction. The mechanism is based on a contention predictor that estimates whether an atomic will access a contended cacheline. Non-contended atomics execute as soon as their operands are ready. Contended atomics, in contrast, wait until they are the oldest memory instruction and the store buffer has drained before executing, minimizing contention on the accessed cacheline. Our experimental evaluation shows that RoW reduces execution time by 9.2% on average (and by up to 43%) compared to a baseline that executes atomics as soon as their operands are ready, while requiring only a small area overhead (64 bytes).
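
One way to picture the contention predictor behind RoW's rush-or-wait decision is a small table of saturating counters indexed by the atomic's program counter. The table geometry (256 two-bit counters, which happens to total 64 bytes), the PC hash, the threshold, and the training signal are all assumptions of this sketch rather than details given in the abstract.

```python
from typing import List


class ContentionPredictor:
    """Hypothetical PC-indexed table of 2-bit saturating counters.

    256 entries x 2 bits = 512 bits = 64 bytes (assumed geometry).
    """

    def __init__(self, entries: int = 256, threshold: int = 2) -> None:
        self.entries = entries
        self.threshold = threshold
        self.counters: List[int] = [0] * entries

    def _index(self, pc: int) -> int:
        # Drop the low PC bits (assumed 4-byte instructions) and fold.
        return (pc >> 2) % self.entries

    def decide(self, pc: int) -> str:
        """'rush': execute once operands are ready.
        'wait': execute only as the oldest memory op with a drained store buffer."""
        return "wait" if self.counters[self._index(pc)] >= self.threshold else "rush"

    def train(self, pc: int, was_contended: bool) -> None:
        """Update after the atomic completes. `was_contended` could mean another
        core requested the line while it was locked (an assumed signal)."""
        i = self._index(pc)
        if was_contended:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


pred = ContentionPredictor()
pc = 0x4011A0
print(pred.decide(pc))  # -> rush
pred.train(pc, was_contended=True)
pred.train(pc, was_contended=True)
print(pred.decide(pc))  # -> wait
```

Two-bit hysteresis means a single uncontended execution does not immediately flip a contended atomic back to "rush", which is one plausible reason to prefer counters over a single bit.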

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Series
International Symposium on High-Performance Computer Architecture - Proceedings, ISSN 1530-0897, E-ISSN 2378-203X
National Category
Computer Systems; Computer Sciences; Computer Engineering
Identifiers
urn:nbn:se:uu:diva-565455 (URN); 10.1109/HPCA61900.2025.00120 (DOI); 001494383800112; 2-s2.0-105003405961 (Scopus ID); 979-8-3315-0648-3 (ISBN); 979-8-3315-0647-6 (ISBN)
Conference
2025 International Symposium on High Performance Computer Architecture - HPCA-Annual, March 1-5, 2025, Las Vegas, NV, USA
Funder
EU, Horizon 2020, 819134; Swedish Research Council, 2022-04959; Swedish Foundation for Strategic Research, FUS21-0067
Available from: 2025-08-21; Created: 2025-08-21; Last updated: 2025-08-21; Bibliographically approved
Ekemark, P., Ros, A., Sagonas, K. & Kaxiras, S. (2024). A First Exploration of Fine-Grain Coherence for Integrity Metadata. In: 2024 INTERNATIONAL SYMPOSIUM ON SECURE AND PRIVATE EXECUTION ENVIRONMENT DESIGN, SEED 2024. Paper presented at 2024 International Symposium on Secure and Private Execution Environment Design, MAY 16-17, 2024, Orlando, FL (pp. 62-72). IEEE Computer Society
A First Exploration of Fine-Grain Coherence for Integrity Metadata
2024 (English) In: 2024 INTERNATIONAL SYMPOSIUM ON SECURE AND PRIVATE EXECUTION ENVIRONMENT DESIGN, SEED 2024, IEEE Computer Society, 2024, p. 62-72. Conference paper, Published paper (Refereed)
Abstract [en]

Memory integrity protection is intended for secure execution and is typically associated with programs running on a single core. However, with the emergence of multiprocessor systems-on-chip and chiplets, extending memory integrity protection to cache-coherent multiprocessors becomes essential. In this work, we explore for the first time the design space for maintaining coherence of fine-grain integrity metadata at the block level. We discuss various policies for updating the integrity tree using the underlying coherence protocol and examine how these policies affect coherence traffic. We introduce the concepts of proactive and reactive update initiation and discuss their implications for data and integrity-tree blocks. We also investigate the trade-offs between eager and lazy update propagation policies, focusing on coherence transactions such as invalidations and downgrades, to analyse the pros and cons of different approaches. We observe that for some benchmarks the choice between the eager and the lazy update initiation policy makes little difference, while for many other benchmarks one policy is better than the other, depending on how the benchmark shares its data.
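
The trade-off between eager and lazy update propagation can be made concrete with a toy count of integrity-tree node updates. Everything here is an illustrative assumption, not the paper's model: a single tree of a given arity over the data blocks, "eager" meaning every write immediately updates all of the block's ancestors, and "lazy" meaning a block's pending update is propagated only when another core takes ownership of the block (an invalidation) or at a final write-back. Coherence traffic for the tree blocks themselves is ignored.

```python
from typing import Dict, List, Set, Tuple

Write = Tuple[int, int]  # (core id, data block id)


def tree_levels(num_blocks: int, arity: int) -> int:
    """Tree levels above the data blocks, root included."""
    levels, n = 0, num_blocks
    while n > 1:
        n = -(-n // arity)  # ceiling division
        levels += 1
    return levels


def tree_updates(trace: List[Write], num_blocks: int, arity: int, policy: str) -> int:
    """Count tree-node updates for a write trace under an assumed policy."""
    depth = tree_levels(num_blocks, arity)
    if policy == "eager":
        return depth * len(trace)  # every write walks to the root
    if policy != "lazy":
        raise ValueError(f"unknown policy {policy!r}")
    owner: Dict[int, int] = {}
    dirty: Set[int] = set()
    updates = 0
    for core, block in trace:
        if block in dirty and owner[block] != core:
            updates += depth  # the invalidation forces the deferred update
            dirty.discard(block)
        owner[block] = core
        dirty.add(block)
    return updates + depth * len(dirty)  # final write-back of pending updates


private = [(0, 5)] * 10            # one core keeps rewriting its own block
ping_pong = [(0, 5), (1, 5)] * 5   # two cores alternate on the same block
for name, trace in (("private", private), ("ping-pong", ping_pong)):
    print(name, tree_updates(trace, 64, 8, "eager"), tree_updates(trace, 64, 8, "lazy"))
# -> private 20 2
# -> ping-pong 20 20
```

In this toy, lazy propagation wins for private data and only ties under ping-pong sharing, which loosely echoes the abstract's observation that the better policy depends on how a benchmark shares its data.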

Place, publisher, year, edition, pages
IEEE Computer Society, 2024
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-552580 (URN); 10.1109/SEED61283.2024.00017 (DOI); 001414915000007; 2-s2.0-85210840060 (Scopus ID); 979-8-3315-0566-0 (ISBN); 979-8-3315-0565-3 (ISBN)
Conference
2024 International Symposium on Secure and Private Execution Environment Design, MAY 16-17, 2024, Orlando, FL
Available from: 2025-03-18; Created: 2025-03-18; Last updated: 2025-03-18; Bibliographically approved
Asgharzadeh, A., Gómez-Hernández, E. J., Cebrian, J. M., Kaxiras, S. & Ros, A. (2024). Hardware Cache Locking for All Memory Updates. In: 2024 IEEE 42nd International Conference on Computer Design (ICCD). Paper presented at 42nd International Conference on Computer Design, Nov 18-20, 2024, Milan, Italy (pp. 566-574). Institute of Electrical and Electronics Engineers (IEEE)
Hardware Cache Locking for All Memory Updates
2024 (English) In: 2024 IEEE 42nd International Conference on Computer Design (ICCD), Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 566-574. Conference paper, Published paper (Refereed)
Abstract [en]

Many applications need to perform operations that read a value from memory, modify it, and then write it back. Multiple architectures provide hardware support for these operations via read-modify-write (RMW) instructions. The primary benefit is that the read can request the cacheline with write permissions, reducing coherence protocol overhead because the subsequent write will find the cacheline with the appropriate permissions. RMWs can be either atomic or non-atomic. Atomic RMWs, used for synchronization, commonly require (i) locking the cacheline to guarantee atomicity by preventing invalidations and (ii) enforcing serialization of instructions in the program (e.g., via memory fences), which may degrade performance depending on the implemented memory consistency model. Non-atomic RMWs do not require such strict measures, but should only be used in data-race-free code sections. However, other cores may invalidate a cacheline during a non-atomic RMW (e.g., due to false sharing), flushing the pipeline and losing the write permissions obtained by the read, which is detrimental to performance.

In this work, we propose a microarchitectural mechanism that enables non-atomic RMWs to fetch the cacheline and lock it, preventing other cores from "stealing" the cacheline while still allowing the RMWs to run concurrently with other instructions in the same core. Our proposal enables concurrent hardware cache locking for multiple non-atomic RMWs while guaranteeing deadlock freedom and requiring no programmer or compiler intervention. We also propose a lock-chaining mechanism that allows multiple consecutive memory updates to the same cacheline, up to a predefined maximum (to prevent starvation and load imbalance). Our evaluation using the gem5 full-system simulator shows that, for an eight-core configuration, our proposal improves performance by up to 5.36% (2.05% on average) while requiring just 45 bytes of storage per core.
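
The lock-chaining idea (consecutive non-atomic RMWs to the same cacheline keep it locked, up to a bounded count) can be sketched as a small per-core state machine. The class, its method names, and the `max_chain` value are hypothetical, and the sketch models neither the pipeline nor the paper's deadlock-freedom mechanism; it only shows how a bounded chain keeps a line from being stolen mid-update while still releasing it periodically so other cores are not starved.

```python
from typing import List, Optional


class RmwLineLock:
    """Hypothetical per-core lock held on one cacheline by non-atomic RMWs."""

    def __init__(self, max_chain: int = 3) -> None:
        self.max_chain = max_chain
        self.line: Optional[int] = None  # currently locked cacheline, if any
        self.chain = 0                   # RMWs performed under the current lock

    def rmw(self, line: int) -> str:
        """Perform one RMW; report whether it acquired or chained the lock."""
        if self.line == line:
            self.chain += 1
            outcome = "chained"
        else:
            self.release()               # a different line ends the chain
            self.line, self.chain = line, 1
            outcome = "acquired"
        if self.chain >= self.max_chain:
            self.release()               # bound the chain: let other cores in
        return outcome

    def release(self) -> None:
        self.line, self.chain = None, 0

    def remote_can_take(self, line: int) -> bool:
        """Would another core's request for `line` be serviced right now?"""
        return self.line != line


core = RmwLineLock(max_chain=3)
outcomes: List[str] = []
for _ in range(4):
    outcomes.append(core.rmw(0x7C0))
print(outcomes)  # -> ['acquired', 'chained', 'chained', 'acquired']
```

The fourth RMW has to re-acquire the line because the chain limit released it after the third, which is the window in which a waiting remote request could be serviced.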

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
Proceedings IEEE International Conference on Computer Design, ISSN 1063-6404, E-ISSN 2576-6996
Keywords
Multi-core architectures, micro-architecture, non-atomic Read-Modify-Write, false sharing, hardware cache, locking.
National Category
Computer Sciences; Computer Engineering; Computer Systems
Identifiers
urn:nbn:se:uu:diva-554739 (URN); 10.1109/ICCD63220.2024.00092 (DOI); 001441178200079; 2-s2.0-85217070730 (Scopus ID); 979-8-3503-8041-5 (ISBN); 979-8-3503-8040-8 (ISBN)
Conference
42nd International Conference on Computer Design, Nov 18-20, 2024, Milan, Italy
Funder
EU, Horizon 2020, 819134; European Regional Development Fund (ERDF), PID2022-136315OB-I00; EU, European Research Council, TED2021-130233BC33
Available from: 2025-04-16; Created: 2025-04-16; Last updated: 2025-04-16; Bibliographically approved
Aimoniotis, P. & Kaxiras, S. (2024). JANUS: A Simple and Efficient Speculative Defense using Reinforcement Learning. In: 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). Paper presented at 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 13-15 November 2024, Hilo, HI, USA (pp. 25-36). Institute of Electrical and Electronics Engineers (IEEE)
JANUS: A Simple and Efficient Speculative Defense using Reinforcement Learning
2024 (English) In: 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 25-36. Conference paper, Published paper (Refereed)
Abstract [en]

Speculative execution and the emergence of Spectre attacks have forced architects to rethink how microprocessors are designed. Several approaches aim to close this security vulnerability while minimizing performance degradation, often through complex and sophisticated mechanisms. These strategies typically entail substantial modifications to the processor core and memory hierarchy, which ultimately inhibit their adoption in real designs.

In this work, we leverage two of the simplest speculative defense ideas, NDA and DoM, which can co-exist in the same core, and we apply a simple form of Reinforcement Learning (RL) to select the more effective mechanism as the underlying processor defense for each window of execution. NDA forbids the propagation of a potential secret to subsequent instructions, while DoM prohibits the creation of observable timing differences in the cache. We observe that their impact can vary significantly across applications, but that they can often complement each other within the same application. However, our investigation also reveals vulnerabilities in previous proposals that try to combine these secure speculation schemes into one. We demonstrate an attack scenario that violates the security of the combined scheme, and we present the conditions that must hold to combine them safely. Lastly, while the cost and complexity of reinforcement learning may seem inordinately high for microarchitectural implementations, we build on recent research that demonstrates remarkably lightweight solutions, provided that the action space is small.

We present JANUS, a lightweight architecture leveraging an RL agent based on a two-armed bandit algorithm. JANUS selects the best-performing defense mechanism to protect the processor within a specific time window.
We evaluate JANUS on SPEC2017 benchmark suite and find that it outperforms NDA by +4.9%, STT (a more sophisticated and complex scheme that uses taint tracking) by +1%, and DoM by +2.6%. Further, when a state-of-the-art address-prediction optimization (Doppelganger Loads) is employed on top of the baseline defenses, NDA and DoM, JANUS still outperforms the former by +2.3%, and the latter by +0.3%. When evaluated with the older SPEC2006 benchmark suite, JANUS outperforms all schemes by +4.7% on average, with a maximum of +8.2% over DoM. JANUS achieves these results with a meager storage overhead of just 16 bytes and a complexity-effective design.
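
A two-armed bandit of the kind JANUS builds on can be sketched in a few lines. The abstract does not name the exact bandit algorithm, so this sketch assumes the textbook UCB1 rule, with each arm being one defense (NDA or DoM) and the reward being a performance measure such as IPC over one execution window; the reward signal, the exploration constant, and the class name are assumptions. The sketch also ignores the abstract's point that NDA and DoM can only be switched safely under specific conditions.

```python
import math
from typing import List


class TwoArmedBandit:
    """UCB1 over two arms (assumed algorithm): arm 0 = NDA, arm 1 = DoM."""

    ARMS = ("NDA", "DoM")

    def __init__(self, c: float = 2.0) -> None:
        self.c = c                       # exploration weight (assumed)
        self.pulls: List[int] = [0, 0]
        self.mean: List[float] = [0.0, 0.0]
        self.t = 0                       # windows observed so far

    def select(self) -> int:
        """Pick the defense for the next execution window."""
        for arm in (0, 1):
            if self.pulls[arm] == 0:
                return arm               # try each defense once
        scores = [
            self.mean[a] + math.sqrt(self.c * math.log(self.t) / self.pulls[a])
            for a in (0, 1)
        ]
        return 0 if scores[0] >= scores[1] else 1

    def update(self, arm: int, reward: float) -> None:
        """Fold in the reward (e.g. IPC) measured over the finished window."""
        self.t += 1
        self.pulls[arm] += 1
        self.mean[arm] += (reward - self.mean[arm]) / self.pulls[arm]


# Hypothetical program phase in which DoM happens to run faster than NDA.
ipc = {"NDA": 1.0, "DoM": 1.5}
agent = TwoArmedBandit()
for _ in range(200):
    arm = agent.select()
    agent.update(arm, ipc[TwoArmedBandit.ARMS[arm]])
print(agent.pulls)  # DoM ends up chosen in most windows
```

With only two arms the state is four numbers, which illustrates why a small action space keeps such an agent cheap. A real design would likely also need to adapt to program phases (for example with a discounted variant), which plain UCB1 does not do.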

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
Proceedings Symposium on Computer Architecture and High Performance Computing, ISSN 1550-6533, E-ISSN 2643-3001
Keywords
Speculative side-channels, spectre, reinforcement learning
National Category
Computer Sciences; Computer Systems
Identifiers
urn:nbn:se:uu:diva-553087 (URN); 10.1109/SBAC-PAD63648.2024.00011 (DOI); 2-s2.0-85212436359 (Scopus ID); 979-8-3503-5616-8 (ISBN); 979-8-3503-5617-5 (ISBN)
Conference
2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 13-15 November 2024, Hilo, HI, USA
Funder
Swedish Foundation for Strategic Research, FUS21-0067; National Academic Infrastructure for Supercomputing in Sweden (NAISS), 2023/22-3; UPPMAX; Swedish Research Council, 2022-06725
Available from: 2025-03-22; Created: 2025-03-22; Last updated: 2025-09-10; Bibliographically approved
Song, W., Kaxiras, S., Voigt, T., Yao, Y. & Mottola, L. (2024). TaDA: Task Decoupling Architecture for the Battery-less Internet of Things. In: Jie Liu; Yuanchao Shu; Jiming Chen; Yuan He; Rui Tan (Ed.), SenSys '24: Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems. Paper presented at 22nd Conference on Embedded Networked Sensor Systems, Nov 4-7, 2024, Hangzhou, Peoples Republic of China (pp. 409-421). Association for Computing Machinery (ACM)
TaDA: Task Decoupling Architecture for the Battery-less Internet of Things
2024 (English) In: SenSys '24: Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems / [ed] Jie Liu; Yuanchao Shu; Jiming Chen; Yuan He; Rui Tan, Association for Computing Machinery (ACM), 2024, p. 409-421. Conference paper, Published paper (Refereed)
Abstract [en]

We present TaDA, a system architecture enabling efficient execution of Internet of Things (IoT) applications across multiple computing units, powered by ambient energy harvesting. Low-power microcontroller units (MCUs) are increasingly specialized; for example, some custom designs feature hardware acceleration of neural network inference, while others provide energy-efficient input/output. As application requirements grow increasingly diverse, we argue that no single MCU can efficiently fulfill them. TaDA allows programmers to assign the execution of different slices of the application logic to the most efficient MCU for the job. We achieve this by decoupling task executions in time and space, using a special-purpose hardware interconnect we design, while providing persistent storage to bridge periods of energy unavailability. We compare the performance of our prototype against the single most efficient computing unit for a given workload and show that our prototype saves up to 96.7% energy per application round. Given the same energy budget, this yields up to a 68.7x throughput improvement.
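
The placement idea in TaDA (run each slice of the application on the MCU best suited to it) can be illustrated with a greedy per-task choice compared against the best single MCU. The task names, MCU names, and energy figures below are invented for illustration only, and the sketch ignores the interconnect, data transfers, and the persistent storage TaDA uses to survive energy outages.

```python
from typing import Dict, Tuple

# Hypothetical energy per task execution, in millijoules, for each MCU.
COSTS: Dict[str, Dict[str, float]] = {
    "sense":    {"io_mcu": 1.0,  "nn_mcu": 3.0},
    "infer":    {"io_mcu": 40.0, "nn_mcu": 5.0},
    "transmit": {"io_mcu": 2.0,  "nn_mcu": 6.0},
}


def decoupled(costs: Dict[str, Dict[str, float]]) -> Tuple[Dict[str, str], float]:
    """Assign every task to its cheapest MCU; return the mapping and total energy."""
    mapping = {task: min(per_mcu, key=per_mcu.get) for task, per_mcu in costs.items()}
    return mapping, sum(costs[task][mcu] for task, mcu in mapping.items())


def best_single(costs: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    """Run every task on one MCU; return the cheapest such MCU and its energy."""
    mcus = next(iter(costs.values()))
    totals = {m: sum(per_mcu[m] for per_mcu in costs.values()) for m in mcus}
    best = min(totals, key=totals.get)
    return best, totals[best]


mapping, split_energy = decoupled(COSTS)
single_mcu, single_energy = best_single(COSTS)
print(mapping)  # -> {'sense': 'io_mcu', 'infer': 'nn_mcu', 'transmit': 'io_mcu'}
print(split_energy, single_mcu, single_energy)  # -> 8.0 nn_mcu 14.0
```

The per-task choice can only match or beat the best single MCU when transfers are free; in a real system the cost of moving data between MCUs is what decides whether a split pays off.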

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
Task decoupling, Internet of Things (IoT), energy harvesting, intermittent computing
National Category
Computer Systems; Computer Engineering
Identifiers
urn:nbn:se:uu:diva-557889 (URN); 10.1145/3666025.3699347 (DOI); 001436544300030; 2-s2.0-85211759485 (Scopus ID); 979-8-4007-0697-4 (ISBN)
Conference
22nd Conference on Embedded Networked Sensor Systems, Nov 4-7, 2024, Hangzhou, Peoples Republic of China
Funder
Swedish Foundation for Strategic Research; EU, European Research Council
Available from: 2025-06-03; Created: 2025-06-03; Last updated: 2025-06-03; Bibliographically approved
Kvalsvik, A. B., Aimoniotis, P., Kaxiras, S. & Själander, M. (2023). Doppelganger Loads: A Safe, Complexity-Effective Optimization for Secure Speculation Schemes. In: ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture. Paper presented at 50th Annual International Symposium on Computer Architecture (ISCA), JUN 17-21, 2023, Orlando, FL, USA. New York, NY: Association for Computing Machinery (ACM), Article ID 53.
Doppelganger Loads: A Safe, Complexity-Effective Optimization for Secure Speculation Schemes
2023 (English) In: ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, New York, NY: Association for Computing Machinery (ACM), 2023, article id 53. Conference paper, Published paper (Refereed)
Abstract [en]

Speculative side-channel attacks have forced computer architects to rethink speculative execution. Effectively preventing microarchitectural state from leaking sensitive information will be a key requirement in future processor design.

An important limitation of many secure speculation schemes is reduced memory parallelism: unsafe loads (as defined by the particular scheme) are blocked because they might leak information. Our contribution is to show that it is possible to recover some of this lost memory parallelism by safely predicting the addresses of these loads in a threat-model-transparent way, i.e., without weakening the security guarantees of the underlying secure scheme. To demonstrate the generality of the approach, we apply it to three different secure speculation schemes: Non-speculative Data Access (NDA), Speculative Taint Tracking (STT), and Delay-on-Miss (DoM).

An address predictor is trained on non-speculative data and can then predict the addresses of unsafe, slow-to-issue loads, preloading the target registers with speculative values that, on correct predictions, can be released sooner than if the entire load process were started only once the load is deemed safe. This new perspective on speculative execution encompasses all loads and yields speedups independently of prefetching.

We call the address-predicted counterparts of loads Doppelganger Loads. They give notable performance improvements for the three secure speculation schemes we evaluate (NDA, STT, and DoM), reducing the geometric-mean slowdown relative to an unsafe baseline by 42%, 48%, and 30%, respectively, across a wide variety of SPEC2006 and SPEC2017 benchmarks. Furthermore, Doppelganger Loads can be implemented efficiently with only minor core modifications, reusing existing resources such as a stride prefetcher and, most importantly, requiring no changes to the memory hierarchy outside the core.
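
Since the abstract notes that Doppelganger Loads can reuse existing resources such as a stride prefetcher, a per-PC stride predictor trained only on committed, non-speculative load addresses might look like the sketch below. The table layout, confidence threshold, and method names are assumptions. A prediction only lets the core preload a value early; under this scheme that value stays hidden until the real address is known and matches, which the `confirm` check stands in for.

```python
from typing import Dict, List, Optional


class StrideAddressPredictor:
    """Hypothetical per-PC stride table: pc -> [last_addr, stride, confidence]."""

    def __init__(self, threshold: int = 2, max_conf: int = 3) -> None:
        self.threshold = threshold
        self.max_conf = max_conf
        self.table: Dict[int, List[int]] = {}

    def train(self, pc: int, addr: int) -> None:
        """Call only for non-speculative (committed) loads."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = [addr, 0, 0]
            return
        last, stride, conf = entry
        if addr - last == stride:
            conf = min(self.max_conf, conf + 1)
        else:
            stride, conf = addr - last, 0
        self.table[pc] = [addr, stride, conf]

    def predict(self, pc: int) -> Optional[int]:
        """Predicted next address, or None when confidence is too low."""
        entry = self.table.get(pc)
        if entry is None or entry[2] < self.threshold:
            return None
        return entry[0] + entry[1]

    @staticmethod
    def confirm(predicted: int, actual: int) -> bool:
        """The preloaded value may be released only if the guess was right."""
        return predicted == actual


p = StrideAddressPredictor()
for a in (0x1000, 0x1008, 0x1010):
    p.train(pc=0x400, addr=a)
print(p.predict(0x400))  # -> None (confidence 1 < threshold 2)
p.train(pc=0x400, addr=0x1018)
print(hex(p.predict(0x400)))  # -> 0x1020
```

A real predictor would also have to account for several in-flight instances of the same load (predicting the last address plus k strides), which this sketch omits.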

Place, publisher, year, edition, pages
New York, NY: Association for Computing Machinery (ACM), 2023
Series
Conference Proceedings Annual International Symposium on Computer Architecture, ISSN 1063-6897
Keywords
computer architecture, security, speculative side-channels, spectre
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-509800 (URN); 10.1145/3579371.3589088 (DOI); 001098723900053; 979-8-4007-0095-8 (ISBN)
Conference
50th Annual International Symposium on Computer Architecture (ISCA), JUN 17-21, 2023, Orlando, FL, USA
Funder
Vinnova, 2021-02422; Swedish Research Council, 2018-05254; Swedish Research Council, 2022-04959; Uppsala University; Swedish Foundation for Strategic Research, FUS21-0067
Available from: 2023-08-22; Created: 2023-08-22; Last updated: 2025-09-10; Bibliographically approved
Chen, X., Aimoniotis, P. & Kaxiras, S. (2023). How addresses are made. In: 2023 IEEE International Symposium on Workload Characterization, IISWC. Paper presented at 26th IEEE International Symposium on Workload Characterization (IISWC), OCT 01-03, 2023, Gent, Belgium (pp. 223-225). IEEE
How addresses are made
2023 (English) In: 2023 IEEE International Symposium on Workload Characterization, IISWC, IEEE, 2023, p. 223-225. Conference paper, Published paper (Refereed)
Abstract [en]

This work uses Dynamic Information Flow Tracking (DIFT) to characterize how memory addresses are made by studying the transformation of data values into memory addresses. We show that in SPEC CPU 2017 benchmarks, a high proportion of values in memory are transformed into memory addresses. The majority of the transformations are done directly without explicit arithmetic instructions. Most of the addresses are made from one or more loaded values.
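
Dynamic Information Flow Tracking attaches a tag to every value and propagates it through each instruction, so that when a value is used as a memory address one can tell where it came from. The toy interpreter below illustrates that idea on a made-up four-instruction ISA (`li`, `mov`, `add`, `load`). It is not the paper's tooling, and the two-field tag (was a loaded value involved? did arithmetic touch it?) is an assumption chosen to mirror the abstract's categories.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Tag:
    from_load: bool  # some input was a value loaded from memory
    via_arith: bool  # an arithmetic instruction produced the value


def run(program: List[tuple], memory: Dict[int, int]) -> Tuple[Dict[str, tuple], List[Tag]]:
    regs: Dict[str, Tuple[int, Tag]] = {}
    address_tags: List[Tag] = []  # provenance of every value used as an address
    for op, *args in program:
        if op == "li":            # load immediate: a constant, untagged
            dst, imm = args
            regs[dst] = (imm, Tag(False, False))
        elif op == "mov":         # copy value and tag unchanged
            dst, src = args
            regs[dst] = regs[src]
        elif op == "add":         # tags merge; the arithmetic bit is set
            dst, a, b = args
            (va, ta), (vb, tb) = regs[a], regs[b]
            regs[dst] = (va + vb, Tag(ta.from_load or tb.from_load, True))
        elif op == "load":        # record address provenance, tag the result
            dst, addr_reg = args
            addr, tag = regs[addr_reg]
            address_tags.append(tag)
            regs[dst] = (memory.get(addr, 0), Tag(True, False))
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs, address_tags


# Pointer chase, then a field access at offset 8.
prog = [
    ("li", "r1", 100),
    ("load", "r2", "r1"),       # address from a constant
    ("load", "r3", "r2"),       # address is a loaded value, used directly
    ("li", "r4", 8),
    ("add", "r5", "r3", "r4"),
    ("load", "r6", "r5"),       # address from a loaded value plus arithmetic
]
regs, tags = run(prog, {100: 200, 200: 300, 308: 42})
print(regs["r6"][0])  # -> 42
for tag in tags:
    print(tag)
```

Counting the recorded tags over a real trace would give the kind of breakdown the abstract reports, such as how many addresses come directly from loaded values without arithmetic.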

Place, publisher, year, edition, pages
IEEE, 2023
Series
International Symposium on Workload Characterization Proceedings
National Category
Computer Engineering
Identifiers
urn:nbn:se:uu:diva-523358 (URN); 10.1109/IISWC59245.2023.00031 (DOI); 001103166400023; 979-8-3503-0317-9 (ISBN); 979-8-3503-0318-6 (ISBN)
Conference
26th IEEE International Symposium on Workload Characterization (IISWC), OCT 01-03, 2023, Gent, Belgium
Funder
Swedish Research Council, 2018-05254; Vinnova, 2021-02422; Swedish Foundation for Strategic Research, FUS21-0067; Swedish Research Council, NAISS 2023/22-203; Swedish Research Council, 2022-06725
Available from: 2024-02-19; Created: 2024-02-19; Last updated: 2024-02-19; Bibliographically approved
Aimoniotis, P., Kvalsvik, A. B., Chen, X., Själander, M. & Kaxiras, S. (2023). ReCon: Efficient Detection, Management, and Use of Non-Speculative Information Leakage. In: 56th IEEE/ACM International Symposium on Microarchitecture, MICRO 2023. Paper presented at 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), OCT 28-NOV 01, 2023, Toronto, CANADA (pp. 828-842). Association for Computing Machinery (ACM)
ReCon: Efficient Detection, Management, and Use of Non-Speculative Information Leakage
2023 (English) In: 56th IEEE/ACM International Symposium on Microarchitecture, MICRO 2023, Association for Computing Machinery (ACM), 2023, p. 828-842. Conference paper, Published paper (Refereed)
Abstract [en]

In a speculative side-channel attack, a secret is improperly accessed and then leaked by passing it to a transmitter instruction. Several proposed defenses effectively close this security hole either by delaying the loading or propagation of the secret, or by delaying the execution of dependent transmitters (e.g., loads) when they are fed with tainted input derived from an earlier speculative load. This results in a loss of memory-level parallelism and performance. A recently proposed security definition, in which data already leaked in non-speculative execution need not be considered secret during speculative execution, can help recover this lost performance. However, detecting and tracking non-speculative leakage carries its own cost, increasing complexity. The key insight that enables us to exploit non-speculative leakage as an optimization for other secure speculation schemes is that most non-speculative leakage is simply due to pointer dereferencing (or base-address indexing), which is essentially what many secure speculation schemes prevent from taking place speculatively. We present ReCon, which i) efficiently detects non-speculative leakage by limiting detection to pairs of directly dependent loads that dereference pointers (or index a base address), and ii) piggybacks non-speculative leakage information on the coherence protocol. In ReCon, the coherence protocol remembers and propagates the knowledge of what has leaked and, therefore, what is safe to dereference under speculation.
To demonstrate the effectiveness of ReCon, we show how two state-of-the-art secure speculation schemes, Non-speculative Data Access (NDA) and Speculative Taint Tracking (STT), leverage this information to enable more memory-level parallelism in both single-core and multicore scenarios: NDA with ReCon reduces the performance loss by 28.7% for SPEC2017, 31.5% for SPEC2006, and 46.7% for PARSEC; STT with ReCon reduces the loss by 45.1%, 39%, and 78.6%, respectively.
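
ReCon's detection rule, as the abstract describes it, looks for pairs of directly dependent loads in non-speculative execution: if a committed load uses the value produced by an earlier committed load as its address, that earlier value has already leaked. The sketch below tracks this in software at 8-byte word granularity (an assumption; marking a whole cacheline would wrongly declare its other words public), forgets a word when it is overwritten, and omits both base-address indexing and the coherence-protocol propagation that ReCon adds. Class and method names are hypothetical.

```python
from typing import Dict, Optional, Set

WORD = 8  # assumed tracking granularity in bytes


class LeakTracker:
    """Hypothetical record of words already leaked by committed pointer dereferences."""

    def __init__(self) -> None:
        # reg -> word its value was loaded from (None if not produced by a load)
        self.producer: Dict[str, Optional[int]] = {}
        self.leaked: Set[int] = set()

    def commit_load(self, dst: str, addr_reg: str, addr: int) -> None:
        src_word = self.producer.get(addr_reg)
        if src_word is not None:
            # Directly dependent load pair: a loaded value was used as an address.
            self.leaked.add(src_word)
        self.producer[dst] = addr // WORD

    def commit_alu(self, dst: str) -> None:
        """Any non-load write to dst breaks the direct load->load dependence."""
        self.producer[dst] = None

    def commit_store(self, addr: int) -> None:
        """The new value has not leaked yet, so forget the old fact."""
        self.leaked.discard(addr // WORD)

    def safe_to_dereference(self, pointer_addr: int) -> bool:
        """Under speculation: may a load whose address was read from pointer_addr issue?"""
        return pointer_addr // WORD in self.leaked


t = LeakTracker()
t.commit_load("r2", "r1", 0x1000)     # r1 came from elsewhere: no pair yet
t.commit_load("r3", "r2", 0x2000)     # r2 came from 0x1000, so that word leaked
print(t.safe_to_dereference(0x1000))  # -> True
print(t.safe_to_dereference(0x1008))  # -> False (adjacent word has not leaked)
t.commit_store(0x1000)
print(t.safe_to_dereference(0x1000))  # -> False
```

Restricting detection to direct load-to-load pairs keeps the bookkeeping to one producer entry per register, which is in the spirit of the abstract's argument that this common case captures most non-speculative leakage cheaply.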

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
Speculation, side-channels, load pair, non-speculative leakage
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-525488 (URN); 10.1145/3613424.3623770 (DOI); 001164081800058; 979-8-4007-0329-4 (ISBN)
Conference
56th IEEE/ACM International Symposium on Microarchitecture (MICRO), OCT 28-NOV 01, 2023, Toronto, CANADA
Funder
Vinnova, 2021-02422; Swedish Research Council, 2018-05254; Swedish Foundation for Strategic Research, FUS21-0067; Swedish Research Council, 2022-06725
Available from: 2024-03-25; Created: 2024-03-25; Last updated: 2025-09-10; Bibliographically approved
Song, W., Kaxiras, S., Mottola, L., Voigt, T. & Yao, Y. (2023). Silent Stores in the Battery-less Internet of Things: A Good Idea? Paper presented at International Conference on Embedded Wireless Systems and Networks.
Silent Stores in the Battery-less Internet of Things: A Good Idea?
2023 (English) Conference paper, Published paper (Refereed)
National Category
Embedded Systems
Identifiers
urn:nbn:se:uu:diva-509586 (URN)
Conference
International Conference on Embedded Wireless Systems and Networks
Available from: 2023-08-21; Created: 2023-08-21; Last updated: 2023-08-21
Projects
Interval-Based Approach to Power Modeling in Multicores [2010-04741_VR]; Uppsala University
Efficient Modeling of Heterogeneity in the Era of Dark Silicon [2012-05332_VR]; Uppsala University
Enabling Near Data Processing for Emerging Workloads [2018-05254_VR]; Uppsala University
Don’t hack my memory: Towards efficient, ubiquitous memory protection [2022-04959_VR]; Uppsala University
Mitigating Side-Channel Attacks: Foundations and Applications [2023-05242_VR]; Uppsala University
Identifiers
ORCID iD: orcid.org/0000-0001-8267-0232
