Publications from Uppsala University

Search results 1-50 of 345
  • 1.
    Abdulla, Parosh Aziz
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Atig, Mohamed Faouzi
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Leonardsson, Carl
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Ros, Alberto
    Zhu, Yunyun
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Fencing programs with self-invalidation and self-downgrade (2016). In: Formal Techniques for Distributed Objects, Components, and Systems, Springer, 2016, p. 19-35. Conference paper (Refereed)
  • 2.
    Abdulla, Parosh Aziz
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Atig, Mohamed Faouzi
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Leonardsson, Carl
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Ros, Alberto
    Zhu, Yunyun
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Mending fences with self-invalidation and self-downgrade (2018). In: Logical Methods in Computer Science, ISSN 1860-5974, E-ISSN 1860-5974, Vol. 14, no 1, article id 6. Article in journal (Refereed)
  • 3. Ahmed, Saad
    et al.
    Ul Ain, Qurat
    Siddiqui, Junaid Haroon
    Mottola, Luca
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Politecnico di Milano, Italy; RI.SE Sweden.
    Alizai, Muhammad Hamad
    Intermittent Computing with Dynamic Voltage and Frequency Scaling (2020). In: EWSN '20: Proceedings of the 2020 International Conference on Embedded Wireless Systems and Networks / [ed] Christine Julien, Fabrice Valois, Omprakash Gnawali & Amy L. Murphy, 2020, p. 97-107. Conference paper (Refereed)
    Abstract [en]

    We present D2VFS, a run-time technique to intelligently regulate the supply voltage and accordingly reconfigure the clock frequency of intermittently-computing devices. These devices rely on energy harvesting to power their operation and on small capacitors as energy buffers. Statically setting their clock frequency fails to achieve energy efficiency, as the setting remains oblivious to fluctuations in capacitor voltage and to their impact on the microcontroller's operating range. In contrast, D2VFS captures these dynamics and places the microcontroller in the most efficient configuration by regulating the microcontroller supply voltage and changing its clock frequency. Our evaluation shows that D2VFS markedly increases energy efficiency; for example, ultimately enabling a 30-300% reduction of workload completion times.
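    To make the idea concrete, here is a minimal sketch (in Python, not the authors' implementation) of selecting a clock frequency from the current capacitor voltage; the voltage thresholds and frequencies are invented placeholder values, not measured microcontroller data.

        # Illustrative sketch, in the spirit of D2VFS: pick the highest clock frequency
        # that the current capacitor voltage can sustain. All numbers are made up.

        # (min_supply_voltage_V, clock_frequency_MHz), ordered from fastest to slowest.
        OPERATING_POINTS = [
            (3.0, 48.0),
            (2.4, 24.0),
            (2.0, 8.0),
            (1.8, 1.0),
        ]

        def select_frequency(capacitor_voltage):
            """Return the highest clock frequency the current supply voltage allows."""
            for min_voltage, freq_mhz in OPERATING_POINTS:
                if capacitor_voltage >= min_voltage:
                    return freq_mhz
            return None  # below brown-out: the device must wait for the capacitor to recharge

        # Example: as the capacitor discharges, the selected frequency steps down.
        for v in (3.2, 2.5, 1.9, 1.5):
            print(v, "V ->", select_frequency(v), "MHz")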

  • 4.
    Aimoniotis, Pavlos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kvalsvik, Amund Bergland
    Norwegian University of Science and Technology (NTNU).
    Själander, Magnus
    Norwegian University of Science and Technology (NTNU).
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Data-Out Instruction-In (DOIN!): Leveraging Inclusive Caches to Attack Speculative Delay Schemes (2022). In: 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED 2022), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 49-60. Conference paper (Refereed)
    Abstract [en]

    Although the cache has been a known side-channel for years, it has gained renewed notoriety with the introduction of speculative side-channel attacks such as Spectre, which were able to use caches to not just observe a victim, but to leak secrets. Because the cache continues to be one of the most exploitable side channels, it is often the primary target to safeguard in secure speculative execution schemes. One of the simpler secure speculation approaches is to delay speculative accesses whose effect can be observed until they become non-speculative. Delay-on-Miss, for example, delays all observable speculative loads, i.e., the ones that miss in the cache, and preserves the majority of the performance of the baseline (unsafe speculation) by executing speculative loads that hit in the cache, which were thought to be unobservable.

    However, previous work has failed to consider how instruction fetching can eject cache lines from the shared, lower level caches, and thus from higher cache levels due to inclusivity. In this work, we show how cache conflicts between instruction fetch and data accesses can extend previous attacks and present the following new insights:

    1. It is possible to use lower level caches to perform Prime+Probe through conflicts resulting from instruction fetching. This extends previous Prime+Probe attacks and potentially circumvents previously developed mitigation strategies.

    2. Data-instruction conflicts can be used to perform a Spectre attack that breaks Delay-on-Miss. Once a secret has been acquired, secret-dependent instruction fetching can cause cache conflicts that result in evictions in the L1D cache, creating observable timing differences. Essentially, it is possible to leak a secret bit-by-bit through the cache, despite Delay-on-Miss defending against cache side-channels.

    We call our new attack Data-Out Instruction-In, DOIN!, and demonstrate it on a real commercial core, the AMD Ryzen 9. We demonstrate how DOIN! interacts with Delay-on-Miss and perform an analysis of noise and bandwidth. Furthermore, we propose a simple defense extension for Delay-on-Miss to maintain its security guarantees, at the cost of negligible performance degradation while executing the Spec06 workloads.
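    As a purely conceptual illustration of the mechanism, the toy model below (invented set size and line names, not the authors' code) shows how a secret-dependent instruction fetch into an inclusive cache set evicts an attacker's primed data lines, so that a subsequent probe recovers the secret bit in an idealized, noise-free setting.

        # Conceptual toy model of Prime+Probe via instruction-fetch conflicts.
        WAYS = 4  # associativity of one cache set in this toy model

        def prime(cache_set, attacker_lines):
            cache_set.clear()
            cache_set.extend(attacker_lines[:WAYS])  # fill the set with attacker data

        def victim(cache_set, secret_bit, instr_line):
            # Secret-dependent instruction fetch: only performed when the bit is 1.
            if secret_bit:
                cache_set.pop(0)              # inclusive cache: something must be evicted
                cache_set.append(instr_line)  # the fetched instruction line takes its place

        def probe(cache_set, attacker_lines):
            # An eviction shows up as a "miss" (in hardware: a measurably slower access).
            misses = sum(1 for line in attacker_lines[:WAYS] if line not in cache_set)
            return misses > 0  # True -> guess secret bit = 1

        secret = [1, 0, 1, 1, 0]
        recovered = []
        for bit in secret:
            s = []
            prime(s, ["A0", "A1", "A2", "A3"])
            victim(s, bit, "I0")
            recovered.append(1 if probe(s, ["A0", "A1", "A2", "A3"]) else 0)
        print(recovered)  # matches `secret` in this idealized model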

  • 5.
    Aimoniotis, Pavlos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Sakalis, Christos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Själander, Magnus
    Norwegian University of Science and Technology, N-7491 Trondheim, Norway.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Reorder Buffer Contention: A Forward Speculative Interference Attack for Speculation Invariant Instructions (2021). In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 20, no 2, p. 162-165. Article in journal (Refereed)
    Abstract [en]

    Speculative side-channel attacks access sensitive data and use transmitters to leak the data during wrong-path execution. Various defenses have been proposed to prevent such information leakage. However, not all speculatively executed instructions are unsafe: recent work demonstrates that speculation-invariant instructions are independent of speculative control-flow paths and are guaranteed to eventually commit, regardless of the speculation outcome. Compile-time information coupled with run-time mechanisms can then selectively lift defenses for speculation-invariant instructions, reclaiming some of the lost performance. Unfortunately, speculation-invariant instructions can easily be manipulated by a form of speculative interference to leak information via a new side-channel that we introduce in this paper. We show that forward speculative interference, where older speculative instructions interfere with younger speculation-invariant instructions, effectively turns them into transmitters for secret data accessed during speculation. We demonstrate forward speculative interference on actual hardware, by selectively filling the reorder buffer (ROB) with instructions, pushing speculation-invariant instructions into or out of the ROB on demand, based on a speculatively accessed secret. This reveals the speculatively accessed secret, as the occupancy of the ROB itself becomes a new speculative side-channel.
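    The following toy model (arbitrary sizes and cycle costs, not the authors' measurements) illustrates the channel: when a secret bit determines how many ROB entries the victim's speculative filler instructions occupy, the delay an attacker experiences for its own instructions differs measurably.

        # Toy illustration of ROB occupancy as a timing channel. All numbers are arbitrary.
        ROB_SIZE = 192

        def attacker_cycles(victim_rob_entries, attacker_instructions=64):
            free = ROB_SIZE - victim_rob_entries
            if attacker_instructions <= free:
                return attacker_instructions          # everything fits in flight at once
            # Otherwise the attacker must wait for entries to drain; model that as extra cycles.
            return attacker_instructions + 10 * (attacker_instructions - free)

        def victim_rob_usage(secret_bit):
            # Filler instructions are pushed into the ROB on demand, based on the secret.
            return 180 if secret_bit else 20

        for bit in (0, 1):
            print("secret =", bit, "-> attacker observes",
                  attacker_cycles(victim_rob_usage(bit)), "cycles")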

  • 6.
    Alipour, Mehdi
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Rethinking Dynamic Instruction Scheduling and Retirement for Efficient Microarchitectures (2020). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Out-of-order execution is one of the main micro-architectural techniques used to improve the performance of both single- and multi-threaded processors, and such processors are found in everything from mobile devices to servers. This technique achieves higher performance by finding independent instructions and hiding execution latency, using cycles that would otherwise be wasted or cause a CPU stall. To accomplish this, it uses scheduling resources, including the reorder buffer (ROB), instruction queue (IQ), load/store queue (LSQ), and physical registers, to store and prioritize instructions.

    The pipeline of an out-of-order processor has three macro-stages: the front-end, the scheduler, and the back-end. The front-end fetches instructions, places them in the out-of-order resources, and analyzes them to prepare for their execution. The scheduler identifies which instructions are ready for execution and prioritizes them for scheduling. The back-end updates the processor state with the results of the oldest completed instructions, deallocates the resources and commits the instructions in the program order to maintain correct execution.

    Since out-of-order execution needs to be able to choose any available instruction for execution, its scheduling resources require complex circuits for identifying and prioritizing instructions, which makes them very expensive and therefore limited in size. This limited size leads to two stall points, at the front-end and the back-end of the pipeline, respectively. The front-end stalls when the resources are fully allocated and no new instructions can be placed in the scheduler. The back-end stalls when the instruction at the head of the ROB has not finished executing, which prevents its resources from being deallocated and, in turn, prevents new instructions from being inserted into the pipeline.

    To address these two stall points, this thesis focuses on reducing the time instructions occupy the scheduling resources. Our front-end technique tackles IQ pressure, while our back-end approach considers the rest of the resources. To reduce front-end stalls, we reduce the pressure on the IQ, for both storing (depth) and issuing (width) instructions, by bypassing instructions to cheaper storage structures. To reduce back-end stalls, we explore how we can retire instructions earlier, and out of order, to reduce the pressure on the out-of-order resources.
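    As a minimal illustration of the two stall points, the sketch below (a simplified single-resource model with arbitrary sizes and latencies, not taken from the thesis) counts front-end stalls caused by a full ROB and back-end stalls caused by an unfinished instruction at the ROB head.

        # Minimal illustrative model of front-end and back-end stalls around a small ROB.
        from collections import deque

        ROB_SIZE = 4

        def run(latencies):
            rob = deque()                     # remaining latency of each in-flight instruction
            todo = deque(latencies)
            cycle = front_stalls = back_stalls = 0
            while todo or rob:
                cycle += 1
                # Front-end: dispatch one instruction per cycle unless the ROB is full.
                if todo:
                    if len(rob) < ROB_SIZE:
                        rob.append(todo.popleft())
                    else:
                        front_stalls += 1
                # Execute: all in-flight instructions make progress in parallel (idealized).
                for i in range(len(rob)):
                    if rob[i] > 0:
                        rob[i] -= 1
                # Back-end: in-order commit releases the head entry only once it has finished.
                if rob and rob[0] == 0:
                    rob.popleft()
                elif rob:
                    back_stalls += 1
            return {"cycles": cycle, "front_stalls": front_stalls, "back_stalls": back_stalls}

        # A single long-latency instruction at the head blocks commit and, by filling the
        # small ROB, eventually stalls the front-end as well.
        print(run([20, 1, 1, 1, 1, 1, 1, 1]))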

    List of papers
    1. A Taxonomy of Out-of-Order Instruction Commit
    2017 (English). In: 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Los Alamitos: IEEE Computer Society, 2017, p. 135-136. Conference paper, Published paper (Refereed)
    Abstract [en]

    While in-order instruction commit has its advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, registers) until they are released in program order. In contrast, out-of-order commit releases resources much earlier, yielding improved performance without the need for additional hardware resources. In this paper, we revisit out-of-order commit from a different perspective, not by proposing another hardware technique, but by introducing a taxonomy and evaluating three different micro-architectures that have this technique enabled. We show how smaller processors can benefit from simple out-of-order commit strategies, but that larger, aggressive cores require more aggressive strategies to improve performance.

    Place, publisher, year, edition, pages
    Los Alamitos: IEEE Computer Society, 2017
    National Category
    Computer Systems; Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-352938 (URN), 10.1109/ISPASS.2017.7975283 (DOI), 000426905600020 (), 978-1-5386-3890-3 (ISBN), 978-1-5386-3891-0 (ISBN), 978-1-5386-3889-7 (ISBN)
    Conference
    2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Santa Rosa, CA, USA.
    Available from: 2018-06-12. Created: 2018-06-12. Last updated: 2020-02-02. Bibliographically approved.
    2. Exploring the performance limits of out-of-order commit
    2017 (English). In: Proc. 14th Computing Frontiers Conference, New York: ACM Press, 2017, p. 211-220. Conference paper, Published paper (Refereed)
    Abstract [en]

    Out-of-order execution is essential for high performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is limited by the requirement of visibly sequential, atomic instruction execution; in other words, in-order instruction commit. While in-order commit has its advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, registers) until they are released in program order. In contrast, out-of-order commit releases resources much earlier, yielding improved performance with fewer traditional hardware resources. However, out-of-order commit is limited in terms of correctness by the conditions described in the work of Bell and Lipasti. In this paper we revisit out-of-order commit from a different perspective, not by proposing another hardware technique, but by examining these conditions one by one and in combination with respect to their potential performance benefit for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to demonstrate the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs. We learn that: a) there is significant untapped potential for aggressive variants of out-of-order commit; b) it is important to optimize the commit depth, or the search distance for out-of-order commit, for a balanced design: smaller cores can benefit from shorter depths while larger cores continue to benefit from aggressive parameters; c) the focus on a subset of out-of-order commit conditions could lead to efficient implementations; d) the benefits of out-of-order commit increase with higher memory latency, and out-of-order commit works well in conjunction with prefetching to continue to improve performance.
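    A hedged sketch of the kind of check involved: the conditions below are a simplified paraphrase of the sort of safety requirements referenced above (completed, exception-free, no unresolved older branch or store address), not the exact formulation from Bell and Lipasti.

        # Simplified, paraphrased commit-safety check (consult Bell and Lipasti for the
        # precise conditions). Instructions are plain dictionaries for illustration.

        def can_commit_out_of_order(instr, older_in_flight):
            return (instr["completed"]
                    and not instr["may_fault"]
                    and not any(o["is_unresolved_branch"] for o in older_in_flight)
                    and not any(o["is_store_with_unknown_address"] for o in older_in_flight))

        rob = [
            {"completed": False, "may_fault": False, "is_unresolved_branch": False,
             "is_store_with_unknown_address": False},   # long-latency load at the ROB head
            {"completed": True,  "may_fault": False, "is_unresolved_branch": False,
             "is_store_with_unknown_address": False},   # independent, finished ALU op behind it
        ]

        # In-order commit would stall on the head; relaxing the conditions lets entry 1 retire early.
        print(can_commit_out_of_order(rob[1], older_in_flight=rob[:1]))   # True in this toy example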

    Place, publisher, year, edition, pages
    New York: ACM Press, 2017
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-334601 (URN), 10.1145/3075564.3075581 (DOI), 000626242600024 (), 978-1-4503-4487-6 (ISBN)
    Conference
    CF 2017, May 15–17, Siena, Italy
    Projects
    UPMARC
    Available from: 2017-05-15. Created: 2017-11-24. Last updated: 2024-01-23. Bibliographically approved.
    3. Maximizing limited resources: A limit-based study and taxonomy of out-of-order commit
    2019 (English). In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 91, no 3-4, p. 379-397. Article in journal (Refereed). Published
    Abstract [en]

    Out-of-order execution is essential for high performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is typically limited by the requirement of visibly sequential, atomic instruction execution; in other words, in-order instruction commit. While in-order commit has a number of advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, physical registers) until they are released in program order. In contrast, out-of-order commit can release some resources much earlier, yielding improved performance and/or lower resource requirements. Non-speculative out-of-order commit is limited in terms of correctness by the conditions described in the work of Bell and Lipasti (2004). In this paper we revisit out-of-order commit by examining the potential performance benefits of lifting these conditions one by one and in combination, for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to demonstrate the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs. Through this analysis of the potential of out-of-order commit, we learn that: a) there is significant untapped potential for aggressive variants of out-of-order commit; b) it is important to optimize the out-of-order commit depth for a balanced design, as smaller cores benefit from reduced depth while larger cores continue to benefit from deeper designs; c) the focus on implementing only a subset of the out-of-order commit conditions could lead to efficient implementations; d) the benefits of out-of-order commit increase with higher memory latency and in conjunction with prefetching; e) out-of-order commit exposes additional parallelism in the memory hierarchy.

    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-365899 (URN), 10.1007/s11265-018-1369-4 (DOI), 000459428200012 ()
    Available from: 2018-04-26. Created: 2018-11-14. Last updated: 2020-02-02. Bibliographically approved.
    4. FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors
    2019 (English). In: 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, 2019, p. 716-721. Conference paper, Published paper (Refereed)
    Abstract [en]

    The number of instructions a processor's instruction queue can examine (depth) and the number it can issue together (width) determine its ability to take advantage of the ILP in an application. Unfortunately, increasing either the width or depth of the instruction queue is very costly due to the content-addressable logic needed to wake up and select instructions out-of-order. This work makes the observation that a large number of instructions have both operands ready at dispatch, and therefore do not benefit from out-of-order scheduling. We leverage this to place such ready-at-dispatch instructions in separate, simpler, in-order FIFO queues for scheduling. With such additional queues, we can reduce the size and width of the expensive out-of-order instruction queue, without reducing the processor's overall issue width and depth. Our design, FIFOrder, is able to steer more than 60% of instructions to the cheaper FIFO queues, providing a 50% energy savings over a traditional out-of-order instruction queue design, while delivering 8% higher performance.
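    The steering decision described above can be sketched as follows (invented instruction encoding and ready-bit bookkeeping, not the FIFOrder hardware): an instruction whose source operands are all available at dispatch gains nothing from out-of-order wakeup and can go to a cheap in-order FIFO.

        # Illustrative steering sketch: ready-at-dispatch instructions go to a FIFO,
        # the rest go to the expensive CAM-based out-of-order IQ.

        ready_registers = {"r1", "r2"}     # registers whose values are known at dispatch

        def steer(instr):
            """Return 'fifo' if every source operand is ready at dispatch, else 'iq'."""
            return "fifo" if all(src in ready_registers for src in instr["srcs"]) else "iq"

        program = [
            {"op": "addi", "srcs": ["r1"]},          # ready at dispatch -> FIFO
            {"op": "ld",   "srcs": ["r2"]},          # ready at dispatch -> FIFO
            {"op": "add",  "srcs": ["r3", "r1"]},    # waits on r3       -> out-of-order IQ
        ]
        print([(i["op"], steer(i)) for i in program])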

    Place, publisher, year, edition, pages
    IEEE, 2019
    Series
    Design Automation and Test in Europe Conference and Exhibition, ISSN 1530-1591
    National Category
    Computer Systems; Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-389930 (URN), 10.23919/DATE.2019.8715034 (DOI), 000470666100132 (), 978-3-9819263-2-3 (ISBN)
    Conference
    Design, Automation & Test in Europe Conference & Exhibition (DATE), March 25-29, 2019, Florence, Italy
    Funder
    Knut and Alice Wallenberg Foundation
    Available from: 2019-08-01. Created: 2019-08-01. Last updated: 2020-02-02. Bibliographically approved.
    5. Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors
    2020 (English). In: 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020, p. 424-434. Conference paper, Published paper (Refereed)
    Abstract [en]

    Flexible instruction scheduling is essential for performance in out-of-order processors. This is typically achieved by using CAM-based Instruction Queues (IQs) that provide complete flexibility in choosing ready instructions for execution, but at the cost of significant scheduling energy.

    In this work we seek to reduce the instruction scheduling energy by reducing the depth and width of the IQ. We do so by classifying instructions based on their readiness and criticality, and using this information to bypass the IQ for instructions that will not benefit from its expensive scheduling structures and delay instructions that will not harm performance. Combined, these approaches allow us to offload a significant portion of the instructions from the IQ to much cheaper FIFO-based scheduling structures without hurting performance. As a result we can reduce the IQ depth and width by half, thereby saving energy.

    Our design, Delay and Bypass (DNB), is the first design to explicitly address both readiness and criticality to reduce scheduling energy. By handling both classes we are able to achieve 95% of the baseline out-of-order performance while only using 33% of the scheduling energy. This represents a significant improvement over previous designs which addressed only criticality or readiness (91%/89% performance at 74%/53% energy).
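    A small sketch of the classification just described (the criticality hint is assumed to come from some predictor or profile; the mapping is paraphrased from the abstract, not the paper's exact policy):

        # Readiness/criticality classification sketch in the spirit of Delay and Bypass.
        def schedule_class(ready_at_dispatch, critical):
            if ready_at_dispatch:
                return "bypass IQ (cheap in-order queue)"   # gains nothing from OoO wakeup
            if not critical:
                return "delay (FIFO, issue later)"          # waiting will not hurt performance
            return "out-of-order IQ"                        # not ready and critical: needs full flexibility

        for ready in (True, False):
            for crit in (True, False):
                print(f"ready={ready!s:5} critical={crit!s:5} -> {schedule_class(ready, crit)}")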

    Series
    International Symposium on High-Performance Computer Architecture-Proceedings, ISSN 1530-0897, E-ISSN 2378-203X
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-403674 (URN), 10.1109/HPCA47549.2020.00042 (DOI), 000531494100032 (), 978-1-7281-6149-5 (ISBN)
    Conference
    The 26th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb. 22-26, 2020, San Diego, CA, USA
    Note

    As originally published there was an error in the document's author byline. The order was intended to be: Mehdi Alipour (Uppsala University); Rakesh Kumar (Norwegian University of Science and Technology (NTNU)); Stefanos Kaxiras and David Black-Schaffer (Uppsala University), as noted here. The article PDF remains unchanged.

    Available from: 2020-02-02. Created: 2020-02-02. Last updated: 2020-06-17. Bibliographically approved.
  • 7.
    Alipour, Mehdi
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Carlson, Trevor E.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Maximizing limited resources: A limit-based study and taxonomy of out-of-order commit (2019). In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 91, no 3-4, p. 379-397. Article in journal (Refereed)
    Abstract [en]

    Out-of-order execution is essential for high performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is typically limited by the requirement of visibly sequential, atomic instruction execution; in other words, in-order instruction commit. While in-order commit has a number of advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, physical registers) until they are released in program order. In contrast, out-of-order commit can release some resources much earlier, yielding improved performance and/or lower resource requirements. Non-speculative out-of-order commit is limited in terms of correctness by the conditions described in the work of Bell and Lipasti (2004). In this paper we revisit out-of-order commit by examining the potential performance benefits of lifting these conditions one by one and in combination, for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to demonstrate the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs. Through this analysis of the potential of out-of-order commit, we learn that: a) there is significant untapped potential for aggressive variants of out-of-order commit; b) it is important to optimize the out-of-order commit depth for a balanced design, as smaller cores benefit from reduced depth while larger cores continue to benefit from deeper designs; c) the focus on implementing only a subset of the out-of-order commit conditions could lead to efficient implementations; d) the benefits of out-of-order commit increase with higher memory latency and in conjunction with prefetching; e) out-of-order commit exposes additional parallelism in the memory hierarchy.

  • 8.
    Alipour, Mehdi
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Carlson, Trevor E.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Exploring the performance limits of out-of-order commit (2017). In: Proc. 14th Computing Frontiers Conference, New York: ACM Press, 2017, p. 211-220. Conference paper (Refereed)
    Abstract [en]

    Out-of-order execution is essential for high performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is limited by the requirement of visibly sequential, atomic instruction execution; in other words, in-order instruction commit. While in-order commit has its advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, registers) until they are released in program order. In contrast, out-of-order commit releases resources much earlier, yielding improved performance with fewer traditional hardware resources. However, out-of-order commit is limited in terms of correctness by the conditions described in the work of Bell and Lipasti. In this paper we revisit out-of-order commit from a different perspective, not by proposing another hardware technique, but by examining these conditions one by one and in combination with respect to their potential performance benefit for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to demonstrate the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs. We learn that: a) there is significant untapped potential for aggressive variants of out-of-order commit; b) it is important to optimize the commit depth, or the search distance for out-of-order commit, for a balanced design: smaller cores can benefit from shorter depths while larger cores continue to benefit from aggressive parameters; c) the focus on a subset of out-of-order commit conditions could lead to efficient implementations; d) the benefits of out-of-order commit increase with higher memory latency, and out-of-order commit works well in conjunction with prefetching to continue to improve performance.

  • 9.
    Alipour, Mehdi
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kumar, Rakesh
    Norwegian University of Science and Technology.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors (2020). In: 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020, p. 424-434. Conference paper (Refereed)
    Abstract [en]

    Flexible instruction scheduling is essential for performance in out-of-order processors. This is typically achieved by using CAM-based Instruction Queues (IQs) that provide complete flexibility in choosing ready instructions for execution, but at the cost of significant scheduling energy.

    In this work we seek to reduce the instruction scheduling energy by reducing the depth and width of the IQ. We do so by classifying instructions based on their readiness and criticality, and using this information to bypass the IQ for instructions that will not benefit from its expensive scheduling structures and delay instructions that will not harm performance. Combined, these approaches allow us to offload a significant portion of the instructions from the IQ to much cheaper FIFO-based scheduling structures without hurting performance. As a result we can reduce the IQ depth and width by half, thereby saving energy.

    Our design, Delay and Bypass (DNB), is the first design to explicitly address both readiness and criticality to reduce scheduling energy. By handling both classes we are able to achieve 95% of the baseline out-of-order performance while only using 33% of the scheduling energy. This represents a significant improvement over previous designs which addressed only criticality or readiness (91%/89% performance at 74%/53% energy).

  • 10.
    Alipour, Mehdi
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kumar, Rakesh
    Norwegian University of Science and Technology, Department of Computer Science, Trondheim, Norway.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors (2019). In: 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, 2019, p. 716-721. Conference paper (Refereed)
    Abstract [en]

    The number of instructions a processor's instruction queue can examine (depth) and the number it can issue together (width) determine its ability to take advantage of the ILP in an application. Unfortunately, increasing either the width or depth of the instruction queue is very costly due to the content-addressable logic needed to wake up and select instructions out-of-order. This work makes the observation that a large number of instructions have both operands ready at dispatch, and therefore do not benefit from out-of-order scheduling. We leverage this to place such ready-at-dispatch instructions in separate, simpler, in-order FIFO queues for scheduling. With such additional queues, we can reduce the size and width of the expensive out-of-order instruction queue, without reducing the processor's overall issue width and depth. Our design, FIFOrder, is able to steer more than 60% of instructions to the cheaper FIFO queues, providing a 50% energy savings over a traditional out-of-order instruction queue design, while delivering 8% higher performance.

  • 11.
    Alirezaie, Marjan
    et al.
    Örebro University.
    Renoux, Jennifer
    Örebro University.
    Köckemann, Uwe
    Örebro University.
    Kristoffersson, Annica
    Örebro University.
    Karlsson, Lars
    Örebro University.
    Blomqvist, Eva
    SICS East.
    Tsiftes, Nicolas
    SICS.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. SICS.
    Loutfi, Amy
    Örebro University.
    An Ontology-based Context-aware System for Smart Homes: E-care@home (2017). In: Sensors, E-ISSN 1424-8220, Vol. 17, no 7. Article in journal (Refereed)
  • 12. Alonso, Juan M.
    et al.
    Nordhamn, Amanda
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Olofsson, Simon
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Bounds on the lifetime of wireless sensor networks with lossy links and directional antennas (2016). In: Wireless Network Performance Enhancement via Directional Antennas: Models, Protocols, and Systems, Boca Raton, FL: CRC Press, 2016, p. 329-361. Chapter in book (Refereed)
  • 13.
    Alves, Ricardo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Leveraging Existing Microarchitectural Structures to Improve First-Level Caching Efficiency (2019). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Low-latency data access is essential for performance. To achieve this, processors use fast first-level caches combined with out-of-order execution, to decrease and hide memory access latency respectively. While these approaches are effective for performance, they cost significant energy, leading to the development of many techniques that require designers to trade off performance against efficiency.

    Way-prediction and filter caches are two of the most common strategies for improving first-level cache energy efficiency while still minimizing latency. Both involve compromises: way-prediction trades off some latency for better energy efficiency, while filter caches trade off some energy efficiency for lower latency. However, these strategies are not mutually exclusive. By borrowing elements from both, and taking into account SRAM memory layout limitations, we propose a novel MRU-L0 cache that mitigates many of their shortcomings while preserving their benefits. Moreover, while first-level caches are tightly integrated into the CPU pipeline, existing work on these techniques largely ignores the impact they have on instruction scheduling. We show that the variable hit latency introduced by way-mispredictions causes instruction replays of load-dependent instruction chains, which hurts performance and efficiency. We study this effect and propose a variable-latency cache-hit instruction scheduler that identifies potential mis-schedulings, reduces instruction replays, reduces the negative performance impact, and further improves cache energy efficiency.

    Modern pipelines also employ sophisticated execution strategies to hide memory latency and improve performance. While their primary use is for performance and correctness, they require intermediate storage that can be used as a cache as well. In this work we demonstrate how the store-buffer, paired with the memory dependency predictor, can be used to efficiently cache dirty data; and how the physical register file, paired with a value predictor, can be used to efficiently cache clean data. These strategies not only improve both performance and energy, but do so with no additional storage and minimal additional complexity, since they recycle existing CPU structures to detect reuse, memory ordering violations, and misspeculations.
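    As a rough, back-of-the-envelope illustration of the way-prediction versus filter-cache trade-off mentioned above (all energy numbers and rates below are invented for the example, not results from the thesis):

        # Toy per-access energy model: a way-predictor reads one way of a large L1 but
        # pays for mispredictions; a filter cache reads a tiny L0 but pays for L0 misses.
        E_L1_ALL_WAYS, E_L1_ONE_WAY, E_L0 = 8.0, 2.0, 1.0   # relative energy units

        def way_predictor_energy(prediction_accuracy):
            # Correct prediction: one way. Misprediction: retry by reading all ways.
            return (prediction_accuracy * E_L1_ONE_WAY
                    + (1 - prediction_accuracy) * (E_L1_ONE_WAY + E_L1_ALL_WAYS))

        def filter_cache_energy(l0_hit_rate):
            # L0 hit: tiny array only. L0 miss: tiny array plus a full L1 access.
            return l0_hit_rate * E_L0 + (1 - l0_hit_rate) * (E_L0 + E_L1_ALL_WAYS)

        print("way-predictor @ 95% accuracy:", way_predictor_energy(0.95))
        print("filter cache  @ 80% L0 hits :", filter_cache_energy(0.80))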

    List of papers
    1. Addressing energy challenges in filter caches
    2017 (English). In: Proc. 29th International Symposium on Computer Architecture and High Performance Computing, IEEE Computer Society, 2017, p. 49-56. Conference paper, Published paper (Refereed)
    Abstract [en]

    Filter caches and way-predictors are common approaches to improve the efficiency and/or performance of first-level caches. Filter caches use a small L0 to provide more efficient and faster access to a small subset of the data, and work well for programs with high locality. Way-predictors improve efficiency by accessing only the way predicted, which alleviates the need to read all ways in parallel without increasing latency, but hurts performance due to mispredictions. In this work we examine how SRAM layout constraints (h-trees and data mapping inside the cache) affect way-predictors and filter caches. We show that accessing the smaller L0 array can be significantly more energy efficient than attempting to read fewer ways from a larger L1 cache; and that the main source of energy inefficiency in filter caches comes from L0 and L1 misses. We propose a filter cache optimization that shares the tag array between the L0 and the L1, which incurs the overhead of reading the larger tag array on every access, but in return allows us to directly access the correct L1 way on each L0 miss. This optimization does not add any extra latency and, counter-intuitively, improves the filter cache's overall energy efficiency beyond that of the way-predictor. By combining the low power benefits of a physically smaller L0 with the reduction in miss energy by reading L1 tags upfront in parallel with L0 data, we show that the optimized filter cache reduces the dynamic cache energy compared to a traditional filter cache by 26% while providing the same performance advantage. Compared to a way-predictor, the optimized cache improves performance by 6% and energy by 2%.
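    The shared-tag optimization described above can be sketched as follows (an assumed organization for illustration, not the paper's design): reading the combined tags up front tells us, on an L0 miss, exactly which L1 way to read.

        # Illustrative lookup with a tag array shared between L0 and L1.
        tags = {0x40: {"in_l0": True,  "l1_way": 2},   # line cached in L0 (and L1)
                0x80: {"in_l0": False, "l1_way": 1}}   # line only in L1

        def access(line_addr):
            entry = tags.get(line_addr)
            if entry is None:
                return "miss: fetch from L2"
            if entry["in_l0"]:
                return "hit in the small L0 data array (cheapest access)"
            return f"L0 miss: read only L1 way {entry['l1_way']} (no parallel read of all ways)"

        for a in (0x40, 0x80, 0xC0):
            print(hex(a), "->", access(a))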

    Place, publisher, year, edition, pages
    IEEE Computer Society, 2017
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-334221 (URN), 10.1109/SBAC-PAD.2017.14 (DOI), 000426895600007 (), 978-1-5090-1233-6 (ISBN)
    Conference
    29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2017, October 17–20, Campinas, Brazil.
    Available from: 2017-11-09. Created: 2017-11-21. Last updated: 2019-05-22. Bibliographically approved.
    2. Dynamically Disabling Way-prediction to Reduce Instruction Replay
    2018 (English). In: 2018 IEEE 36th International Conference on Computer Design (ICCD), IEEE, 2018, p. 140-143. Conference paper, Published paper (Refereed)
    Abstract [en]

    Way-predictors have long been used to reduce dynamic cache energy without the performance loss of serial caches. However, they produce variable-latency hits, as incorrect predictions increase load-to-use latency. While the performance impact of these extra cycles has been well-studied, the need to replay subsequent instructions in the pipeline due to the load latency increase has been ignored. In this work we show that way-predictors pay a significant performance penalty beyond previously studied effects due to instruction replays caused by mispredictions. To address this, we propose a solution that learns the confidence of the way prediction and dynamically disables it when it is likely to mispredict and cause replays. This allows us to reduce cache latency (when we can trust the way-prediction) while still avoiding the need to replay instructions in the pipeline (by avoiding way-mispredictions). Standard way-predictors degrade IPC by 6.9% vs. a parallel cache due to 10% of the instructions being replayed (worst case 42.3%). While our solution decreases way-prediction accuracy by turning off the way-predictor in some cases when it would have been correct, it delivers higher performance than a standard way-predictor. Our confidence-based way-predictor degrades IPC by only 4.4% by replaying just 5.6% of the instructions (worst case 16.3%). This reduces the way-predictor cache energy overhead, compared to a serial-access cache, from 8.5% to 3.7% on average and, in the worst case, from 33.8% to 9.5%.
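    A sketch of a confidence mechanism in the spirit of the approach above; the counter width, threshold, and update policy are assumptions for illustration, not the paper's parameters.

        # Saturating confidence counter that gates the way-predictor: only trust the
        # prediction above a threshold, otherwise fall back to a parallel (all-ways)
        # access so a misprediction cannot force instruction replay.
        class WayPredictionConfidence:
            def __init__(self, max_value=7, threshold=4):
                self.counter, self.max_value, self.threshold = max_value, max_value, threshold

            def use_way_prediction(self):
                return self.counter >= self.threshold

            def update(self, prediction_was_correct):
                if prediction_was_correct:
                    self.counter = min(self.max_value, self.counter + 1)
                else:
                    self.counter = max(0, self.counter - 2)   # penalize mispredictions harder

        conf = WayPredictionConfidence()
        for outcome in (True, False, False, True, True):
            mode = "predicted way" if conf.use_way_prediction() else "parallel access"
            print(mode, "-> prediction correct:", outcome)
            conf.update(outcome)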

    Place, publisher, year, edition, pages
    IEEE, 2018
    Series
    Proceedings IEEE International Conference on Computer Design, ISSN 1063-6404, E-ISSN 2576-6996
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-361215 (URN), 10.1109/ICCD.2018.00029 (DOI), 000458293200018 (), 978-1-5386-8477-1 (ISBN)
    Conference
    IEEE 36th International Conference on Computer Design (ICCD), October 7–10, 2018, Orlando, FL, USA
    Available from: 2018-09-21. Created: 2018-09-21. Last updated: 2019-05-22. Bibliographically approved.
    3. Minimizing Replay under Way-Prediction
    2019 (English). Report (Other academic)
    Abstract [en]

    Way-predictors are effective at reducing dynamic cache energy by reducing the number of ways accessed, but introduce additional latency for incorrect way-predictions. While previous work has studied the impact of the increased latency for incorrect way-predictions, we show that the latency variability has a far greater effect as it forces replay of in-flight instructions on an incorrect way-prediction. To address the problem, we propose a solution that learns the confidence of the way-prediction and dynamically disables it when it is likely to mispredict. We further improve this approach by biasing the confidence to reduce latency variability further at the cost of reduced way-predictions. Our results show that instruction replay in a way-predictor reduces IPC by 6.9% due to 10% of the instructions being replayed. Our confidence-based way-predictor degrades IPC by only 2.9% by replaying just 3.4% of the instructions, reducing way-predictor cache energy overhead (compared to serial access cache) from 8.5% to 1.9%.

    Series
    Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2019-003
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-383596 (URN)
    Available from: 2019-05-17. Created: 2019-05-17. Last updated: 2019-07-03. Bibliographically approved.
    4. Filter caching for free: The untapped potential of the store-buffer
    2019 (English). In: Proc. 46th International Symposium on Computer Architecture, New York: ACM Press, 2019, p. 436-448. Conference paper, Published paper (Refereed)
    Abstract [en]

    Modern processors contain store-buffers to allow stores to retire under a miss, thus hiding store-miss latency. The store-buffer needs to be large (for performance) and searched on every load (for correctness), thereby making it a costly structure in both area and energy. Yet on every load, the store-buffer is probed in parallel with the L1 and TLB, with no concern for the store-buffer's intrinsic hit rate or whether a store-buffer hit can be predicted to save energy by disabling the L1 and TLB probes.

    In this work we cache data that have been written back to memory in a unified store-queue/buffer/cache, and predict hits to avoid L1/TLB probes and save energy. By dynamically adjusting the allocation of entries between the store-queue/buffer/cache, we can achieve nearly optimal reuse, without causing stalls. We are able to do this efficiently and cheaply by recognizing key properties of stores: free caching (since they must be written into the store-buffer for correctness we need no additional data movement), cheap coherence (since we only need to track state changes of the local, dirty data in the store-buffer), and free and accurate hit prediction (since the memory dependence predictor already does this for scheduling).

    As a result, we are able to increase the store-buffer hit rate and reduce store-buffer/TLB/L1 dynamic energy by 11.8% (up to 26.4%) on SPEC2006 without hurting performance (average IPC improvements of 1.5%, up to 4.7%). The cost for these improvements is a 0.2% increase in L1 cache capacity (1 bit per line) and one additional tail pointer in the store-buffer.
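    A minimal sketch of the idea above (the data structure and hit predictor are simplified stand-ins, not the paper's design): retired store data is kept around in a unified store-queue/buffer/cache, and a predicted hit lets the load skip the L1/TLB probes.

        # Serving loads from retained store data when a hit is predicted.
        store_buffer_cache = {0x1000: 42, 0x1008: 7}   # committed store data kept for reuse

        def predict_hit(addr):
            # Stand-in for the memory dependence predictor reused for hit prediction.
            return addr in store_buffer_cache

        def load(addr, l1):
            if predict_hit(addr):
                return store_buffer_cache[addr], "store-buffer hit, L1/TLB probe skipped"
            return l1.get(addr, "L2 fill needed"), "normal L1/TLB access"

        print(load(0x1000, {0x2000: 5}))
        print(load(0x2000, {0x2000: 5}))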

    Place, publisher, year, edition, pages
    New York: ACM Press, 2019
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-383473 (URN), 10.1145/3307650.3322269 (DOI), 000521059600034 (), 978-1-4503-6669-4 (ISBN)
    Conference
    ISCA 2019, June 22–26, Phoenix, AZ
    Funder
    Knut and Alice Wallenberg Foundation; EU, Horizon 2020, 715283; EU, Horizon 2020, 801051; Swedish Foundation for Strategic Research, SM17-0064
    Available from: 2019-06-22. Created: 2019-05-16. Last updated: 2020-04-27. Bibliographically approved.
    5. Efficient temporal and spatial load to load forwarding
    2020 (English). In: Proc. 26th International Symposium on High-Performance Computer Architecture, IEEE Computer Society, 2020. Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    IEEE Computer Society, 2020
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-383477 (URN)
    Conference
    HPCA 2020, February 22–26, San Diego, CA
    Note

    to appear

    Available from: 2021-08-21. Created: 2019-05-16. Last updated: 2019-11-29. Bibliographically approved.
  • 14.
    Alves, Ricardo
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Dynamically Disabling Way-prediction to Reduce Instruction Replay (2018). In: 2018 IEEE 36th International Conference on Computer Design (ICCD), IEEE, 2018, p. 140-143. Conference paper (Refereed)
    Abstract [en]

    Way-predictors have long been used to reduce dynamic cache energy without the performance loss of serial caches. However, they produce variable-latency hits, as incorrect predictions increase load-to-use latency. While the performance impact of these extra cycles has been well-studied, the need to replay subsequent instructions in the pipeline due to the load latency increase has been ignored. In this work we show that way-predictors pay a significant performance penalty beyond previously studied effects due to instruction replays caused by mispredictions. To address this, we propose a solution that learns the confidence of the way prediction and dynamically disables it when it is likely to mispredict and cause replays. This allows us to reduce cache latency (when we can trust the way-prediction) while still avoiding the need to replay instructions in the pipeline (by avoiding way-mispredictions). Standard way-predictors degrade IPC by 6.9% vs. a parallel cache due to 10% of the instructions being replayed (worst case 42.3%). While our solution decreases way-prediction accuracy by turning off the way-predictor in some cases when it would have been correct, it delivers higher performance than a standard way-predictor. Our confidence-based way-predictor degrades IPC by only 4.4% by replaying just 5.6% of the instructions (worst case 16.3%). This reduces the way-predictor cache energy overhead, compared to a serial-access cache, from 8.5% to 3.7% on average and, in the worst case, from 33.8% to 9.5%.

  • 15.
    Alves, Ricardo
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. 2111 NE 25th Ave, Hillsboro, OR 97124 USA.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Early Address Prediction: Efficient Pipeline Prefetch and Reuse (2021). In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 18, no 3, article id 39. Article in journal (Refereed)
    Abstract [en]

    Achieving low load-to-use latency with low energy and storage overheads is critical for performance. Existing techniques either prefetch into the pipeline (via address prediction and validation) or provide data reuse in the pipeline (via register sharing or L0 caches). These techniques provide a range of tradeoffs between latency, reuse, and overhead. In this work, we present a pipeline prefetching technique that achieves state-of-the-art performance and data reuse without additional data storage, data movement, or validation overheads by adding address tags to the register file. Our addition of register file tags allows us to forward (reuse) load data from the register file with no additional data movement, keep the data alive in the register file beyond the instruction's lifetime to increase temporal reuse, and coalesce prefetch requests to achieve spatial reuse. Further, we show that we can use the existing memory order violation detection hardware to validate prefetches and data forwards without additional overhead. Our design achieves the performance of existing pipeline prefetching while also forwarding 32% of the loads from the register file (compared to 15% in state-of-the-art register sharing), delivering a 16% reduction in L1 dynamic energy (1.6% total processor energy), with an area overhead of less than 0.5%.
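    The register-file tagging idea described above can be sketched as follows (field names and the forwarding policy are illustrative assumptions, not the paper's microarchitecture):

        # Each physical register that holds loaded data also records the load's address
        # tag, so a later load to the same address can be forwarded from the register
        # file instead of accessing the L1.
        register_file = {}   # physical register -> {"value": ..., "addr_tag": ...}

        def writeback_load(preg, value, addr):
            register_file[preg] = {"value": value, "addr_tag": addr}

        def try_forward(addr):
            for preg, entry in register_file.items():
                if entry["addr_tag"] == addr:
                    return entry["value"], f"forwarded from {preg} (no L1 access)"
            return None, "no match: access L1 (or prefetch early using a predicted address)"

        writeback_load("p12", 99, 0x2000)
        print(try_forward(0x2000))
        print(try_forward(0x3000))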

    Download full text (pdf)
    FULLTEXT01
  • 16.
    Alves, Ricardo
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Efficient temporal and spatial load to load forwarding2020In: Proc. 26th International Symposium on High-Performance and Computer Architecture, IEEE Computer Society, 2020Conference paper (Refereed)
  • 17.
    Alves, Ricardo
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Minimizing Replay under Way-Prediction2019Report (Other academic)
    Abstract [en]

    Way-predictors are effective at reducing dynamic cache energy by reducing the number of ways accessed, but introduce additional latency for incorrect way-predictions. While previous work has studied the impact of the increased latency for incorrect way-predictions, we show that the latency variability has a far greater effect as it forces replay of in-flight instructions on an incorrect way-prediction. To address the problem, we propose a solution that learns the confidence of the way-prediction and dynamically disables it when it is likely to mispredict. We further improve this approach by biasing the confidence to reduce latency variability further at the cost of reduced way-predictions. Our results show that instruction replay in a way-predictor reduces IPC by 6.9% due to 10% of the instructions being replayed. Our confidence-based way-predictor degrades IPC by only 2.9% by replaying just 3.4% of the instructions, reducing way-predictor cache energy overhead (compared to serial access cache) from 8.5% to 1.9%.

    Download full text (pdf)
    fulltext
  • 18.
    Alves, Ricardo
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Nikoleris, Nikos
    ARM Res, Lund, Sweden.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Addressing energy challenges in filter caches2017In: Proc. 29th International Symposium on Computer Architecture and High Performance Computing, IEEE Computer Society, 2017, p. 49-56Conference paper (Refereed)
    Abstract [en]

    Filter caches and way-predictors are common approaches to improve the efficiency and/or performance of first-level caches. Filter caches use a small L0 to provide more efficient and faster access to a small subset of the data, and work well for programs with high locality. Way-predictors improve efficiency by accessing only the way predicted, which alleviates the need to read all ways in parallel without increasing latency, but hurts performance due to mispredictions. In this work we examine how SRAM layout constraints (h-trees and data mapping inside the cache) affect way-predictors and filter caches. We show that accessing the smaller L0 array can be significantly more energy efficient than attempting to read fewer ways from a larger L1 cache; and that the main source of energy inefficiency in filter caches comes from L0 and L1 misses. We propose a filter cache optimization that shares the tag array between the L0 and the L1, which incurs the overhead of reading the larger tag array on every access, but in return allows us to directly access the correct L1 way on each L0 miss. This optimization does not add any extra latency and, counter-intuitively, improves the filter cache's overall energy efficiency beyond that of the way-predictor. By combining the low power benefits of a physically smaller L0 with the reduction in miss energy by reading L1 tags upfront in parallel with L0 data, we show that the optimized filter cache reduces the dynamic cache energy compared to a traditional filter cache by 26% while providing the same performance advantage. Compared to a way-predictor, the optimized cache improves performance by 6% and reduces energy by 2%.
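
    The shared-tag access path described above can be sketched as follows: the L1 tag array is read on every access, so an L0 miss can go directly to the single correct L1 way. This is an illustrative model only; sizes, policies, and the energy bookkeeping of the actual design are not represented.

        # Illustrative access path: L1 tags are read on every access, so an L0 miss can
        # read exactly one L1 data way instead of all ways (structures are placeholders).
        class SharedTagFilterCache:
            def __init__(self):
                self.l0_data = {}                   # small L0: addr -> value
                self.l1_tags = {}                   # shared L1 tag array: addr -> way index
                self.l1_data = {}                   # L1 data arrays: (way, addr) -> value

            def load(self, addr):
                way = self.l1_tags.get(addr)        # read the larger tag array up front
                if addr in self.l0_data:
                    return self.l0_data[addr], "L0 hit"
                if way is not None:
                    value = self.l1_data[(way, addr)]       # L0 miss: single-way L1 read
                    self.l0_data[addr] = value              # fill L0 for future reuse
                    return value, f"L0 miss, direct read of L1 way {way}"
                return None, "L1 miss: forward to L2"

        cache = SharedTagFilterCache()
        cache.l1_tags[0x40] = 3
        cache.l1_data[(3, 0x40)] = 7
        print(cache.load(0x40))                     # first access: L0 miss, one L1 way read
        print(cache.load(0x40))                     # second access: L0 hit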

  • 19.
    Alves, Ricardo
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Ros, Alberto
    Univ Murcia, Murcia, Spain.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Filter caching for free: The untapped potential of the store-buffer2019In: Proc. 46th International Symposium on Computer Architecture, New York: ACM Press, 2019, p. 436-448Conference paper (Refereed)
    Abstract [en]

    Modern processors contain store-buffers to allow stores to retire under a miss, thus hiding store-miss latency. The store-buffer needs to be large (for performance) and searched on every load (for correctness), thereby making it a costly structure in both area and energy. Yet on every load, the store-buffer is probed in parallel with the L1 and TLB, with no concern for the store-buffer's intrinsic hit rate or whether a store-buffer hit can be predicted to save energy by disabling the L1 and TLB probes.

    In this work we cache data that have been written back to memory in a unified store-queue/buffer/cache, and predict hits to avoid L1/TLB probes and save energy. By dynamically adjusting the allocation of entries between the store-queue/buffer/cache, we can achieve nearly optimal reuse, without causing stalls. We are able to do this efficiently and cheaply by recognizing key properties of stores: free caching (since they must be written into the store-buffer for correctness we need no additional data movement), cheap coherence (since we only need to track state changes of the local, dirty data in the store-buffer), and free and accurate hit prediction (since the memory dependence predictor already does this for scheduling).

    As a result, we are able to increase the store-buffer hit rate and reduce store-buffer/TLB/L1 dynamic energy by 11.8% (up to 26.4%) on SPEC2006 without hurting performance (average IPC improvements of 1.5%, up to 4.7%). The cost for these improvements is a 0.2% increase in L1 cache capacity (1 bit per line) and one additional tail pointer in the store-buffer.
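
    As a loose illustration of the hit-prediction idea above, the sketch below gates the L1/TLB probes with a simple per-PC saturating counter standing in for the memory dependence predictor; a mispredicted hit pays a late probe. All structures and parameters are invented for illustration and are not the paper's design.

        # Sketch: a per-PC saturating counter (a stand-in for the memory dependence
        # predictor) decides whether the L1/TLB probes can be skipped for a load.
        from collections import defaultdict

        predict_sb_hit = defaultdict(int)           # load PC -> saturating counter

        def execute_load(pc, addr, store_buffer_cache):
            """Return (value, energy_events) for one load."""
            predicted_hit = predict_sb_hit[pc] >= 2
            events = ["SB probe"]
            if not predicted_hit:
                events.append("L1+TLB probe")       # probed in parallel, as a normal load would
            hit = addr in store_buffer_cache
            value = store_buffer_cache.get(addr)
            if not hit and predicted_hit:
                events.append("late L1+TLB probe")  # misprediction: probe now, costing latency
            # train the predictor on the observed outcome
            predict_sb_hit[pc] = min(3, predict_sb_hit[pc] + 1) if hit else max(0, predict_sb_hit[pc] - 1)
            return value, events

        sb_cache = {0x2000: 99}                     # data retained from retired stores
        for _ in range(4):
            print(execute_load(pc=0x400100, addr=0x2000, store_buffer_cache=sb_cache))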

    Download full text (pdf)
    fulltext
  • 20.
    Aris, Ahmet
    et al.
    Istanbul Technical University.
    Oktuğ, Sema
    Istanbul Technical University.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Security of Internet of Things for a Reliable Internet of Services2018In: Autonomous Control for a Reliable Internet of Services: Methods, Models, Approaches, Techniques, Algorithms, and Tools / [ed] Ivan Ganchev, R. D. van der Mei, Hans van den Berg, Cham , 2018Chapter in book (Refereed)
    Download full text (pdf)
    fulltext
  • 21.
    Asad, Hafiz Areeb
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Wouters, Erik Henricus
    Bhatti, Naveed Anwar
    Mottola, Luca
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. RISE.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. RISE.
    On Securing Persistent State in Intermittent Computing2020In: ENSsys '20: Proceedings of the 8th International Workshop on Energy Harvesting and Energy-Neutral Sensing Systems, 2020, p. 8-14Conference paper (Refereed)
    Abstract [en]

    We present the experimental evaluation of different security mechanisms applied to persistent state in intermittent computing. Whenever executions become intermittent because of energy scarcity, systems employ persistent state on non-volatile memories (NVMs) to ensure forward progress of applications. Persistent state spans operating system and network stack, as well as applications. While a device is off recharging energy buffers, persistent state on NVMs may be subject to security threats such as stealing sensitive information or tampering with configuration data, which may ultimately corrupt the device state and render the system unusable. Based on modern platforms of the Cortex M* series, we experimentally investigate the impact on typical intermittent computing workloads of different means to protect persistent state, including software and hardware implementations of staple encryption algorithms and the use of ARM TrustZone protection mechanisms. Our results indicate that i) software implementations bear a significant overhead in energy and time, sometimes harming forward progress, while retaining the advantage of modularity and easier updates; ii) hardware implementations offer much lower overhead compared to their software counterparts, but require a deeper understanding of their internals to gauge their applicability in given application scenarios; and iii) TrustZone shows almost negligible overhead, yet it requires different memory management and is only effective as long as attackers cannot directly access the NVMs.
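
    A minimal sketch of software-encrypted persistent state is shown below, assuming the third-party Python cryptography package (Fernet) as a stand-in for the staple encryption algorithms mentioned above; the checkpoint layout and timing harness are invented and unrelated to the Cortex M* platforms evaluated in the paper.

        # Sketch of encrypting a checkpoint of persistent state before writing it to NVM.
        # Requires the third-party `cryptography` package; checkpoint contents are invented.
        import json, time
        from cryptography.fernet import Fernet

        key = Fernet.generate_key()                 # in practice the key would sit in protected storage
        cipher = Fernet(key)
        state = {"pc": 0x1234, "counters": list(range(64)), "net_seq": 17}

        t0 = time.perf_counter()
        blob = cipher.encrypt(json.dumps(state).encode())     # what would be written to NVM
        t1 = time.perf_counter()
        restored = json.loads(cipher.decrypt(blob))           # what wake-up would read back
        t2 = time.perf_counter()
        print(f"encrypt: {(t1 - t0) * 1e3:.2f} ms, decrypt: {(t2 - t1) * 1e3:.2f} ms, intact: {restored == state}")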

  • 22.
    Asan, Noor Badariah
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics. Universiti Teknikal Malaysia Melaka, Melaka Malaysia.
    Carlos, Pérez Penichet
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Redzwan, Syaiful
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Noreland, Daniel
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Hassan, Emadeldeen
    Umeå University.
    Rydberg, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Blokhuis, Taco
    Maastricht University Medical Center+, Netherlands.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Augustine, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Data Packet Transmission through Fat Tissue for Wireless Intra-Body Networks2017In: IEEE Journal of Electromagnetics, RF and Microwaves in Medicine and Biology, ISSN 2469-7249, Vol. 1, no 2, p. 43-51Article in journal (Refereed)
    Abstract [en]

    This work explores high data rate microwave communication through fat tissue in order to address the wide bandwidth requirements of intra-body area networks. We have designed and carried out experiments on an IEEE 802.15.4 based WBAN prototype by measuring the performance of the fat tissue channel in terms of data packet reception with respect to tissue length and transmission power. This paper proposes and demonstrates a high data rate communication channel through fat tissue using phantom and ex-vivo environments. Here, we achieve a data packet reception of approximately 96% in both environments. The results also show that the received signal strength drops by ~1 dBm per 10 mm in phantom and ~2 dBm per 10 mm in ex-vivo. The phantom and ex-vivo experiments validated our approach for high data rate communication through fat tissue for intra-body network applications. The proposed method opens up new opportunities for further research in fat channel communication. This study will contribute to the successful development of high bandwidth wireless intra-body networks that support high data rate implanted, ingested, injected, or worn devices.

    Download full text (pdf)
    fulltext
  • 23.
    Asan, Noor Badariah
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics. Faculty of Electronic and Computer Engineering, Universiti Teknikal Malaysia Melaka.
    Hassan, Emadeldeen
    Perez, Mauricio D.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Joseph, Laya
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Berggren, Martin
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Augustine, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Fat-intrabody communication at 5.8 GHz including impacts of dynamics body movementsManuscript (preprint) (Other academic)
  • 24.
    Asan, Noor Badariah
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics. Univ Tekn Malaysia Melaka, Fac Elect & Comp Engn, Durian Tunggal 76100, Malaysia.
    Hassan, Emadeldeen
    Umea Univ, Dept Comp Sci, S-90187 Umea, Sweden;Menoufia Univ, Dept Elect & Elect Commun, Menoufia 32952, Egypt.
    Perez, Mauricio David
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Shah, Syaiful Redzwan Mohd
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Velander, Jacob
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Blokhuis, Taco J.
    Maastricht Univ, Dept Surg, Med Ctr, NL-6229 HX Maastricht, Netherlands.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Augustine, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Assessment of Blood Vessel Effect on Fat-Intrabody Communication Using Numerical and Ex-Vivo Models at 2.45 GHZ2019In: IEEE Access, E-ISSN 2169-3536, Vol. 7, p. 89886-89900Article in journal (Refereed)
    Abstract [en]

    The potential offered by the intra-body communication (IBC) over the past few years has resulted in a spike of interest in the topic, specifically for medical applications. Fat-IBC is subsequently a novel alternative technique that utilizes fat tissue as a communication channel. This work aimed to characterize this transmission medium and its performance in varying blood-vessel systems at 2.45 GHz, particularly in the context of the IBC and medical applications. It incorporated three-dimensional (3D) electromagnetic simulations and laboratory investigations that implemented models of blood vessels of varying orientations, sizes, and positions. Such investigations were undertaken by using ex-vivo porcine tissues and three blood-vessel system configurations. These configurations represent extreme cases of real-life scenarios that sufficiently elucidated their principal influence on the transmission. The blood-vessel models consisted of ex-vivo muscle tissues and copper rods. The results showed that the blood vessels crossing the channel vertically contributed to 5.1 dB and 17.1 dB signal losses for muscle and copper rods, respectively, which is the worst-case scenario in the context of a fat channel with perturbance. In contrast, blood vessels aligned longitudinally in the channel have less effect and yielded 4.5 dB and 4.2 dB signal losses for muscle and copper rods, respectively. Meanwhile, the blood vessels crossing the channel horizontally displayed 3.4 dB and 1.9 dB signal losses for muscle and copper rods, respectively, which were the smallest losses among the configurations. The laboratory investigations were in agreement with the simulations. Thus, this work substantiated the fat-IBC signal transmission variability in the context of varying blood vessel configurations.

    Download full text (pdf)
    fulltext
  • 25.
    Asan, Noor Badariah
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Electrical Engineering, Solid-State Electronics. Univ Tekn Malaysia Melaka, Fac Elect & Comp Engn, Melaka, Malaysia.
    Hassan, Emadeldeen
    Umeå Univ, Dept Comp Sci, Umeå, Sweden.;Menoufia Univ, Dept Elect & Elect Commun, Menoufia, Egypt..
    Redzwan, Syaiful
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Electrical Engineering, Solid-State Electronics.
    Velander, Jacob
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Electrical Engineering, Solid-State Electronics.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Augustine, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Electrical Engineering, Solid-State Electronics.
    Impact of Blood Vessels on Data Packet Transmission Through the Fat Channel2018In: 2018 IEEE International RF and Microwave Conference (RFM) / [ed] I. Pasya, A. H. Awang & F. C. Seman, IEEE , 2018, p. 196-198Conference paper (Refereed)
    Abstract [en]

    The reliability of intra-body wireless communication systems is very important in medical applications to ensure data transmission between implanted devices. In this paper, we present newly developed measurements to investigate the effect of blood vessels on the data packet reception through the fat tissue. We use an IEEE 802.15.4-based WBAN prototype to measure the packet reception rate (PRR) through a tissue-equivalent phantom model. The blood vessels are modelled using copper rods. We measure the PRR at 2.45 GHz for several power levels. The results revealed that the presence of blood vessels aligned with the fat channel has a negligible influence on the PRR when measured over power levels from -25 dBm to 0 dBm and for different blood vessel positions. Our investigations show 97% successful PRR through a 10 cm long fat channel in the presence of the blood vessels.

  • 26.
    Asan, Noor Badariah
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Hassan, Emadeldeen
    Umea Univ, Dept Comp Sci, S-90187 Umea, Sweden;Menoufia Univ, Dept Elect & Elect Commun, Menoufia 32952, Egypt.
    Velander, Jacob
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Redzwan, Syaiful
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Noreland, Daniel
    Umea Univ, Dept Comp Sci, S-90187 Umea, Sweden.
    Blokhuis, Taco J.
    Maastricht Univ, Med Ctr, Dept Surg, NL-6229 HX Maastricht, Netherlands.
    Wadbro, Eddie
    Umea Univ, Dept Comp Sci, S-90187 Umea, Sweden.
    Berggren, Martin
    Umea Univ, Dept Comp Sci, S-90187 Umea, Sweden.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Augustine, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Characterization of the Fat Channel for Intra-Body Communication at R-Band Frequencies2018In: Sensors, E-ISSN 1424-8220, Vol. 18, no 9, article id 2752Article in journal (Refereed)
    Abstract [en]

    In this paper, we investigate the use of fat tissue as a communication channel between in-body, implanted devices at R-band frequencies (1.7-2.6 GHz). The proposed fat channel is based on an anatomical model of the human body. We propose a novel probe that is optimized to efficiently radiate the R-band frequencies into the fat tissue. We use our probe to evaluate the path loss of the fat channel by studying the channel transmission coefficient over the R-band frequencies. We conduct extensive simulation studies and validate our results by experimentation on phantom and ex-vivo porcine tissue, with good agreement between simulations and experiments. We demonstrate a performance comparison between the fat channel and similar waveguide structures. Our characterization of the fat channel reveals propagation path loss of ~0.7 dB and ~1.9 dB per cm for phantom and ex-vivo porcine tissue, respectively. These results demonstrate that fat tissue can be used as a communication channel for high data rate intra-body networks.
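
    Taking the reported per-centimetre losses at face value, a back-of-the-envelope link budget gives a feel for feasible channel lengths. The transmit power and receiver sensitivity below are placeholder assumptions, not values from the paper.

        # Back-of-the-envelope channel-length estimate from the per-centimetre losses above.
        LOSS_DB_PER_CM = {"phantom": 0.7, "ex-vivo porcine": 1.9}   # figures quoted in the abstract
        TX_POWER_DBM = 0                                            # assumed transmit power
        RX_SENSITIVITY_DBM = -85                                    # assumed receiver sensitivity

        for medium, loss in LOSS_DB_PER_CM.items():
            max_cm = (TX_POWER_DBM - RX_SENSITIVITY_DBM) / loss
            print(f"{medium}: ~{loss} dB/cm -> budget covers roughly {max_cm:.0f} cm of fat channel")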

    Download full text (pdf)
    fulltext
  • 27.
    Asan, Noor Badariah
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Noreland, Daniel
    Department of Computing Science, Umeå University, SE-901 87 Umeå, Sweden.
    Hassan, Emadeldeen
    Department of Computing Science, Umeå University, SE-901 87 Umeå, Sweden; Department of Electronics and Electrical Communications, Menoufia University, 32952 Menouf, Egypt.
    Redzwan, Syaiful
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Rydberg, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Blokhuis, Taco J.
    Department of Surgery, Maastricht University Medical Center+, P. Debyelaan 25, 6229 HX Maastricht, The Netherlands.
    Carlsson, Per-Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Transplantation and regenerative medicine.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Augustine, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Intra-body microwave communication through adipose tissue2017In: Healthcare Technology Letters, E-ISSN 2053-3713, Vol. 4, no 4, p. 115-121Article in journal (Refereed)
    Download full text (pdf)
    fulltext
  • 28.
    Asan, Noor Badariah
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Redzwan, Syaiful
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Rydberg, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Augustine, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Noreland, Daniel
    Hassan, Emadeldeen
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Human fat tissue: A microwave communication channel2017In: Proc. 1st MTT-S International Microwave Bio Conference, IEEE, 2017Conference paper (Refereed)
    Abstract [en]

    In this paper, we present an approach for communication through human body tissue in the R-band frequency range. This study examines the ranges of microwave frequencies suitable for intra-body communication. The human body tissues are characterized with respect to their transmission properties using simulation modeling and phantom measurements. The variations in signal coupling with respect to different tissue thicknesses are studied. The simulation and phantom measurement results show that electromagnetic communication in the fat layer is viable with attenuation of approximately 2 dB per 20 mm. 

  • 29.
    Asan, Noor Badariah
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Electrical Engineering, Solid-State Electronics. Univ Tekn Malaysia Melaka, Fac Elect & Comp Engn, Melaka, Malaysia..
    Redzwan, Syaiful
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Electrical Engineering, Solid-State Electronics.
    Velander, Jacob
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Electrical Engineering, Solid-State Electronics.
    Perez, Mauricio D.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Electrical Engineering, Solid-State Electronics.
    Augustine, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Electrical Engineering, Solid-State Electronics.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Hassan, Emadeldeen
    Umeå Univ, Dept Comp Sci, Umeå, Sweden.;Menoufia Univ, Dept Elect & Elect Commun, Menoufia, Egypt..
    Blokhuis, Taco J.
    Maastricht Univ, Dept Surg, Med Ctr, Maastricht, Netherlands..
    Effects of Blood Vessels on Fat Channel Microwave Communication2018In: 2018 IEEE Conference On Antenna Measurements & Applications (CAMA), IEEE, 2018Conference paper (Refereed)
    Abstract [en]

    This study aims to investigate the reliability of intra-body microwave propagation through the fat tissue in the presence of blood vessels. Here, we consider three types of blood vessels with different sizes. We investigate the impact of the number of blood vessels and their alignment on the transmission of microwave signals through the fat channel. In our study, we employ two probes that act as a transmitter and a receiver. The probes are designed to operate at the Industrial, Scientific, and Medical radio band (2.45 GHz). For a channel length of 100 mm, our results indicate that the presence of the blood vessels may increase the channel path loss by ~1.5 dB and ~4.5 dB when the vessels are aligned and orthogonally aligned with the fat channel, respectively.

  • 30.
    Asan, Noor Badariah
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Velander, Jacob
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Redzwan, Syaiful
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Augustine, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Hassan, Emadeldeen
    Department of Computing Science, Umeå University, Umeå, Sweden.
    Noreland, Daniel
    Department of Computing Science, Umeå University, Umeå, Sweden.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Blokhuis, Taco J.
    Department of Surgery, Maastricht University Medical Center+, Maastricht, The Netherland.
    Reliability of the fat tissue channel for intra-body microwave communication2017In: 2017 IEEE Conference on Antenna Measurements & Applications (CAMA), IEEE, 2017, p. 310-313Conference paper (Refereed)
    Abstract [en]

    Recently, human fat tissue has been proposed as a microwave channel for intra-body sensor applications. In this work, we assess how disturbances can prevent reliable microwave propagation through the fat channel. Perturbants of different sizes are considered. The simulation and experimental results show that efficient communication through the fat channel is possible even in the presence of perturbants such as embedded muscle layers and blood vessels. We show that the communication channel is not affected by perturbants that are smaller than a 15 mm cube.

    Download full text (pdf)
    fulltext
  • 31.
    Asan, Noor Badariah
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Velander, Jacob
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Redzwan, Syaiful
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Perez, Mauricio D.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Hassan, Emadeldeen
    Umeå University, Department of Computing Science, Umeå, Sweden.
    Blokhuis, Taco J.
    Maastricht University Medical Center, Department of Surgery, Maastricht, The Netherland.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Augustine, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Effect of thickness inhomogeneity in fat tissue on in-body microwave propagation2018In: 2018 IEEE International Microwave Biomedical Conference (IMBioC), Philadelphia, USA: IEEE, 2018, p. 136-138Conference paper (Refereed)
    Abstract [en]

    In recent studies, it has been found that fat tissue can be used as a microwave communication channel. In this article, the effect of thickness inhomogeneities in fat tissues on the performance of in-body microwave communication at 2.45 GHz is investigated using phantom models. We considered two models, namely concave and convex geometrical fat distributions, to account for the thickness inhomogeneities. The thickness of the fat tissue is varied from 5 mm to 45 mm, and the gap between the transmitter/receiver and the start and end of the concavity/convexity is varied from 0 mm to 25 mm for a length of 100 mm to study the behavior of the microwave propagation. Phantoms of different geometries, concave and convex, are used in this work to validate the numerical studies. It was noticed that the convex model exhibited higher signal coupling, by 1 dB (simulation) and 2 dB (measurement), compared to the concave model. From the study, it was observed that the signal transmission improves up to a fat thickness of 30 mm and reaches a plateau when the thickness is increased further.

    Download full text (pdf)
    fulltext
  • 32.
    Asgharzadeh, Ashkan
    et al.
    Univ Murcia, Comp Engn Dept, Murcia 30100, Spain..
    Cebrian, Juan M.
    Univ Murcia, Comp Engn Dept, Murcia 30100, Spain..
    Perais, Arthur
    Univ Grenoble Alpes, CNRS, Grenoble INP, Inst Engn,TIMA, Grenoble, France..
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Ros, Alberto
    Univ Murcia, Comp Engn Dept, Murcia 30100, Spain..
    Free Atomics: Hardware Atomic Operations without Fences2022In: PROCEEDINGS OF THE 2022 THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22), ASSOC COMPUTING MACHINERY Association for Computing Machinery (ACM), 2022, p. 14-26Conference paper (Refereed)
    Abstract [en]

    Atomic Read-Modify-Write (RMW) instructions are primitive synchronization operations implemented in hardware that provide the building blocks for higher-abstraction synchronization mechanisms to programmers. According to publicly available documentation, current x86 implementations serialize atomic RMW operations, i.e., the store buffer is drained before issuing atomic RMWs and subsequent memory operations are stalled until the atomic RMW commits. This serialization, carried out by memory fences, incurs a performance cost which is expected to increase with deeper pipelines. This work proposes Free atomics, a lightweight, speculative, deadlock-free implementation of atomic operations that removes the need for memory fences, thus improving performance, while preserving atomicity and consistency. Free atomics is, to the best of our knowledge, the first proposal to enable store-to-load forwarding for atomic RMWs. Free atomics only requires simple modifications and incurs a small area overhead (15 bytes). Our evaluation using gem5-20 shows that, for a 32-core configuration, Free atomics improves performance by 12.5%, on average, for a large range of parallel workloads and 25.2%, on average, for atomic-intensive parallel workloads over a fenced atomic RMW implementation.

  • 33.
    Bagci, Ibrahim Ethem
    et al.
    Univ Lancaster, Sch Comp & Commun, Lancaster, England.
    Raza, Shahid
    SICS Swedish ICT, Kista, Sweden.
    Roedig, Utz
    Univ Lancaster, Sch Comp & Commun, Lancaster, England.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. SICS Swedish ICT, Kista, Sweden.
    Fusion: Coalesced Confidential Storage and Communication Framework for the IoT2016In: Security and Communication Networks, ISSN 1939-0114, E-ISSN 1939-0122, Vol. 9, no 15, p. 2656-2673Article in journal (Refereed)
    Abstract [en]

    Comprehensive security mechanisms are required for a successful implementation of the Internet of Things (IoT). Existing solutions focus mainly on securing the communication links between Internet hosts and IoT devices. However, as most IoT devices nowadays provide vast amounts of flash storage space, it is also necessary to consider storage security within a comprehensive security framework. Instead of developing independent security solutions for storage and communication, we propose Fusion, a framework that provides coalesced confidential storage and communication. Fusion uses existing secure communication protocols for the IoT such as Internet protocol security (IPsec) and datagram transport layer security (DTLS) and re-uses the defined communication security mechanisms within the storage component. Thus, trusted mechanisms developed for communication security are extended into the storage space. Notably, this mechanism allows us to transmit requested data directly from the file system without decrypting read data blocks and then re-encrypting these for transmission. Thus, Fusion provides benefits in terms of processing speed and energy efficiency, which are important aspects for resource-constrained IoT devices. This paper describes the Fusion architecture and its instantiation for IPsec-based and DTLS-based systems. We describe Fusion's implementation and evaluate its storage overheads, communication performance, and energy consumption.

  • 34. Baird, Ryan
    et al.
    Gavin, Peter
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Whalley, David
    Uh, Gang-Ryung
    Optimizing transfers of control in the static pipeline architecture2015In: Proc. 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, New York: ACM Press, 2015, p. 7-16Conference paper (Refereed)
    Abstract [en]

    Statically pipelined processors offer a new way to improve the performance beyond that of a traditional in-order pipeline while simultaneously reducing energy usage by enabling the compiler to control more fine-grained details of the program execution. This paper describes how a compiler can exploit the features of the static pipeline architecture to apply optimizations on transfers of control that are not possible on a conventional architecture. The optimizations presented in this paper include hoisting the target address calculations for branches, jumps, and calls out of loops, performing branch chaining between calls and jumps, hoisting the setting of return addresses out of loops, and exploiting conditional calls and returns. The benefits of performing these transfer of control optimizations include a 6.8% reduction in execution time and a 3.6% decrease in estimated energy usage.

  • 35.
    Bambusi, Fulvio
    et al.
    Politecn Milan, Milan, Italy..
    Cerizzi, Francesco
    Politecn Milan, Milan, Italy..
    Lee, Yamin
    Politecn Milan, Milan, Italy..
    Mottola, Luca
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Politecn Milan, Milan, Italy.;RISE, Gothenburg, Sweden..
    The Case for Approximate Intermittent Computing2022In: 2022 21st ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN 2022), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 463-476Conference paper (Refereed)
    Abstract [en]

    We present the concept of approximate intermittent computing and concretely demonstrate its application. Intermittent computations stem from the erratic energy patterns caused by energy harvesting: computations unpredictably terminate whenever energy is insufficient and the application state is lost. Existing solutions maintain equivalence to continuous executions by creating persistent state on non-volatile memory, enabling stateful computations to cross power failures. The performance penalty is massive: system throughput drops while energy consumption increases. In contrast, approximate intermittent computations trade the accuracy of the results for sparing the entire overhead to maintain equivalence to a continuous execution. This is possible as we use approximation to limit the extent of stateful computations to the single power cycle, enabling the system to completely shift the energy budget for managing persistent state to useful computations towards an immediate approximate result. To this end, we effectively reverse the regular formulation of approximate computing problems. First, we apply approximate intermittent computing to human activity recognition. We design an anytime variation of support vector machines able to improve the accuracy of the classification as energy is available. We build a hw/sw prototype using kinetic energy and show a 7x improvement in system throughput compared to state-of-the-art system support for intermittent computing, while retaining 83% accuracy in a setting where the best attainable accuracy is 88%. Next, we apply approximate intermittent computing in a sharply different scenario, that is, embedded image processing, using loop perforation. Using a different hw/sw prototype that we build and diverse energy traces, we show a 5x improvement in system throughput compared to state-of-the-art system support for intermittent computing, while providing an equivalent output in 84% of the cases.
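
    The anytime flavour of classification described above can be sketched as truncating an SVM decision sum after as many support-vector evaluations as the current energy budget allows. The kernel, the toy model, and the budget accounting below are invented for illustration and are not the authors' formulation.

        # Sketch of an "anytime" SVM decision: evaluate support vectors until the energy
        # budget of this power cycle runs out, then return the best-effort sign.
        import math, random

        def rbf(x, sv, gamma=0.5):
            return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, sv)))

        random.seed(1)                              # toy "trained" model: (support vector, alpha_i * y_i)
        model = [([random.uniform(-1, 1) for _ in range(3)], random.choice([-1.0, 1.0])) for _ in range(50)]
        bias = 0.1

        def anytime_classify(x, energy_budget):
            """energy_budget = number of support-vector evaluations affordable this cycle."""
            score, used = bias, 0
            for sv, coef in model:
                if used >= energy_budget:
                    break                           # power about to fail: answer with what we have
                score += coef * rbf(x, sv)
                used += 1
            return (1 if score >= 0 else -1), used

        sample = [0.2, -0.4, 0.9]
        for budget in (5, 20, 50):
            label, used = anytime_classify(sample, budget)
            print(f"budget={budget:2d}  SV evaluations={used:2d}  ->  class {label:+d}")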

  • 36. Bardizbanyan, Alen
    et al.
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Whalley, David
    Larsson-Edefors, Per
    Improving data access efficiency by using context-aware loads and stores2015In: Proc. 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, New York: ACM Press, 2015, p. 27-36Conference paper (Refereed)
    Abstract [en]

    Memory operations have a significant impact on both performance and energy usage even when an access hits in the level-one data cache (L1 DC). Load instructions in particular affect performance as they frequently result in stalls since the register to be loaded is often referenced before the data is available in the pipeline. L1 DC accesses also impact energy usage as they typically require significantly more energy than a register file access. Despite their impact on performance and energy usage, L1 DC accesses on most processors are performed in a general fashion without regard to the context in which the load or store operation is performed. We describe a set of techniques where the compiler enhances load and store instructions so that they can be executed with fewer stalls and/or enable the L1 DC to be accessed in a more energy-efficient manner. We show that using these techniques can simultaneously achieve a 6% gain in performance and a 43% reduction in L1 DC energy usage.

  • 37.
    Bor, Martin
    et al.
    Lancaster University.
    Roedig, Utz
    Lancaster University.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Alonso, Juan
    Univ. Nac. de Cuyo, Argentina.
    Do LoRa Low-Power Wide-Area Networks Scale?2016Conference paper (Refereed)
    Abstract [en]

    New Internet of Things (IoT) technologies such as Long Range (LoRa) are emerging which enable power-efficient wireless communication over very long distances. Devices typically communicate directly to a sink node, which removes the need of constructing and maintaining a complex multi-hop network. Given the fact that a wide area is covered and that all devices communicate directly to a few sink nodes, a large number of nodes have to share the communication medium. LoRa provides for this reason a range of communication options (centre frequency, spreading factor, bandwidth, coding rates) from which a transmitter can choose. Many combination settings are orthogonal and provide simultaneous collision-free communications. Nevertheless, there is a limit regarding the number of transmitters a LoRa system can support. In this paper we investigate the capacity limits of LoRa networks. Using experiments we develop models describing LoRa communication behaviour. We use these models to parameterise a LoRa simulation to study scalability. Our experiments show that a typical smart city deployment can support 120 nodes per 3.8 ha, which is not sufficient for future IoT deployments. LoRa networks can scale quite well, however, if they use dynamic communication parameter selection and/or multiple sinks.
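
    The time-on-air per frame, which drives how many transmitters can share the medium, can be estimated with the commonly published LoRa modem airtime formula; the sketch below assumes that formula and example settings (125 kHz bandwidth, CR 4/5, 20-byte payload), not the exact parameters measured in the paper.

        # Time-on-air for one LoRa frame using the commonly published modem airtime formula.
        import math

        def lora_airtime(sf, bw_hz=125_000, payload_bytes=20, coding_rate=1,
                         preamble_symbols=8, explicit_header=True, crc=True):
            t_sym = (2 ** sf) / bw_hz
            de = 1 if t_sym > 0.016 else 0          # low data-rate optimisation for slow symbols
            ih = 0 if explicit_header else 1
            payload_symbols = 8 + max(
                math.ceil((8 * payload_bytes - 4 * sf + 28 + 16 * int(crc) - 20 * ih)
                          / (4 * (sf - 2 * de))) * (coding_rate + 4), 0)
            return (preamble_symbols + 4.25) * t_sym + payload_symbols * t_sym

        for sf in range(7, 13):
            print(f"SF{sf}: {lora_airtime(sf) * 1000:7.1f} ms on air per 20-byte frame")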

    Download full text (pdf)
    fulltext
  • 38.
    Borgström, Gustaf
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Rohner, Christian
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Faster Functional Warming with Cache Merging2022Report (Other academic)
    Abstract [en]

    Smarts-like sampled hardware simulation techniques achieve good accuracy by simulating many small portions of an application in detail. However, while this reduces the detailed simulation time, it results in extensive cache warming times, as each of the many simulation points requires warming the whole memory hierarchy. Adaptive Cache Warming reduces this time by iteratively increasing warming until achieving sufficient accuracy. Unfortunately, each time the warming increases, the previous warming must be redone, nearly doubling the required warming. We address re-warming by developing a technique to merge the cache states from the previous and additional warming iterations.

    We demonstrate our merging approach on a multi-level LRU cache hierarchy and evaluate and address the introduced errors. By removing warming redundancy, we expect an ideal 2× warming speedup when using our Cache Merging solution together with Adaptive Cache Warming. Experiments show that Cache Merging delivers an average speedup of 1.44×, 1.84×, and 1.87× for 128kB, 2MB, and 8MB L2 caches, respectively, with 95-percentile absolute IPC errors of only 0.029, 0.015, and 0.006, respectively. These results demonstrate that Cache Merging yields significantly higher simulation speed with minimal losses.
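
    One plausible per-set merge rule, sketched below, keeps the blocks of the more recent warming window in their recency order and fills any remaining ways from the older window; this is an illustration of the idea, not the paper's exact merging algorithm.

        # Plausible per-set merge of two LRU-ordered warming states: blocks from the more
        # recent warming keep priority; remaining ways are filled from the older warming.
        def merge_lru_set(recent, older, ways):
            """recent/older: lists of block tags, most-recently-used first."""
            merged = list(recent)
            for tag in older:
                if len(merged) >= ways:
                    break
                if tag not in merged:
                    merged.append(tag)
            return merged[:ways]

        prev_warm = ["A", "B", "C"]                 # warming closest to the simulation point (MRU first)
        extra_warm = ["C", "D", "E", "F"]           # additional, earlier warming iteration
        print(merge_lru_set(prev_warm, extra_warm, ways=4))     # -> ['A', 'B', 'C', 'D']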

    Download full text (pdf)
    Faster_Functional_Warming_with_Cache_Merging.2022.Technical-Report
  • 39.
    Borgström, Gustaf
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Rohner, Christian
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Faster FunctionalWarming with Cache Merging2023In: PROCEEDINGS OF SYSTEM ENGINEERING FOR CONSTRAINED EMBEDDED SYSTEMS, DRONESE AND RAPIDO 2023, Association for Computing Machinery (ACM), 2023, p. 39-47Conference paper (Refereed)
    Abstract [en]

    Smarts-like sampled hardware simulation techniques achieve good accuracy by simulating many small portions of an application in detail. However, while this reduces the simulation time, it results in extensive cache warming times, as each of the many simulation points requires warming the whole memory hierarchy. Adaptive Cache Warming reduces this time by iteratively increasing warming to achieve sufficient accuracy. Unfortunately, each increase requires that the previous warming be redone, nearly doubling the total warming. We address re-warming by developing a technique to merge the cache states from the previous and additional warming iterations. We demonstrate our merging approach on a multi-level LRU cache hierarchy and evaluate and address the introduced errors. Our experiments show that Cache Merging delivers an average speedup of 1.44x, 1.84x, and 1.87x for 128kB, 2MB, and 8MB L2 caches, respectively (vs. a 2x theoretical maximum speedup), with 95-percentile absolute IPC errors of only 0.029, 0.015, and 0.006, respectively. These results demonstrate that Cache Merging yields significantly higher simulation speed with minimal losses.

  • 40.
    Borgström, Gustaf
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Sembrant, Andreas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Adaptive cache warming for faster simulations2017In: Proc. 9th Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, New York: ACM Press, 2017, article id 1Conference paper (Refereed)
    Download full text (pdf)
    acw-rapido17-revision1
  • 41.
    Cambazoglu, Volkan
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Protocol, mobility and adversary models for the verification of security2016Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    The increasing heterogeneity of communicating devices, ranging from resource constrained battery driven sensor nodes to multi-core processor computers, challenges protocol design. We examine security and privacy protocols with respect to exterior factors such as users, adversaries, and computing and communication resources; and also interior factors such as the operations, the interactions and the parameters of a protocol.

    Users and adversaries interact with security and privacy protocols, and even affect the outcome of the protocols. We propose user mobility and adversary models to examine how the location privacy of users is affected when they move relative to each other in specific patterns while adversaries with varying strengths try to identify the users based on their historical locations. The location privacy of the users is simulated with the support of the K-Anonymity protection mechanism, the Distortion-based metric, and our models of users' mobility patterns and adversaries' knowledge about users.

    Security and privacy protocols need to operate on various computing and communication resources. Some of these protocols can be adjusted for different situations by changing parameters. A common example is to use longer secret keys in encryption for stronger security. We experiment with the trade-off between the security and the performance of the Fiat–Shamir identification protocol. We pipeline the protocol to increase its utilisation as the communication delay outweighs the computation.

    A mathematical specification based on a formal method leads to a strong proof of security. We use three formal languages with their tool supports in order to model and verify the Secure Hierarchical In-Network Aggregation (SHIA) protocol for Wireless Sensor Networks (WSNs). The three formal languages specialise on cryptographic operations, distributed systems and mobile processes. Finding an appropriate level of abstraction to represent the essential features of the protocol in three formal languages was central.
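
    The K-Anonymity protection mechanism mentioned in the abstract can be illustrated with a small spatial-cloaking sketch that expands a region around the querying user until at least K users fall inside it. The expansion rule, the value of K, and the coordinates are invented for illustration and do not reflect the thesis's simulation setup.

        # Toy spatial cloaking: expand a square region around the querying user until at
        # least K users (the user plus K-1 neighbours) fall inside it, then report the region.
        def cloak(user, others, k, step=1.0, max_radius=64.0):
            x, y = user
            r = step
            while r <= max_radius:
                inside = [p for p in others if abs(p[0] - x) <= r and abs(p[1] - y) <= r]
                if len(inside) + 1 >= k:
                    return (x - r, y - r, x + r, y + r)         # cloaked region reported instead of the exact position
                r *= 2
            return None                                         # K-anonymity cannot be satisfied here

        users = [(1, 1), (2, 3), (10, 10), (11, 12), (30, 30)]
        print(cloak(user=(2, 2), others=users, k=3))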

    List of papers
    1. The impact of trace and adversary models on location privacy provided by K-anonymity
    2012 (English)In: Proc. 1st Workshop on Measurement, Privacy, and Mobility, New York: ACM Press, 2012, article id 6Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    New York: ACM Press, 2012
    National Category
    Computer Sciences
    Research subject
    Computer Science with specialization in Computer Communication
    Identifiers
    urn:nbn:se:uu:diva-171581 (URN), 10.1145/2181196.2181202 (DOI), 978-1-4503-1163-2 (ISBN)
    Conference
    MPM 2012
    Projects
    ProFuN, WISENET
    Available from: 2012-04-10 Created: 2012-03-22 Last updated: 2018-01-12 Bibliographically approved
    2. Towards adaptive zero-knowledge protocols: A case study with Fiat–Shamir identification protocol
    2013 (English)In: Proc. 9th Swedish National Computer Networking Workshop, 2013, p. 67-70Conference paper, Published paper (Refereed)
    Abstract [en]

    Interactive zero-knowledge protocols are used as identification protocols. The protocols are executed in rounds, with security being increased with every round. This allows for a trade-off between security and performance to adapt the protocol to the requirements of the scenario. We experimentally investigate the Fiat–Shamir identification protocol on machines and networks with different performance characteristics. We find that the delay of the protocol depends strongly on network latency and upload bandwidth. Computation time becomes more visible when the protocol transmits a small amount of data over a low-latency network. We also observe that the sizes of the variables have less impact on the delay of the protocol than the number of rounds, both of which are interior factors of the protocol.
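
    To make the round-based trade-off concrete, the sketch below runs single Fiat–Shamir identification rounds, each of which halves a cheating prover's success probability; the toy modulus and secret are illustrative only, and real deployments use large moduli.

        # One-round Fiat-Shamir identification; each extra round halves a cheater's
        # success probability, which is the security/performance knob discussed above.
        import random

        p, q = 1009, 1013                           # toy primes; real deployments use large moduli
        n = p * q
        s = 1234                                    # prover's secret
        v = (s * s) % n                             # public key

        def fiat_shamir_round():
            r = random.randrange(1, n)
            x = (r * r) % n                         # prover's commitment
            e = random.randrange(2)                 # verifier's challenge bit
            y = (r * pow(s, e, n)) % n              # prover's response
            return pow(y, 2, n) == (x * pow(v, e, n)) % n       # verifier's check

        rounds = 20
        print("all rounds verified:", all(fiat_shamir_round() for _ in range(rounds)))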

    National Category
    Computer Sciences
    Research subject
    Computer Science with specialization in Computer Communication
    Identifiers
    urn:nbn:se:uu:diva-201070 (URN)
    Conference
    SNCNW 2013
    Projects
    WISENET, ProFuN
    Available from: 2013-06-05 Created: 2013-06-05 Last updated: 2018-01-11. Bibliographically approved
    3. Modelling and analysing a WSN secure aggregation protocol: A comparison of languages and tool support
    2015 (English)Report (Other academic)
    Series
    Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2015-033
    National Category
    Computer Sciences, Communication Systems
    Research subject
    Computer Science with specialization in Computer Communication
    Identifiers
    urn:nbn:se:uu:diva-268453 (URN)
    Projects
    ProFuN
    Funder
    Swedish Foundation for Strategic Research, RIT08-0065
    Available from: 2015-12-03 Created: 2015-12-04 Last updated: 2018-01-10. Bibliographically approved
  • 42.
    Cambazoglu, Volkan
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Gutkovas, Ramunas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Åman Pohjola, Johannes
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Victor, Björn
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Modelling and analysing a WSN secure aggregation protocol: A comparison of languages and tool support2015Report (Other academic)
  • 43.
    Carlos, Perez Penichet
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Hermans, Frederik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Varshney, Ambuj
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Augmenting IoT networks with backscatter-enabled passive sensor tags2016In: Proceedings of the 3rd Workshop on Hot Topics in Wireless, 2016, p. 23-27Conference paper (Refereed)
    Abstract [en]

    The sensing modalities available in an Internet-of-Things (IoT) network are usually fixed before deployment, when the operator selects a suitable IoT platform. Retrofitting a deployment with additional sensors can be cumbersome, because it requires either modifying the deployed hardware or adding new devices that then have to be maintained. In this paper, we present our vision and work towards passive sensor tags: battery-free devices that make it possible to augment existing IoT deployments with additional sensing capabilities without modifying the deployment itself. Our passive sensor tags use backscatter transmissions to communicate with the deployed network. Crucially, they do this in a way that is compatible with the deployed network's radio protocol, and without the need for additional infrastructure. We present an FPGA-based prototype of a passive sensor tag that can communicate with unmodified 802.15.4 IoT devices. Our initial experiments with the prototype support the feasibility of our approach. We also lay out the next steps towards fully realizing the vision of passive sensor tags.

  • 44.
    Carlos, Pérez-Penichet
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Noda, Claro
    Mid-Sweden University, Sweden.
    Varshney, Ambuj
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Battery-free 802.15.4 Receiver2018In: 17th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), IEEE, 2018Conference paper (Refereed)
    Abstract [en]

    We present the architecture of an 802.15.4 receiver that, for the first time, operates at a few hundred microwatts, enabling new battery-free applications. To reach the required micro-power consumption, the architecture diverges from that of commodity receivers in two important ways. First, it offloads the power-hungry local oscillator to an external device, much like backscatter transmitters do. Second, we avoid the energy cost of demodulating a phase-modulated signal by treating 802.15.4 as a frequency-modulated one, which allows us to receive with a simple passive detector and an energy-efficient thresholding circuit. We describe a prototype that can receive 802.15.4 frames with a power consumption of 361 μW. Our receiver prototype achieves sufficient communication range to integrate with deployed wireless sensor networks (WSNs). We illustrate this integration by pairing the prototype with an 802.15.4 backscatter transmitter and integrating it with unmodified 802.15.4 sensor nodes running the TSCH and Glossy protocols.
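
    The frequency-modulated view can be illustrated in software (a simplified baseband simulation with made-up tone frequencies and sample rates, not the analog front end described above): per-chip energy is compared at two tones and then thresholded, so no phase tracking is needed.

        import math
        import random

        FS, CHIP_RATE = 8e6, 1e6               # sample rate and chip rate (illustrative values)
        F0, F1 = 1.5e6, 2.5e6                  # the two tones after downconversion (illustrative)
        SPC = int(FS / CHIP_RATE)              # samples per chip

        def modulate(chips):
            out, phase = [], 0.0
            for c in chips:
                f = F1 if c else F0
                for _ in range(SPC):
                    phase += 2 * math.pi * f / FS
                    out.append(math.cos(phase) + random.gauss(0, 0.3))   # tone plus noise
            return out

        def tone_energy(window, f):
            # Non-coherent energy of one tone over a chip (single DFT bin, phase-independent).
            re = sum(s * math.cos(2 * math.pi * f * i / FS) for i, s in enumerate(window))
            im = sum(s * math.sin(2 * math.pi * f * i / FS) for i, s in enumerate(window))
            return re * re + im * im

        def demodulate(samples):
            return [1 if tone_energy(samples[k:k + SPC], F1) > tone_energy(samples[k:k + SPC], F0) else 0
                    for k in range(0, len(samples), SPC)]

        random.seed(2)
        tx = [random.randrange(2) for _ in range(64)]
        rx = demodulate(modulate(tx))
        print("chip errors:", sum(a != b for a, b in zip(tx, rx)), "out of", len(tx))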

  • 45.
    Carlson, Trevor E.
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Heirman, Wim
    Intel, ExaSci Lab, Santa Clara, CA USA..
    Allam, Osman
    Univ Ghent, B-9000 Ghent, Belgium..
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Eeckhout, Lieven
    Univ Ghent, B-9000 Ghent, Belgium..
    The Load Slice Core Microarchitecture2015In: 2015 ACM/IEEE 42Nd Annual International Symposium On Computer Architecture (ISCA), 2015, p. 272-284Conference paper (Refereed)
    Abstract [en]

    Driven by the motivation to expose instruction-level parallelism (ILP), microprocessor cores have evolved from simple, in-order pipelines into complex, superscalar out-of-order designs. By extracting ILP, these processors also enable parallel cache and memory operations as a useful side-effect. Today, however, the growing off-chip memory wall and complex cache hierarchies of many-core processors make cache and memory accesses ever more costly. This increases the importance of extracting memory hierarchy parallelism (MHP), while reducing the net impact of more general, yet complex and power-hungry ILP-extraction techniques. In addition, for multi-core processors operating in power- and energy-constrained environments, energy-efficiency has largely replaced single-thread performance as the primary concern. Based on this observation, we propose a core microarchitecture that is aimed squarely at generating parallel accesses to the memory hierarchy while maximizing energy efficiency. The Load Slice Core extends the efficient in-order, stall-on-use core with a second in-order pipeline that enables memory accesses and address-generating instructions to bypass stalled instructions in the main pipeline. Backward program slices containing address-generating instructions leading up to loads and stores are extracted automatically by the hardware, using a novel iterative algorithm that requires no software support or recompilation. On average, the Load Slice Core improves performance over a baseline in-order processor by 53% with overheads of only 15% in area and 22% in power, leading to an increase in energy efficiency (MIPS/Watt) over in-order and out-of-order designs by 43% and over 4.7x, respectively. In addition, for a power- and area-constrained many-core design, the Load Slice Core outperforms both in-order and out-of-order designs, achieving a 53% and 95% higher performance, respectively, thus providing an alternative direction for future many-core processors.
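
    A small Python sketch of the backward-slicing idea (over an invented toy instruction trace; the Load Slice Core itself discovers these slices iteratively in hardware, without software support or recompilation): starting from each load or store, register dependences are followed backwards and the address-generating producers are marked, i.e. the instructions that would be steered into the second, bypass pipeline.

        from collections import namedtuple

        Ins = namedtuple("Ins", "idx op dst srcs")     # simplified: one destination, source registers

        trace = [
            Ins(0, "add",  "r1", ["r2", "r3"]),        # address arithmetic
            Ins(1, "mul",  "r4", ["r5", "r5"]),        # unrelated computation
            Ins(2, "add",  "r6", ["r1", "r7"]),        # more address arithmetic
            Ins(3, "load", "r8", ["r6"]),              # load: its sources are address registers
            Ins(4, "add",  "r9", ["r8", "r4"]),        # consumer of the loaded value
        ]

        def address_slice(trace):
            in_slice = set()
            for ins in reversed(trace):
                if ins.op in ("load", "store") or ins.idx in in_slice:
                    for src in ins.srcs:
                        # most recent earlier writer of this source register, if any
                        for prev in reversed(trace[:ins.idx]):
                            if prev.dst == src:
                                in_slice.add(prev.idx)
                                break
            return sorted(in_slice)

        print("instructions in the address-generating slice:", address_slice(trace))  # [0, 2]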

  • 46.
    Carlson, Trevor E.
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Tran, Kim-Anh
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Transcending hardware limits with software out-of-order processing2017In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 16, no 2, p. 162-165Article in journal (Refereed)
  • 47. Catalán Rivas, Victoria
    et al.
    Fröjd, Emil
    Holmberg, Tobias
    Ragnarsson, Felix
    Rick, Elsa
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Corneo, Lorenzo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Varshney, Ambuj
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Rohner, Christian
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Gunningberg, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Environmental Control at the Edge2018Conference paper (Other academic)
  • 48.
    Ceballos, Germán
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    How to make tasks faster: Revealing the complex interactions of tasks in the memory system2017In: Proc. Companion 8th ACM International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, New York: ACM Press, 2017, p. 1-3Conference paper (Refereed)
  • 49.
    Ceballos, Germán
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Modeling the interactions between tasks and the memory system2017Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    Making computer systems more energy efficient while obtaining the maximum possible performance is key for future developments in engineering, medicine, entertainment, etc. However, it has become a difficult task due to the increasing complexity of hardware, software, and their interactions. For example, developers have to deal with deep, multi-level cache hierarchies on modern CPUs and keep thousands of GPU cores busy, which makes the programming process more difficult.

    To simplify this task, new abstractions and programming models are becoming popular. Their goal is to make applications more scalable and efficient, while still providing the flexibility and portability of old, widely adopted models. One example of this is task-based programming, where simple independent tasks (functions) are delegated to a runtime system which orchestrates their execution. This approach has been successful because the runtime can automatically distribute work across hardware cores and has the potential to minimize data movement and placement (e.g., being aware of the cache hierarchy).

    To build better runtime systems, it is crucial to understand bottlenecks in the performance of current and future multicore systems. In this thesis, we provide fast, accurate and mathematically-sound models and techniques to understand the execution of task-based applications concerning three key aspects: memory behavior (data locality), scheduling, and performance. With these methods, we lay the groundwork for improving runtime systems, providing insight into the interplay between the schedule's behavior, data reuse through the cache hierarchy, and the resulting performance.
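
    A minimal stand-in for the task-based model sketched above (Python's thread-pool executor plays the role of the runtime system here; real task-based runtimes additionally track data dependences and data placement in the cache hierarchy):

        from concurrent.futures import ThreadPoolExecutor

        def task(block):
            # A small, independent unit of work submitted to the runtime.
            return sum(x * x for x in block)

        data = [list(range(i * 1000, (i + 1) * 1000)) for i in range(8)]

        with ThreadPoolExecutor(max_workers=4) as runtime:
            results = list(runtime.map(task, data))   # the runtime decides where and when tasks run

        print("partial sums:", results[:2], "...")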

    List of papers
    1. Shared Resource Sensitivity in Task-Based Runtime Systems
    2013 (English)In: Proc. 6th Swedish Workshop on Multi-Core Computing, Halmstad University Press, 2013Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    Halmstad University Press, 2013
    National Category
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-212780 (URN)
    Conference
    MCC13, November 25–26, Halmstad, Sweden
    Projects
    Resource Sharing Modeling, UPMARC
    Funder
    Swedish Research Council
    Available from: 2013-12-13 Created: 2013-12-13 Last updated: 2018-11-16. Bibliographically approved
    2. Formalizing data locality in task parallel applications
    2016 (English)In: Algorithms and Architectures for Parallel Processing, Springer, 2016, p. 43-61Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    Springer, 2016
    Series
    Lecture Notes in Computer Science, ISSN 0302-9743 ; 10049
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-310341 (URN), 10.1007/978-3-319-49956-7_4 (DOI), 000389797000004 (), 978-3-319-49955-0 (ISBN)
    Conference
    ICA3PP 2016, December 14–16, Granada, Spain
    Projects
    UPMARC, Resource Sharing Modeling
    Funder
    Swedish Foundation for Strategic Research, FFL12-0051
    Available from: 2016-11-19 Created: 2016-12-14 Last updated: 2018-11-16. Bibliographically approved
    3. TaskInsight: Understanding task schedules effects on memory and performance
    2017 (English)In: Proc. 8th International Workshop on Programming Models and Applications for Multicores and Manycores, New York: ACM Press, 2017, p. 11-20Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    New York: ACM Press, 2017
    National Category
    Computer Engineering
    Identifiers
    urn:nbn:se:uu:diva-315033 (URN), 10.1145/3026937.3026943 (DOI), 978-1-4503-4883-6 (ISBN)
    Conference
    PMAM 2017, February 4–8, Austin, TX
    Projects
    UPMARC, Resource Sharing Modeling
    Funder
    Swedish Research Council; Swedish Foundation for Strategic Research, FFL12-0051; EU, Horizon 2020, 687698
    Available from: 2017-02-04 Created: 2017-02-08 Last updated: 2018-11-16. Bibliographically approved
    4. Analyzing performance variation of task schedulers with TaskInsight
    2018 (English)In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 75, p. 11-27Article in journal (Refereed) Published
    National Category
    Computer Engineering
    Identifiers
    urn:nbn:se:uu:diva-340202 (URN), 10.1016/j.parco.2018.02.003 (DOI), 000433655700002 ()
    Projects
    UPMARC, Resource Sharing Modeling
    Funder
    Swedish Research Council, FFL12-0051; Swedish Foundation for Strategic Research, FFL12-0051
    Available from: 2018-02-22 Created: 2018-01-26 Last updated: 2018-11-16. Bibliographically approved
  • 50.
    Ceballos, Germán
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Understanding Task Parallelism: Providing insight into scheduling, memory, and performance for CPUs and Graphics2018Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Maximizing the performance of computer systems while making them more energy efficient is vital for future developments in engineering, medicine, entertainment, etc. However, the increasing complexity of software, hardware, and their interactions makes this task difficult. Software developers have to deal with complex memory architectures such as multilevel caches on modern CPUs and keeping thousands of cores busy in GPUs, which makes the programming process harder.

    Task-based programming provides high-level abstractions to simplify the development process. In this model, independent tasks (functions) are submitted to a runtime system, which orchestrates their execution across hardware resources. This approach has become popular and successful because the runtime can distribute the workload across hardware resources automatically, and has the potential to optimize the execution to minimize data movement (e.g., being aware of the cache hierarchy).

    However, to build better runtime systems, we now need to understand bottlenecks in the performance of current and future multicore architectures. Unfortunately, since most current work was designed for sequential or thread-based workloads, there is an overall lack of tools and methods that provide insight into the execution of these applications and allow both the runtime and the programmers to detect potential optimizations.

    In this thesis, we address this lack of tools by providing fast, accurate and mathematically-sound models to understand the execution of task-based applications. In particular, we center these models around three key aspects of the execution: memory behavior (data locality), scheduling, and performance. Our contributions provide insight into the interplay between the schedule's behavior, data reuse through the cache hierarchy, and the resulting performance. These contributions lay the groundwork for improving runtime systems. We first apply these methods to analyze a diverse set of CPU applications, and then extend them to one of the most common workloads in current systems: graphics rendering on GPUs.
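
    One building block that such locality models typically rest on is the reuse distance of a memory access, i.e. how many distinct addresses are touched between two uses of the same data (a generic illustration, not the thesis' exact formalisation); short distances indicate schedules that reuse data while it is still resident in the cache hierarchy.

        def reuse_distances(accesses):
            last_seen = {}
            distances = []
            for i, addr in enumerate(accesses):
                if addr in last_seen:
                    # number of distinct addresses touched since the previous access to addr
                    distances.append(len(set(accesses[last_seen[addr] + 1:i])))
                else:
                    distances.append(None)            # cold access, no reuse yet
                last_seen[addr] = i
            return distances

        # Two schedules of the same four accesses to data blocks A and B:
        good = ["A", "A", "B", "B"]                   # reuse while the block is still cached
        bad  = ["A", "B", "A", "B"]                   # reuse only after touching other data
        print(reuse_distances(good))                  # [None, 0, None, 0]
        print(reuse_distances(bad))                   # [None, None, 1, 1]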

    List of papers
    1. Shared Resource Sensitivity in Task-Based Runtime Systems
    2013 (English)In: Proc. 6th Swedish Workshop on Multi-Core Computing, Halmstad University Press, 2013Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    Halmstad University Press, 2013
    National Category
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-212780 (URN)
    Conference
    MCC13, November 25–26, Halmstad, Sweden
    Projects
    Resource Sharing Modeling, UPMARC
    Funder
    Swedish Research Council
    Available from: 2013-12-13 Created: 2013-12-13 Last updated: 2018-11-16. Bibliographically approved
    2. Formalizing data locality in task parallel applications
    2016 (English)In: Algorithms and Architectures for Parallel Processing, Springer, 2016, p. 43-61Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    Springer, 2016
    Series
    Lecture Notes in Computer Science, ISSN 0302-9743 ; 10049
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-310341 (URN), 10.1007/978-3-319-49956-7_4 (DOI), 000389797000004 (), 978-3-319-49955-0 (ISBN)
    Conference
    ICA3PP 2016, December 14–16, Granada, Spain
    Projects
    UPMARC, Resource Sharing Modeling
    Funder
    Swedish Foundation for Strategic Research, FFL12-0051
    Available from: 2016-11-19 Created: 2016-12-14 Last updated: 2018-11-16. Bibliographically approved
    3. Analyzing performance variation of task schedulers with TaskInsight
    2018 (English)In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 75, p. 11-27Article in journal (Refereed) Published
    National Category
    Computer Engineering
    Identifiers
    urn:nbn:se:uu:diva-340202 (URN), 10.1016/j.parco.2018.02.003 (DOI), 000433655700002 ()
    Projects
    UPMARC, Resource Sharing Modeling
    Funder
    Swedish Research Council, FFL12-0051; Swedish Foundation for Strategic Research, FFL12-0051
    Available from: 2018-02-22 Created: 2018-01-26 Last updated: 2018-11-16. Bibliographically approved
    4. Behind the Scenes: Memory Analysis of Graphical Workloads on Tile-based GPUs
    2018 (English)In: Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2018, IEEE Computer Society, 2018, p. 1-11Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    IEEE Computer Society, 2018
    National Category
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-361214 (URN), 10.1109/ISPASS.2018.00009 (DOI), 000545984300001 (), 978-1-5386-5010-3 (ISBN)
    Conference
    ISPASS 2018, April 2–4, Belfast, UK
    Projects
    UPMARC
    Available from: 2018-09-21 Created: 2018-09-21 Last updated: 2021-06-02. Bibliographically approved
    5. Tail-PASS: Resource-based Cache Management for Tiled Graphics Rendering Hardware
    2018 (English)In: Proc. 16th International Conference on Parallel and Distributed Processing with Applications, IEEE, 2018, p. 55-63Conference paper, Published paper (Refereed)
    Abstract [en]

    Modern graphics rendering is a very expensive process and can account for 60% of the battery consumption in current games. Much of the cost comes from the high memory bandwidth needed to render complex graphics. To render a frame, multiple smaller rendering passes called scenes are executed, with each one tiled for parallel execution. The data for each scene comes from hundreds of software resources (textures). We observe that each frame can consume up to thousands of megabytes of data, but that over 75% of the graphics memory accesses go to the top-10 resources, and that bypassing the remaining infrequently accessed (tail) resources reduces cache pollution. Bypassing the tail can save up to 35% of the main memory traffic compared with resource-oblivious replacement policies and cache management techniques. In this paper, we propose Tail-PASS, a cache management technique that detects the most accessed resources at runtime, learns whether it is worth bypassing the least accessed ones, and then dynamically enables/disables bypassing to reduce cache pollution on a per-scene basis. Overall, we see an average reduction in bandwidth-per-frame of 22% (up to 46%) by bypassing all but the top-10 resources, and an 11% (up to 44%) reduction if only the top-2 resources are cached.
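
    Schematically (illustrative Python with invented numbers, not the hardware mechanism evaluated in the paper), the per-scene policy amounts to counting accesses per resource, keeping the top-N hot resources cacheable, and bypassing the long tail:

        from collections import Counter

        def build_bypass_filter(resource_accesses, top_n=10):
            # Keep the top_n most-accessed resources cacheable; bypass everything else.
            hot = {r for r, _ in Counter(resource_accesses).most_common(top_n)}
            return lambda resource: resource not in hot      # True means bypass the cache

        # Toy scene: resources 0..9 are hot, 10..199 form the infrequently accessed tail.
        scene = [i % 10 for i in range(8000)] + [10 + i % 190 for i in range(2000)]
        bypass = build_bypass_filter(scene, top_n=10)
        bypassed = sum(bypass(r) for r in scene)
        print(f"accesses sent past the cache: {bypassed}/{len(scene)} ({bypassed/len(scene):.0%})")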

    Place, publisher, year, edition, pages
    IEEE, 2018
    National Category
    Computer Systems, Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-363920 (URN), 10.1109/BDCloud.2018.00022 (DOI), 000467843200008 (), 978-1-7281-1141-4 (ISBN)
    Conference
    ISPA 2018, December 11–13, Melbourne, Australia
    Funder
    EU, European Research Council, 715283
    Available from: 2018-10-21 Created: 2018-10-21 Last updated: 2019-06-17. Bibliographically approved