uu.seUppsala University Publications
Change search
Refine search result
1 - 6 of 6
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1.
    Carlson, Trevor E.
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Tran, Kim-Anh
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Transcending hardware limits with software out-of-order processing2017In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 16, no 2, p. 162-165Article in journal (Refereed)
  • 2.
    Tran, Kim-Anh
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Software Out-of-Order Execution for In-Order Architectures2016In: Proc. 25th International Conference on Parallel Architectures and Compilation Techniques, New York: ACM Press, 2016, p. 458-458Conference paper (Refereed)
  • 3.
    Tran, Kim-Anh
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Static instruction scheduling for high performance on energy-efficient processors2018Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    New trends such as the internet-of-things and smart homes push the demands for energy-efficiency. Choosing energy-efficient hardware, however, often comes as a trade-off to high-performance. In order to strike a good balance between the two, we propose software solutions to tackle the performance bottlenecks of small and energy-efficient processors.

    One of the main performance bottlenecks of processors is the discrepancy between processor and memory speed, known as the memory wall. While the processor executes instructions at a high pace, the memory is too slow to provide data in a timely manner, if data has not been cached in advance. Load instructions that require an access to memory are thereby referred to as long-latency or delinquent loads. Long latencies caused by delinquent loads are putting a strain on small processors, which have few or no resources to effectively hide the latencies. As a result, the processor may stall.

    In this thesis we propose compile-time transformation techniques to mitigate the penalties of delinquent loads on small out-of-order processors, with the ultimate goal to avoid processor stalls as much as possible. Our code transformation is applicable for general-purpose code, including unknown memory dependencies, complex control flow and pointers. We further propose a software-hardware co-design that combines the code transformation technique with lightweight hardware support to hide latencies on a stall-on-use in-order processor.

    List of papers
    1. Clairvoyance: Look-ahead compile-time scheduling
    Open this publication in new window or tab >>Clairvoyance: Look-ahead compile-time scheduling
    Show others...
    2017 (English)In: Proc. 15th International Symposium on Code Generation and Optimization, Piscataway, NJ: IEEE Press, 2017, p. 171-184Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    Piscataway, NJ: IEEE Press, 2017
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-316480 (URN)000402548700015 ()978-1-5090-4931-8 (ISBN)
    Conference
    CGO 2017, February 4–8, Austin, TX
    Projects
    UPMARC
    Funder
    Swedish Research Council, 2010-4741
    Available from: 2017-02-04 Created: 2017-03-01 Last updated: 2018-04-26Bibliographically approved
    2. Static instruction scheduling for high performance on limited hardware
    Open this publication in new window or tab >>Static instruction scheduling for high performance on limited hardware
    Show others...
    2018 (English)In: IEEE Transactions on Computers, ISSN 0018-9340, Vol. 67, no 4, p. 513-527Article in journal (Refereed) Published
    Abstract [en]

    Complex out-of-order (OoO) processors have been designed to overcome the restrictions of outstanding long-latency misses at the cost of increased energy consumption. Simple, limited OoO processors are a compromise in terms of energy consumption and performance, as they have fewer hardware resources to tolerate the penalties of long-latency loads. In worst case, these loads may stall the processor entirely. We present Clairvoyance, a compiler based technique that generates code able to hide memory latency and better utilize simple OoO processors. By clustering loads found across basic block boundaries, Clairvoyance overlaps the outstanding latencies to increases memory-level parallelism. We show that these simple OoO processors, equipped with the appropriate compiler support, can effectively hide long-latency loads and achieve performance improvements for memory-bound applications. To this end, Clairvoyance tackles (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure. Clairvoyance achieves a geomean execution time improvement of 14 percent for memory-bound applications, on top of standard O3 optimizations, while maintaining compute-bound applications' high-performance.

    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-334011 (URN)10.1109/TC.2017.2769641 (DOI)000427420800005 ()
    Projects
    UPMARC
    Funder
    Swedish Research Council, 2016-05086
    Available from: 2017-11-03 Created: 2017-11-20 Last updated: 2018-05-17Bibliographically approved
    3. Software Out-of-Order Execution for In-Order Architectures
    Open this publication in new window or tab >>Software Out-of-Order Execution for In-Order Architectures
    2016 (English)In: Proc. 25th International Conference on Parallel Architectures and Compilation Techniques, New York: ACM Press, 2016, p. 458-458Conference paper, Poster (with or without abstract) (Refereed)
    Place, publisher, year, edition, pages
    New York: ACM Press, 2016
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-309768 (URN)10.1145/2967938.2971466 (DOI)000392249100055 ()978-1-4503-4121-9 (ISBN)
    Conference
    PACT 2016, September 11–15, Haifa, Israel
    Projects
    UPMARC
    Available from: 2016-09-11 Created: 2016-12-07 Last updated: 2018-04-26Bibliographically approved
    4. Transcending hardware limits with software out-of-order processing
    Open this publication in new window or tab >>Transcending hardware limits with software out-of-order processing
    Show others...
    2017 (English)In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 16, no 2, p. 162-165Article in journal (Refereed) Published
    National Category
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-334012 (URN)10.1109/LCA.2017.2672559 (DOI)000418870500018 ()
    Projects
    UPMARC
    Available from: 2017-02-22 Created: 2017-11-20 Last updated: 2018-04-26Bibliographically approved
  • 4.
    Tran, Kim-Anh
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Carlson, Trevor E.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Clairvoyance: Look-ahead compile-time scheduling2017In: Proc. 15th International Symposium on Code Generation and Optimization, Piscataway, NJ: IEEE Press, 2017, p. 171-184Conference paper (Refereed)
  • 5.
    Tran, Kim-Anh
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Carlson, Trevor E.
    Koukos, Konstantinos
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Norwegian Univ Sci & Technol, S-75236 Uppsala, Sweden.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Static instruction scheduling for high performance on limited hardware2018In: IEEE Transactions on Computers, ISSN 0018-9340, Vol. 67, no 4, p. 513-527Article in journal (Refereed)
    Abstract [en]

    Complex out-of-order (OoO) processors have been designed to overcome the restrictions of outstanding long-latency misses at the cost of increased energy consumption. Simple, limited OoO processors are a compromise in terms of energy consumption and performance, as they have fewer hardware resources to tolerate the penalties of long-latency loads. In worst case, these loads may stall the processor entirely. We present Clairvoyance, a compiler based technique that generates code able to hide memory latency and better utilize simple OoO processors. By clustering loads found across basic block boundaries, Clairvoyance overlaps the outstanding latencies to increases memory-level parallelism. We show that these simple OoO processors, equipped with the appropriate compiler support, can effectively hide long-latency loads and achieve performance improvements for memory-bound applications. To this end, Clairvoyance tackles (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure. Clairvoyance achieves a geomean execution time improvement of 14 percent for memory-bound applications, on top of standard O3 optimizations, while maintaining compute-bound applications' high-performance.

  • 6.
    Tran, Kim-Anh
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Jimborean, Alexandra
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Carlson, Trevor E.
    National University of Singapore, Singapore.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Själander, Magnus
    NTNU, Norway.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores2018In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, Association for Computing Machinery (ACM), 2018, p. 328-343Conference paper (Refereed)
    Abstract [en]

    Increasing demands for energy efficiency constrain emerging hardware. These new hardware trends challenge the established assumptions in code generation and force us to rethink existing software optimization techniques. We propose a cross-layer redesign of the way compilers and the underlying microarchitecture are built and interact, to achieve both performance and high energy efficiency.

    In this paper, we address one of the main performance bottlenecks — last-level cache misses — through a software-hardware co-design. Our approach is able to hide memory latency and attain increased memory and instruction level parallelism by orchestrating a non-speculative, execute-ahead paradigm in software (SWOOP). While out-of-order (OoO) architectures attempt to hide memory latency by dynamically reordering instructions, they do so through expensive, power-hungry, speculative mechanisms.We aim to shift this complexity into software, and we build upon compilation techniques inherited from VLIW, software pipelining, modulo scheduling, decoupled access-execution, and software prefetching. In contrast to previous approaches we do not rely on either software or hardware speculation that can be detrimental to efficiency. Our SWOOP compiler is enhanced with lightweight architectural support, thus being able to transform applications that include highly complex control-flow and indirect memory accesses.

1 - 6 of 6
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf