Uppsala University Publications (uu.se)
Tail-PASS: Resource-based Cache Management for Tiled Graphics Rendering Hardware
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (Uppsala Architecture Research Team) ORCID iD: 0000-0003-2314-7307
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (Uppsala Architecture Research Team)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (Uppsala Architecture Research Team)
2018 (English). In: Proc. 16th International Conference on Parallel and Distributed Processing with Applications, IEEE, 2018, p. 55-63. Conference paper, Published paper (Refereed)
Abstract [en]

Modern graphics rendering is expensive and can account for 60% of the battery consumption in current games. Much of the cost comes from the high memory bandwidth needed to render complex graphics. To render a frame, multiple smaller rendering passes called scenes are executed, each one tiled for parallel execution. The data for each scene comes from hundreds of software resources (textures). We observe that each frame can consume thousands of MB of data, but that over 75% of the graphics memory accesses go to the top-10 resources, and that bypassing the remaining infrequently accessed (tail) resources reduces cache pollution. Bypassing the tail can save up to 35% of the main memory traffic compared with resource-oblivious replacement policies and cache management techniques. In this paper, we propose Tail-PASS, a cache management technique that detects the most accessed resources at runtime, learns whether it is worth bypassing the least accessed ones, and then dynamically enables or disables bypassing on a per-scene basis to reduce cache pollution. Overall, we see an average reduction in bandwidth per frame of 22% (up to 46%) by bypassing all but the top-10 resources, and an 11% (up to 44%) reduction if only the top-2 resources are cached.
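The idea of bypassing tail resources can be illustrated with a toy simulation. The sketch below is not the Tail-PASS hardware mechanism from the paper: it assumes a small fully associative LRU cache, ranks resources by counting accesses over the whole trace offline (the paper detects them at runtime), and omits the per-scene learning step. The names `LRUCache`, `simulate`, and `make_trace` are hypothetical and chosen only for this illustration.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Tiny fully associative LRU cache that counts misses (memory fetches)."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)  # hit: mark as most recently used
            return
        self.misses += 1
        self.lines[line] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line


def simulate(trace, capacity_lines, top_n=None):
    """Return main-memory fetches for a trace of (resource_id, line_address).

    top_n=None caches every access (resource-oblivious LRU). Otherwise only
    the top_n most accessed resources are cached; accesses to all other
    (tail) resources bypass the cache and each costs one memory fetch.
    Assumption: the ranking is computed offline from the full trace.
    """
    cache = LRUCache(capacity_lines)
    bypassed = 0
    hot = None
    if top_n is not None:
        counts = Counter(res for res, _ in trace)
        hot = {res for res, _ in counts.most_common(top_n)}
    for res, addr in trace:
        if hot is not None and res not in hot:
            bypassed += 1
            continue
        cache.access((res, addr))
    return cache.misses + bypassed


def make_trace(rounds=10):
    """Synthetic trace: one hot resource 'A' with 4 reused lines, interleaved
    with streaming accesses to 8 tail resources that are never reused."""
    trace = []
    i = 0
    for _ in range(rounds):
        for line in range(4):
            trace.append(("A", line))
            trace.append((f"tail{i % 8}", i))
            i += 1
    return trace


if __name__ == "__main__":
    trace = make_trace()
    print("LRU, no bypass:", simulate(trace, capacity_lines=4))
    print("bypass all but top-1:", simulate(trace, capacity_lines=4, top_n=1))
```

On this synthetic trace the streaming tail accesses evict the hot lines before they are reused, so plain LRU fetches all 80 accesses from memory, while caching only the top-1 resource needs 44 fetches (4 cold misses plus 40 bypassed tail accesses). The numbers describe only this toy trace and say nothing about the paper's measured results.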

Place, publisher, year, edition, pages
IEEE, 2018. p. 55-63
National Category
Computer Systems Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-363920; DOI: 10.1109/BDCloud.2018.00022; ISI: 000467843200008; ISBN: 978-1-7281-1141-4 (electronic); OAI: oai:DiVA.org:uu-363920; DiVA, id: diva2:1257490
Conference
ISPA 2018, December 11–13, Melbourne, Australia
Funder
EU, European Research Council, 715283
Available from: 2018-10-21. Created: 2018-10-21. Last updated: 2019-06-17. Bibliographically approved
In thesis
1. Understanding Task Parallelism: Providing insight into scheduling, memory, and performance for CPUs and Graphics
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Maximizing the performance of computer systems while making them more energy efficient is vital for future developments in engineering, medicine, entertainment, etc. However, the increasing complexity of software, hardware, and their interactions makes this task difficult. Software developers have to deal with complex memory architectures, such as multilevel caches on modern CPUs, and keep thousands of cores busy on GPUs, which makes the programming process harder.

Task-based programming provides high-level abstractions to simplify the development process. In this model, independent tasks (functions) are submitted to a runtime system, which orchestrates their execution across hardware resources. This approach has become popular and successful because the runtime can distribute the workload across hardware resources automatically, and has the potential to optimize the execution to minimize data movement (e.g., being aware of the cache hierarchy).
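As a minimal sketch of this submission model, the example below uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in for a task runtime. This is an assumption made for illustration: the thesis does not name this library, and the executor is not cache-aware. The function names `tile_sum` and `run_tasks` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_sum(tile):
    """An independent task: reduce one tile of data to a single value."""
    return sum(tile)


def run_tasks(tiles, workers=4):
    """Submit one task per tile and gather the results in submission order.

    The executor plays the role of the runtime system: it decides which
    worker thread runs each task and when.
    """
    with ThreadPoolExecutor(max_workers=workers) as runtime:
        futures = [runtime.submit(tile_sum, tile) for tile in tiles]
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_tasks([[1, 2], [3, 4], [5]]))
```

The caller only describes the work as independent functions. The mapping of tasks to workers is left to the executor, which is the property that lets a more sophisticated runtime schedule tasks with data locality in mind.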

However, to build better runtime systems, we need to understand the performance bottlenecks of current and future multicore architectures. Unfortunately, since most existing work was designed for sequential or thread-based workloads, there is an overall lack of tools and methods for gaining insight into the execution of these applications, insight that would allow both the runtime and programmers to detect potential optimizations.

In this thesis, we address this lack of tools by providing fast, accurate, and mathematically sound models to understand the execution of task-based applications. In particular, we center these models around three key aspects of the execution: memory behavior (data locality), scheduling, and performance. Our contributions provide insight into the interplay between the schedule's behavior, data reuse through the cache hierarchy, and the resulting performance. These contributions lay the groundwork for improving runtime systems. We first apply these methods to analyze a diverse set of CPU applications, and then apply them to one of the most common workloads in current systems: graphics rendering on GPUs.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2018. p. 67
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1737
Keywords
Task-based programming, Task Scheduling, Analytical Cache Model, Scheduling, Runtime Systems, Computer Graphics (rendering)
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-363924 (URN); 978-91-513-0485-4 (ISBN)
Public defence
2018-12-04, 2446, ITC, Lägerhyddsvägen 2, Uppsala, 09:15 (English)
Opponent
Supervisors
Projects
UPMARC
Available from: 2018-11-15. Created: 2018-10-21. Last updated: 2019-02-25. Bibliographically approved

Authority records BETA

Ceballos, Germán; Hagersten, Erik; Black-Schaffer, David
