Publications from Uppsala University
Publications (10 of 16)
Grass, T., Carlson, T. E., Rico, A., Ceballos, G., Ayguade, E., Casas, M. & Moreto, M. (2019). Sampled Simulation of Task-Based Programs. IEEE Transactions on Computers, 68(2), 255-269
2019 (English). In: IEEE Transactions on Computers, ISSN 0018-9340, E-ISSN 1557-9956, Vol. 68, no 2, p. 255-269. Article in journal (Refereed), Published
Abstract [en]

Sampled simulation is a mature technique for reducing simulation time of single-threaded programs. Nevertheless, current sampling techniques do not take advantage of other execution models, like task-based execution, to provide both more accurate and faster simulation. Recent multi-threaded sampling techniques assume that the workload assigned to each thread does not change across multiple executions of a program. This assumption does not hold for dynamically scheduled task-based programming models. Task-based programming models allow the programmer to specify program segments as tasks which are instantiated many times and scheduled dynamically to available threads. Due to variation in scheduling decisions, two consecutive executions on the same machine typically result in different instruction streams processed by each thread. In this paper, we propose TaskPoint, a sampled simulation technique for dynamically scheduled task-based programs. We leverage task instances as sampling units and simulate only a fraction of all task instances in detail. Between detailed simulation intervals, we employ a novel fast-forwarding mechanism for dynamically scheduled programs. We evaluate different automatic techniques for clustering task instances and show that DBSCAN clustering combined with analytical performance modeling provides the best trade-off of simulation speed and accuracy. TaskPoint is the first technique combining sampled simulation and analytical modeling and provides a new way to trade off simulation speed and accuracy. Compared to detailed simulation, TaskPoint accelerates architectural simulation with 8 simulated threads by an average factor of 220x at an average error of 0.5 percent and a maximum error of 7.9 percent.
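
As an illustration of the sampling idea described in the abstract above, the following minimal Python sketch simulates only the first few instances of each task type in detail and charges every later instance the mean of the costs observed so far. It is not the TaskPoint implementation: grouping by task type stands in for the clustering step, `detailed_sim` is a hypothetical callback standing in for a detailed simulator, `samples_per_type` is an assumed parameter, and cycle counts are simply summed, which ignores parallel execution and dynamic scheduling.

```python
from collections import defaultdict
from statistics import mean


def estimate_total_cycles(instances, detailed_sim, samples_per_type=2):
    """Estimate total cycles while simulating only a few instances in detail.

    instances: iterable of (task_type, payload) pairs in execution order.
    detailed_sim: hypothetical callable(payload) -> cycles, standing in
        for a detailed architectural simulator.
    samples_per_type: assumed number of instances per task type that are
        simulated in detail before switching to estimation.
    Returns (estimated_total_cycles, number_of_detailed_simulations).
    """
    observed = defaultdict(list)  # task_type -> cycles measured in detail
    total = 0.0
    detailed_count = 0
    for task_type, payload in instances:
        if len(observed[task_type]) < samples_per_type:
            # Sampling phase: run this instance through the detailed model.
            cycles = detailed_sim(payload)
            observed[task_type].append(cycles)
            detailed_count += 1
        else:
            # Fast-forward phase: reuse the mean cost of sampled instances.
            cycles = mean(observed[task_type])
        total += cycles
    return total, detailed_count


if __name__ == "__main__":
    # Toy workload: the payload is used directly as the "true" cycle count.
    workload = [("a", 10), ("a", 12), ("a", 100), ("b", 5), ("b", 5), ("b", 7)]
    print(estimate_total_cycles(workload, detailed_sim=lambda p: p))
```

The toy workload deliberately contains an outlier (the third `a` instance) to show how such an estimate can diverge from the true cost when instances of one type behave differently, which is one reason the choice of clustering matters.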

Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2019
Keywords
Sampled simulation, task-based, analytical performance modeling
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-376810 (URN); 10.1109/TC.2018.2860012 (DOI); 000456176200008 ()
Funder
EU, Horizon 2020, 687698; EU, European Research Council, GA 321253; EU, FP7, Seventh Framework Programme, 2013BP B 00243
Available from: 2019-02-22 Created: 2019-02-22 Last updated: 2023-03-28. Bibliographically approved
Ceballos, G., Grass, T., Hugo, A. & Black-Schaffer, D. (2018). Analyzing performance variation of task schedulers with TaskInsight. Parallel Computing, 75, 11-27
2018 (English). In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 75, p. 11-27. Article in journal (Refereed), Published
National Category
Computer Engineering
Identifiers
urn:nbn:se:uu:diva-340202 (URN); 10.1016/j.parco.2018.02.003 (DOI); 000433655700002 ()
Projects
UPMARC; Resource Sharing Modeling
Funder
Swedish Research Council, FFL12-0051; Swedish Foundation for Strategic Research, FFL12-0051
Available from: 2018-02-22 Created: 2018-01-26 Last updated: 2018-11-16. Bibliographically approved
Ceballos, G., Sembrant, A., Carlson, T. E. & Black-Schaffer, D. (2018). Behind the Scenes: Memory Analysis of Graphical Workloads on Tile-based GPUs. In: Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2018. Paper presented at ISPASS 2018, April 2–4, Belfast, UK (pp. 1-11). IEEE Computer Society
2018 (English). In: Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2018, IEEE Computer Society, 2018, p. 1-11. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE Computer Society, 2018
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-361214 (URN); 10.1109/ISPASS.2018.00009 (DOI); 000545984300001 (); 978-1-5386-5010-3 (ISBN)
Conference
ISPASS 2018, April 2–4, Belfast, UK
Projects
UPMARC
Available from: 2018-09-21 Created: 2018-09-21 Last updated: 2021-06-02. Bibliographically approved
Ceballos, G., Hagersten, E. & Black-Schaffer, D. (2018). Tail-PASS: Resource-based Cache Management for Tiled Graphics Rendering Hardware. In: Proc. 16th International Conference on Parallel and Distributed Processing with Applications. Paper presented at ISPA 2018, December 11–13, Melbourne, Australia (pp. 55-63). IEEE
2018 (English). In: Proc. 16th International Conference on Parallel and Distributed Processing with Applications, IEEE, 2018, p. 55-63. Conference paper, Published paper (Refereed)
Abstract [en]

Modern graphics rendering is a very expensive process and can account for 60% of the battery consumption on current games. Much of the cost comes from the high memory bandwidth of rendering complex graphics. To render a frame, multiple smaller rendering passes called scenes are executed, with each one tiled for parallel execution. The data for each scene comes from hundreds of software resources (textures). We observe that each frame can consume up to 1000s of MB of data, but that over 75% of the graphics memory accesses are to the top-10 resources, and that bypassing the remaining infrequently accessed (tail) resources reduces cache pollution. Bypassing the tail can save up to 35% of the main memory traffic over resource-oblivious replacement policies and cache management techniques. In this paper, we propose Tail-PASS, a cache management technique that detects the most accessed resources at runtime, learns if it is worth bypassing the least accessed ones, and then dynamically enables/disables bypassing to reduce cache pollution on a per-scene basis. Overall, we see an average reduction in bandwidth-per-frame of 22% (up to 46%) by bypassing all but the top-10 resources and an 11% (up to 44%) reduction if only the top-2 resources are cached.
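
To make the "top-N resources versus tail" observation in the abstract above concrete, here is a minimal Python sketch that counts accesses per resource in a trace, picks the N most accessed resources, and reports what share of all accesses they cover. It is only an offline illustration under stated assumptions, not the Tail-PASS mechanism itself, which detects hot resources at runtime and learns per scene whether bypassing pays off. The function names `top_resources` and `should_bypass` and the trace format (a list of resource identifiers, one per access) are hypothetical.

```python
from collections import Counter


def top_resources(trace, n):
    """Return the n most accessed resources and their share of all accesses.

    trace: list of resource identifiers, one entry per memory access
        (an assumed, simplified trace format).
    """
    if not trace:
        return set(), 0.0
    counts = Counter(trace)
    top = {resource for resource, _ in counts.most_common(n)}
    covered = sum(counts[resource] for resource in top)
    return top, covered / len(trace)


def should_bypass(resource_id, top):
    """Tail resources (those outside the top set) are candidates for bypassing."""
    return resource_id not in top


if __name__ == "__main__":
    # Toy trace: two hot textures and one rarely used one.
    trace = ["t0"] * 6 + ["t1"] * 3 + ["t2"]
    top, share = top_resources(trace, n=2)
    print(sorted(top), share, should_bypass("t2", top))
```

A static top-N set like this one cannot adapt when the hot resources change between scenes, which is why the abstract describes enabling and disabling bypassing dynamically on a per-scene basis.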

Place, publisher, year, edition, pages
IEEE, 2018
National Category
Computer Systems; Computer Sciences
Identifiers
urn:nbn:se:uu:diva-363920 (URN); 10.1109/BDCloud.2018.00022 (DOI); 000467843200008 (); 978-1-7281-1141-4 (ISBN)
Conference
ISPA 2018, December 11–13, Melbourne, Australia
Funder
EU, European Research Council, 715283
Available from: 2018-10-21 Created: 2018-10-21 Last updated: 2019-06-17. Bibliographically approved
Ceballos, G. (2018). Understanding Task Parallelism: Providing insight into scheduling, memory, and performance for CPUs and Graphics. (Doctoral dissertation). Uppsala: Acta Universitatis Upsaliensis
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Maximizing the performance of computer systems while making them more energy efficient is vital for future developments in engineering, medicine, entertainment, etc. However, the increasing complexity of software, hardware, and their interactions makes this task difficult. Software developers have to deal with complex memory architectures such as multilevel caches on modern CPUs and keeping thousands of cores busy in GPUs, which makes the programming process harder.

Task-based programming provides high-level abstractions to simplify the development process. In this model, independent tasks (functions) are submitted to a runtime system, which orchestrates their execution across hardware resources. This approach has become popular and successful because the runtime can distribute the workload across hardware resources automatically, and has the potential to optimize the execution to minimize data movement (e.g., being aware of the cache hierarchy).

However, to build better runtime systems, we now need to understand bottlenecks in the performance of current and future multicore architectures. Unfortunately, since most current work was designed for sequential or thread-based workloads, there is an overall lack of tools and methods to gain insight about the execution of these applications, allowing both the runtime and the programmers to detect potential optimizations.

In this thesis, we address this lack of tools by providing fast, accurate and mathematically-sound models to understand the execution of task-based applications. In particular, we center these models around three key aspects of the execution: memory behavior (data locality), scheduling, and performance. Our contributions provide insight into the interplay between the schedule's behavior, data reuse through the cache hierarchy, and the resulting performance. These contributions lay the groundwork for improving runtime systems. We first apply these methods to analyze a diverse set of CPU applications, and then leverage them to one of the most common workloads in current systems: graphics rendering on GPUs.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2018. p. 67
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1737
Keywords
Task-based programming, Task Scheduling, Analytical Cache Model, Scheduling, Runtime Systems, Computer Graphics (rendering)
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-363924 (URN); 978-91-513-0485-4 (ISBN)
Public defence
2018-12-04, 2446, ITC, Lägerhyddsvägen 2, Uppsala, 09:15 (English)
Projects
UPMARC
Available from: 2018-11-15 Created: 2018-10-21 Last updated: 2019-02-25. Bibliographically approved
Ceballos, G., Sembrant, A., Carlson, T. E. & Black-Schaffer, D. (2017). Analyzing Graphics Workloads on Tile-based GPUs. In: Proc. 20th International Symposium on Workload Characterization. Paper presented at IISWC 2017, October 1–3, Seattle, WA (pp. 108-109). IEEE
2017 (English). In: Proc. 20th International Symposium on Workload Characterization, IEEE, 2017, p. 108-109. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE, 2017
National Category
Computer Systems; Computer Engineering
Identifiers
urn:nbn:se:uu:diva-335559 (URN); 10.1109/IISWC.2017.8167761 (DOI); 000428206700011 (); 978-1-5386-1233-0 (ISBN)
Conference
IISWC 2017, October 1–3, Seattle, WA
Projects
UPMARC
Funder
Swedish Foundation for Strategic Research, FFL12-0051
Available from: 2017-12-06 Created: 2017-12-06 Last updated: 2018-11-15. Bibliographically approved
Ceballos, G., Hugo, A., Hagersten, E. & Black-Schaffer, D. (2017). Exploring scheduling effects on task performance with TaskInsight. Supercomputing frontiers and innovations, 4(3), 91-98
2017 (English). In: Supercomputing frontiers and innovations, ISSN 2214-3270, E-ISSN 2313-8734, Vol. 4, no 3, p. 91-98. Article in journal (Refereed), Published
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-335528 (URN); 10.14529/jsfi170306 (DOI)
Projects
UPMARC
Funder
Swedish Foundation for Strategic Research, FFL12-0051
Available from: 2017-12-06 Created: 2017-12-06 Last updated: 2018-11-16. Bibliographically approved
Ceballos, G. (2017). How to make tasks faster: Revealing the complex interactions of tasks in the memory system. In: Proc. Companion 8th ACM International Conference on Systems, Programming, Languages, and Applications: Software for Humanity. Paper presented at SPLASH 2017, October 22–27, Vancouver, Canada (pp. 1-3). New York: ACM Press
2017 (English). In: Proc. Companion 8th ACM International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, New York: ACM Press, 2017, p. 1-3. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2017
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-335557 (URN); 10.1145/3135932.3135933 (DOI); 978-1-4503-5514-8 (ISBN)
Conference
SPLASH 2017, October 22–27, Vancouver, Canada
Projects
UPMARC
Funder
Swedish Foundation for Strategic Research, FFL12-0051
Available from: 2017-10-22 Created: 2017-12-06 Last updated: 2018-11-16. Bibliographically approved
Ceballos, G. (2017). Modeling the interactions between tasks and the memory system. (Licentiate dissertation). Uppsala University
2017 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Making computer systems more energy efficient while obtaining the maximum performance possible is key for future developments in engineering, medicine, entertainment, etc. However, this has become a difficult task due to the increasing complexity of hardware and software, and their interactions. For example, developers have to deal with deep, multi-level cache hierarchies on modern CPUs, and keep thousands of cores busy in GPUs, which makes the programming process more difficult.

To simplify this task, new abstractions and programming models are becoming popular. Their goal is to make applications more scalable and efficient, while still providing the flexibility and portability of old, widely adopted models. One example of this is task-based programming, where simple independent tasks (functions) are delegated to a runtime system which orchestrates their execution. This approach has been successful because the runtime can automatically distribute work across hardware cores and has the potential to minimize data movement and placement (e.g., being aware of the cache hierarchy).

To build better runtime systems, it is crucial to understand bottlenecks in the performance of current and future multicore systems. In this thesis, we provide fast, accurate and mathematically-sound models and techniques to understand the execution of task-based applications concerning three key aspects: memory behavior (data locality), scheduling, and performance. With these methods, we lay the groundwork for improving runtime systems, providing insight into the interplay between the schedule's behavior, data reuse through the cache hierarchy, and the resulting performance.

Place, publisher, year, edition, pages
Uppsala University, 2017
Series
Information technology licentiate theses: Licentiate theses from the Department of Information Technology, ISSN 1404-5117 ; 2017-002
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-335530 (URN)
Projects
UPMARC
Available from: 2017-10-01 Created: 2017-12-06 Last updated: 2018-11-16. Bibliographically approved
Ceballos, G., Grass, T., Hugo, A. & Black-Schaffer, D. (2017). TaskInsight: Understanding task schedules effects on memory and performance. In: Proc. 8th International Workshop on Programming Models and Applications for Multicores and Manycores. Paper presented at PMAM 2017, February 4–8, Austin, TX (pp. 11-20). New York: ACM Press
2017 (English). In: Proc. 8th International Workshop on Programming Models and Applications for Multicores and Manycores, New York: ACM Press, 2017, p. 11-20. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2017
National Category
Computer Engineering
Identifiers
urn:nbn:se:uu:diva-315033 (URN); 10.1145/3026937.3026943 (DOI); 978-1-4503-4883-6 (ISBN)
Conference
PMAM 2017, February 4–8, Austin, TX
Projects
UPMARC; Resource Sharing Modeling
Funder
Swedish Research Council; Swedish Foundation for Strategic Research, FFL12-0051; EU, Horizon 2020, 687698
Available from: 2017-02-04 Created: 2017-02-08 Last updated: 2018-11-16. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-2314-7307