Uppsala University Publications
Partitioning GPUs for Improved Scalability
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication (UART)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems (UART)
2016 (English). In: Proc. 28th International Symposium on Computer Architecture and High Performance Computing, IEEE Computer Society, 2016, pp. 42–49. Conference paper, published paper (refereed).
Abstract [en]

To port applications to GPUs, developers need to express computational tasks as highly parallel executions with tens of thousands of threads to fill the GPU's compute resources. However, while this will fill the GPU's resources, it does not necessarily deliver the best efficiency, as the task may scale poorly when run with sufficient parallelism to fill the GPU. In this work we investigate how we can improve throughput by co-scheduling poorly-scaling tasks on sub-partitions of the GPU to increase utilization efficiency. We first investigate the scalability of typical HPC tasks on GPUs, and then use this insight to improve throughput by extending the StarPU framework to co-schedule tasks on the GPU. We demonstrate that co-scheduling poorly-scaling GPU tasks accelerates the execution of the critical tasks of a Cholesky Factorization and improves the overall performance of the application by 9% across a wide range of block sizes.
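The core trade-off the abstract describes can be illustrated with a small back-of-the-envelope model (this sketch is not from the paper; the scaling function, SM count, and efficiency exponent are all hypothetical): when a task's speedup grows sublinearly with the number of streaming multiprocessors, running two such tasks concurrently on half-partitions can beat running them back-to-back on the whole GPU.

```python
# Hypothetical illustration of the co-scheduling argument: two tasks whose
# speedup scales as sms**efficiency (efficiency < 1 means poor scaling).

def task_time(work, sms, efficiency):
    """Model execution time of a task given `sms` streaming multiprocessors.

    `efficiency` in (0, 1] is a made-up scaling exponent: 1.0 means
    perfect linear speedup, 0.5 means speedup grows only as sqrt(sms).
    """
    return work / (sms ** efficiency)

SMS = 80  # total SMs on a hypothetical GPU

# Two equal tasks that scale poorly (sqrt-like speedup).
work_a, work_b, eff = 1000.0, 1000.0, 0.5

# Serial: each task runs alone on the full GPU, one after the other.
serial = task_time(work_a, SMS, eff) + task_time(work_b, SMS, eff)

# Co-scheduled: each task gets half the SMs, and they run concurrently,
# so the makespan is the slower of the two.
co_scheduled = max(task_time(work_a, SMS // 2, eff),
                   task_time(work_b, SMS // 2, eff))

print(f"serial: {serial:.1f}, co-scheduled: {co_scheduled:.1f}")
assert co_scheduled < serial  # poor scaling makes partitioning win
```

With perfect scaling (`eff = 1.0`) the two schedules tie; the worse a task scales, the larger the gain from partitioning, which is the intuition behind co-scheduling poorly-scaling tasks on GPU sub-partitions.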

Place, publisher, year, edition, pages
IEEE Computer Society, 2016, pp. 42–49.
Series
International Symposium on Computer Architecture and High Performance Computing
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:uu:diva-305626
DOI: 10.1109/SBAC-PAD.2016.14
ISI: 000391392400006
ISBN: 9781509061082 (print)
OAI: oai:DiVA.org:uu-305626
DiVA: diva2:1038751
Conference
SBAC-PAD 2016, October 26–28, Los Angeles, CA
Projects
UPMARC
Funder
Swedish Foundation for Strategic Research, FFL12-0051
Available from: 2016-10-19. Created: 2016-10-19. Last updated: 2017-02-09. Bibliographically approved.

Open Access in DiVA

No full text

Search in DiVA

By author/editor: Janzén, Johan; Black-Schaffer, David; Hugo, Andra
By organisation: Computer Architecture and Computer Communication; Computer Systems
