Partitioning GPUs for Improved Scalability
2016 (English). In: Proc. 28th International Symposium on Computer Architecture and High Performance Computing, IEEE Computer Society, 2016, pp. 42-49. Conference paper (refereed).
To port applications to GPUs, developers must express computational tasks as highly parallel executions with tens of thousands of threads to fill the GPU's compute resources. However, filling the GPU does not necessarily deliver the best efficiency, as a task may scale poorly when run with enough parallelism to occupy the whole device. In this work we investigate how to improve throughput by co-scheduling poorly-scaling tasks on sub-partitions of the GPU to increase utilization efficiency. We first study the scalability of typical HPC tasks on GPUs, and then use this insight to improve throughput by extending the StarPU framework to co-schedule tasks on the GPU. We demonstrate that co-scheduling poorly-scaling GPU tasks accelerates the execution of the critical tasks of a Cholesky factorization and improves overall application performance by 9% across a wide range of block sizes.
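The core argument can be sketched with a toy throughput model (the saturating-speedup curve, the partition sizes, and all parameters below are illustrative assumptions, not taken from the paper): if a task's speedup flattens well before the GPU is full, two such tasks finish sooner when co-scheduled on half-partitions than when run back-to-back on the whole device.

```python
# Toy saturating-speedup model (an illustrative assumption, not the
# paper's measured data): speedup on n compute units follows
# n / (1 + alpha*(n-1)), which flattens for large n -- the
# "poorly scaling task" case the paper targets.
def speedup(n, alpha=0.05):
    return n / (1.0 + alpha * (n - 1))

def time_back_to_back(work, n_units, n_tasks=2):
    """Run the tasks one after another, each using the whole GPU."""
    return n_tasks * work / speedup(n_units)

def time_co_scheduled(work, n_units, n_tasks=2):
    """Run the tasks concurrently, each on an equal sub-partition."""
    return work / speedup(n_units // n_tasks)

if __name__ == "__main__":
    N = 80  # number of compute units; an assumed figure for illustration
    print("back-to-back:", time_back_to_back(1.0, N))
    print("co-scheduled:", time_co_scheduled(1.0, N))
    # With a saturating curve, the co-scheduled time is lower: each task
    # loses little speedup on half the GPU, but the two overlap completely.
```

The same reasoning underlies the paper's approach: partitions are sized so that each task keeps most of its achievable speedup while freed-up resources serve a concurrent task.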
Place, publisher, year, edition, pages
IEEE Computer Society, 2016, pp. 42-49.
International Symposium on Computer Architecture and High Performance Computing
Identifiers
URN: urn:nbn:se:uu:diva-305626
DOI: 10.1109/SBAC-PAD.2016.14
ISI: 000391392400006
ISBN: 9781509061082 (print)
OAI: oai:DiVA.org:uu-305626
DiVA: diva2:1038751
SBAC-PAD 2016, October 26–28, Los Angeles, CA
Funder: Swedish Foundation for Strategic Research, FFL12-0051