Uppsala University Publications
Efficient thread/page/parallelism autotuning for NUMA systems
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
2019 (English). In: International Conference on Supercomputing / [ed] ACM. New York, NY, USA: Association for Computing Machinery (ACM), 2019, p. 12
Conference paper, Published paper (Refereed)
Abstract [en]

Current multi-socket systems have complex memory hierarchies with significant Non-Uniform Memory Access (NUMA) effects: memory performance depends on the location of the data and the thread. This complexity means that thread- and data-mappings have a significant impact on performance. However, it is hard to find efficient data mappings and thread configurations due to the complex interactions between applications and systems.

In this paper we explore the combined search space of thread mappings, data mappings, the number of NUMA nodes, and the degree of parallelism, per application phase and across multiple systems. We show that there are significant performance benefits from optimizing this wide range of parameters together. However, such an optimization presents two challenges: accurately modeling the performance impact of configurations across applications and systems, and exploring the vast space of configurations. To overcome the modeling challenge, we use native execution of small, representative codelets, which reproduce the system and application interactions. To make the search practical, we build a search space by combining a range of state-of-the-art thread- and data-mapping policies.

Combining these two approaches results in a tractable search space that can be quickly and accurately evaluated without sacrificing significant performance. This search finds non-intuitive configurations that perform significantly better than previous work. With this approach we achieve an average speedup of 1.97× on a four-node NUMA system.
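As a rough illustration of the kind of search described above, the following Python sketch combines several per-dimension choices into a single Cartesian-product search space and then picks, independently for each application phase, the configuration with the lowest measured time. Everything here is hypothetical: the policy names, the value ranges, the use of threads per node as a stand-in for the degree of parallelism, and the function names `build_search_space`, `tune_phases`, and `toy_measure`. In particular, `toy_measure` is an invented cost model that only stands in for natively timing a codelet; it is not the paper's method and produces no real measurements.

```python
"""Hypothetical sketch: per-phase search over a combined NUMA configuration space."""
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical policy names and value ranges, chosen only for illustration.
THREAD_MAPPINGS = ["compact", "scatter"]
DATA_MAPPINGS = ["first-touch", "interleave", "locality"]
NUMA_NODES = [1, 2, 4]
THREADS_PER_NODE = [4, 8]  # hypothetical stand-in for the degree of parallelism

# (thread mapping, data mapping, NUMA nodes used, threads per node)
Config = Tuple[str, str, int, int]


def build_search_space() -> List[Config]:
    """Combine the per-dimension choices into one Cartesian-product search space."""
    return list(product(THREAD_MAPPINGS, DATA_MAPPINGS, NUMA_NODES, THREADS_PER_NODE))


def tune_phases(
    phases: List[str],
    space: List[Config],
    measure: Callable[[str, Config], float],
) -> Dict[str, Config]:
    """For each phase independently, return the configuration with the lowest measured time."""
    return {phase: min(space, key=lambda cfg: measure(phase, cfg)) for phase in phases}


def toy_measure(phase: str, cfg: Config) -> float:
    """Hypothetical cost model standing in for natively timing a codelet."""
    thread_map, data_map, nodes, threads_per_node = cfg
    work = {"compute": 100.0, "stream": 80.0}[phase]
    seconds = work / (nodes * threads_per_node)
    if phase == "stream" and data_map != "interleave":
        seconds *= 2.0  # pretend a bandwidth-bound phase suffers from remote accesses
    if thread_map == "scatter":
        seconds *= 0.9  # pretend spreading threads across nodes reduces contention
    return seconds


if __name__ == "__main__":
    space = build_search_space()
    best = tune_phases(["compute", "stream"], space, toy_measure)
    for phase, cfg in best.items():
        print(phase, cfg)
```

The exhaustive `min` over every configuration is the simplest possible strategy; it is only affordable when each evaluation is cheap, which is the role the abstract assigns to small, representative codelets. Because each phase is tuned separately, different phases may end up with different mappings and thread counts.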

Place, publisher, year, edition, pages
New York, NY, USA: Association for Computing Machinery (ACM), 2019, p. 12
Keywords [en]
NUMA, autotuning, thread placement, page placement, code isolation, OpenMP, performance optimization
National Category
Computer Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-396173
DOI: 10.1145/3330345.3330376
OAI: oai:DiVA.org:uu-396173
DiVA, id: diva2:1366744
Conference
Proceedings of the ACM International Conference on Supercomputing, Phoenix, AZ, USA, June 26–28, 2019 (ICS ’19)
Funder
Swedish Foundation for Strategic Research, FFL12-0051 and RIT15-0012
Available from: 2019-10-30 Created: 2019-10-30 Last updated: 2019-11-11 Bibliographically approved

Open Access in DiVA

Full text: FULLTEXT01.pdf (application/pdf, 2797 kB)

Other links

Publisher's full text: https://dl.acm.org/citation.cfm?id=3330376

Authority records

Popov, Mihail; Jimborean, Alexandra; Black-Schaffer, David
