uu.seUppsala University Publications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Bandwidth Bandit: Quantitative Characterization of Memory Contention
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (UART)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (UART)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (UART)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (UART)
2013 (English)In: Proc. 11th International Symposium on Code Generation and Optimization: CGO 2013, IEEE Computer Society, 2013, p. 99-108Conference paper, Published paper (Refereed)
Abstract [en]

On multicore processors, co-executing applications compete for shared resources, such as cache capacity and memory bandwidth. This leads to suboptimal resource allocation and can cause substantial performance loss, which makes it important to effectively manage these shared resources. This, however, requires insights into how the applications are impacted by such resource sharing. While there are several methods to analyze the performance impact of cache contention, less attention has been paid to general, quantitative methods for analyzing the impact of contention for memory bandwidth. To this end we introduce the Bandwidth Bandit, a general, quantitative, profiling method for analyzing the performance impact of contention for memory bandwidth on multicore machines. The profiling data captured by the Bandwidth Bandit is presented in a bandwidth graph. This graph accurately captures the measured application's performance as a function of its available memory bandwidth, and enables us to determine how much the application suffers when its available bandwidth is reduced. To demonstrate the value of this data, we present a case study in which we use the bandwidth graph to analyze the performance impact of memory contention when co-running multiple instances of single threaded application.

Place, publisher, year, edition, pages
IEEE Computer Society, 2013. p. 99-108
Keywords [en]
bandwidth, memory, caches
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-194101DOI: 10.1109/CGO.2013.6494987ISI: 000318700200010ISBN: 978-1-4673-5524-7 (print)OAI: oai:DiVA.org:uu-194101DiVA, id: diva2:616760
Conference
CGO 2013, 23-27 February, Shenzhen, China
Projects
UPMARC
Funder
Swedish Research CouncilAvailable from: 2013-04-18 Created: 2013-02-08 Last updated: 2018-12-14Bibliographically approved
In thesis
1. Efficient Memory Modeling During Simulation and Native Execution
Open this publication in new window or tab >>Efficient Memory Modeling During Simulation and Native Execution
2019 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Application performance on computer processors depends on a number of complex architectural and microarchitectural design decisions. Consequently, computer architects rely on performance modeling to improve future processors without building prototypes. This thesis focuses on performance modeling and proposes methods that quantify the impact of the memory system on application performance.

Detailed architectural simulation, a common approach to performance modeling, can be five orders of magnitude slower than execution on the actual processor. At this rate, simulating realistic workloads requires years of CPU time. Prior research uses sampling to speed up simulation. Using sampled simulation, only a number of small but representative portions of the workload are evaluated in detail. To fully exploit the speed potential of sampled simulation, the simulation method has to efficiently reconstruct the architectural and microarchitectural state prior to the simulation samples. Practical approaches to sampled simulation use either functional simulation at the expense of performance or checkpoints at the expense of flexibility. This thesis proposes three approaches that use statistical cache modeling to efficiently address the problem of cache warm up and speed up sampled simulation, without compromising flexibility. The statistical cache model uses sparse memory reuse information obtained with native techniques to model the performance of the cache. The proposed sampled simulation framework evaluates workloads 150 times faster than approaches that use functional simulation to warm up the cache.

Other approaches to performance modeling use analytical models based on data obtained from execution on native hardware. These native techniques allow for better understanding of the performance bottlenecks on existing hardware. Efficient resource utilization in modern multicore processors is necessary to exploit their peak performance. This thesis proposes native methods that characterize shared resource utilization in modern multicores. These methods quantify the impact of cache sharing and off-chip memory sharing on overall application performance. Additionally, they can quantify scalability bottlenecks for data-parallel, symmetric workloads.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019. p. 73
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1756
Keywords
performance analysis, cache performance, multicore performance, memory system, memory bandwidth, memory contention, performance prediction, multi-threading, multiprocessing systems, program diagnostics, commodity multicores, multithreaded program resource requirements, performance counters, scalability bottleneck, scalability improvement
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-369490 (URN)978-91-513-0538-7 (ISBN)
Public defence
2019-02-15, Sal VIII, Universitetshuset, Biskopsgatan 3, Uppsala, 09:15 (English)
Opponent
Supervisors
Projects
UPMARC
Available from: 2019-01-23 Created: 2018-12-14 Last updated: 2019-02-25

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records BETA

Eklöv, DavidNikoleris, NikosBlack-Schaffer, DavidHagersten, Erik

Search in DiVA

By author/editor
Eklöv, DavidNikoleris, NikosBlack-Schaffer, DavidHagersten, Erik
By organisation
Computer Systems
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 10458 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf