Uppsala University Publications (uu.se)
Low-Overhead Memory Access Sampler: An Efficient Method for Data-Locality Profiling
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2011 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

There is an ever-widening performance gap between processors and main memory, a gap bridged by small intermediate memories, caches, that store recently referenced data. A miss in the cache is an expensive operation because it requires data to be fetched from main memory. It is therefore crucial to understand application cache behavior. Caches only work well for applications with good data locality; insufficient data locality leads to poor cache utilization, which quickly becomes a major performance bottleneck. Analysing and understanding cache behavior helps in improving data locality and identifying such bottlenecks.

In this thesis, we study a method for efficiently analysing application cache behavior. We implement the method in a cache analysis tool. The method uses a statistical cache model that only requires a sparse data locality fingerprint as input. The input is based on reuse distances between cache lines. By adjusting architecture-specific parameters, such as cache line size, the tool can output working-set graphs for a wide range of architectures. Readily available hardware performance counters combined with intelligent sampling are used to enable an implementation with low overhead.
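The reuse-distance input mentioned above can be sketched roughly as follows. This is an illustrative toy, not the thesis implementation: the function name, trace format, and the particular definition used here (number of intervening accesses between two touches of the same cache line) are assumptions, and a real low-overhead sampler would collect only a sparse subset of these distances via hardware counters rather than a full trace.

```python
def reuse_distances(addresses, line_size=64):
    """For each memory access in a toy address trace, report how many
    accesses occurred since the previous touch of the same cache line
    (None for a cold access). line_size is an architecture parameter."""
    last_seen = {}   # cache line number -> index of its last access
    out = []
    for i, addr in enumerate(addresses):
        line = addr // line_size        # map byte address to cache line
        if line in last_seen:
            out.append(i - last_seen[line] - 1)  # accesses in between
        else:
            out.append(None)            # first touch of this line
        last_seen[line] = i
    return out

# Example: two addresses in the same 64-byte line (0 and 8) reuse that line.
print(reuse_distances([0, 64, 0, 8, 128, 64]))
# -> [None, None, 1, 0, None, 3]
```

Because `line_size` is a plain parameter, the same trace can be re-evaluated for different architectures, which mirrors how adjustable architecture parameters let one fingerprint produce working-set graphs for many cache configurations.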

We evaluate our cache analysis tool using the SPEC CPU2006 benchmarks, and our results show good accuracy and performance. The difference between the cache miss ratio estimated by our tool and a reference tool was nearly always below one percentage point. The run-time overhead was 17% on average. We also analyse the overhead to identify the components of our implementation that are most costly and are therefore the best targets for optimization.

We propose a number of optimizations that could reduce the overhead further. Phase-guided sampling is proposed as a key optimization, where application phase behavior is used to determine when to sample memory references. We also build a prototype implementation of this optimization, and the preliminary results are promising.
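The idea behind phase-guided sampling can be sketched as follows. This is a hypothetical illustration of the general technique, not the thesis prototype: the window representation, the metric-change test, and the two sampling rates are all assumptions made for the sketch.

```python
import random

def phase_guided_sample(windows, threshold=0.1,
                        dense_rate=0.5, sparse_rate=0.01):
    """Sample memory accesses densely when a per-window metric shifts
    (suggesting a new application phase) and sparsely while behavior is
    stable. 'windows' is an iterable of (metric, accesses) pairs; the
    metric could be, e.g., a miss count from a hardware counter."""
    sampled = []
    prev = None
    for metric, accesses in windows:
        # A large relative change in the metric signals a phase change.
        new_phase = prev is None or \
            abs(metric - prev) > threshold * max(abs(prev), 1e-9)
        rate = dense_rate if new_phase else sparse_rate
        for a in accesses:
            if random.random() < rate:   # Bernoulli sampling at 'rate'
                sampled.append(a)
        prev = metric
    return sampled
```

Since stable phases repeat the same locality behavior, sampling them sparsely loses little information while cutting the number of expensive trap-and-record events, which is how phase guidance could reduce the 17% average overhead reported above.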

Place, publisher, year, edition, pages
2011.
Series
UPTEC IT, ISSN 1401-5749 ; 11 003
Identifiers
URN: urn:nbn:se:uu:diva-146664
OAI: oai:DiVA.org:uu-146664
DiVA, id: diva2:398696
Uppsok
Technology
Available from: 2011-02-18. Created: 2011-02-18. Last updated: 2011-02-18. Bibliographically approved.

Open Access in DiVA

fulltext (782 kB), 501 downloads
File information
File name: FULLTEXT01.pdf
File size: 782 kB
Checksum (SHA-512): 1e01b7f93b52c148245a4691e26c66296dd15df661a29d2c2382d7cb5e8f3b45d66b1da875ab2f4391127861609e9876bb41b00c6b7f16b91c80de90a2ca94f3
Type: fulltext
Mimetype: application/pdf

By organisation
Department of Information Technology

Total: 501 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
