AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)
2015 (English). In: Proc. 24th International Conference on Parallel Architectures and Compilation Techniques, IEEE Computer Society, 2015, p. 367-378. Conference paper (Refereed)
Place, publisher, year, edition, pages
IEEE Computer Society, 2015. 367-378 p.
National Category
Computer Science
URN: urn:nbn:se:uu:diva-265614
DOI: 10.1109/PACT.2015.35
ISI: 000378942700031
ISBN: 978-1-4673-9524-3
OAI: oai:DiVA.org:uu-265614
DiVA: diva2:866335
PACT 2015, October 18–21, San Francisco, CA
Available from: 2015-11-02. Created: 2015-11-02. Last updated: 2016-08-10. Bibliographically approved.
In thesis
1. Optimizing Performance in Highly Utilized Multicores with Intelligent Prefetching
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefetching, to increase performance. Such complex hardware structures have helped improve performance in general; however, their full potential is not realized, as software often utilizes the memory hierarchy inefficiently. Performance can be improved further by ensuring careful interaction between software and hardware. Performance can typically be improved by increasing cache utilization and by conserving DRAM bandwidth, i.e., retaining more useful data in the caches and lowering the number of data requests to the DRAM. One way to achieve this is to conserve space across the cache hierarchy and increase the opportunity for temporal reuse of cached data. Similarly, conserving DRAM bandwidth is essential for performance in highly utilized multicores, as it can easily become a critical resource. When multiple cores are active and the per-core share of DRAM bandwidth shrinks, its efficient utilization plays an important role in improving overall performance. Together, the cache hierarchy and the DRAM bandwidth play a significant role in defining the overall performance in multicores.

Based on deep insight from memory behavior modeling of software, this thesis explores five software-only methods to analyze and increase performance in multicores. The underlying philosophy that drives these techniques is to increase cache utilization and conserve DRAM bandwidth by 1) making data prefetching more accurate, and 2) lowering the miss rate in the cache hierarchy, either by preserving useful data longer (by cache-bypassing the less useful data) or via code size compaction using compiler options. First, we show how microarchitecture-independent memory access profiles can be used to analyze the instruction cache performance of software. We use this information in a compiler pass to recompile application phases with a large instruction cache miss rate for smaller code size, in an effort to improve the application's instruction cache behavior. Second, we demonstrate how a resource-efficient software prefetching method can be combined with hardware prefetching to improve performance in multicores when running software that exhibits irregular memory access patterns. Third, we show that hardware prefetching on high-performance commodity multicores is sub-optimal and demonstrate how a resource-efficient software-only prefetching method can perform better in fully utilized multicores. Fourth, we present an adaptive prefetching approach that dynamically combines software and hardware prefetching in a runtime system to improve performance in highly utilized multicores. Finally, in the fifth work, we develop a method to predict per-core prefetching configurations that deliver near-optimal overall multicore performance. These software techniques enable us to achieve greater performance in multicores (up to 50%) without requiring more processing resources.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. 54 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1335
Performance, Optimization, Prefetching, multicore, memory hierarchy
National Category
Computer Science
Research subject
Computer Science
urn:nbn:se:uu:diva-272095 (URN)
978-91-554-9450-6 (ISBN)
Public defence
2016-03-21, ITC/2446, Informationsteknologiskt centrum, Lägerhyddsvägen 2, Uppsala, 13:00 (English)
Available from: 2016-02-25. Created: 2016-01-11. Last updated: 2016-04-18. Bibliographically approved.

By author/editor
Khan, Muneeb; Hagersten, Erik; Black-Schaffer, David