Resource conscious prefetching for irregular applications in multicores
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (UART)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (UART)
2014 (English). In: Proc. International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), Piscataway, NJ: IEEE, 2014, p. 34-43. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Piscataway, NJ: IEEE, 2014. p. 34-43.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-234545
DOI: 10.1109/SAMOS.2014.6893192
ISI: 000361019300005
ISBN: 978-1-4799-3770-7 (print)
OAI: oai:DiVA.org:uu-234545
DiVA: diva2:757000
Conference
SAMOS XIV, July 14–17, 2014
Projects
UPMARC
Available from: 2014-07-17. Created: 2014-10-20. Last updated: 2016-03-09. Bibliographically approved.
In thesis
1. Optimizing Performance in Highly Utilized Multicores with Intelligent Prefetching
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefetching, to increase performance. Such complex hardware structures have helped improve performance in general; however, their full potential is not realized, as software often uses the memory hierarchy inefficiently. Performance can be improved further by ensuring careful interaction between software and hardware. It typically improves when cache utilization increases and DRAM bandwidth is conserved, i.e., when more useful data is retained in the caches and fewer data requests go to DRAM. One way to achieve this is to conserve space across the cache hierarchy and increase the opportunity for temporal reuse of cached data. Similarly, conserving DRAM bandwidth is essential for performance in highly utilized multicores, where it can easily become a critical resource. When multiple cores are active and the per-core share of DRAM bandwidth shrinks, using that bandwidth efficiently plays an important role in improving overall performance. Together, the cache hierarchy and the DRAM bandwidth play a significant role in defining overall performance in multicores.

Based on deep insight from memory behavior modeling of software, this thesis explores five software-only methods to analyze and increase performance in multicores. The underlying philosophy that drives these techniques is to increase cache utilization and conserve DRAM bandwidth by 1) making data prefetching more accurate, and 2) lowering the miss rate in the cache hierarchy, either by preserving useful data longer through cache-bypassing of less useful data or by compacting code size using compiler options. First, we show how microarchitecture-independent memory access profiles can be used to analyze the instruction cache performance of software. We use this information in a compiler pass to recompile application phases with a large instruction cache miss rate for smaller code size, in an effort to improve the application's instruction cache behavior. Second, we demonstrate how a resource-efficient software prefetching method can be combined with hardware prefetching to improve performance in multicores when running software that exhibits irregular memory access patterns. Third, we show that hardware prefetching on high-performance commodity multicores is sub-optimal and demonstrate how a resource-efficient software-only prefetching method can perform better in fully utilized multicores. Fourth, we present an adaptive prefetching approach that dynamically combines software and hardware prefetching in a runtime system to improve performance in highly utilized multicores. Finally, in the fifth work, we develop a method to predict per-core prefetching configurations that deliver near-optimal overall multicore performance. These software techniques enable us to extract greater performance from multicores (up to 50%) without requiring more processing resources.
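
To make the cache-bypassing idea concrete, the following is a minimal sketch, not the method used in the thesis. It assumes a toy fully associative LRU cache and a hypothetical address trace in which four reused lines are interleaved with a stream of lines touched only once. The function name `count_misses`, the trace, and the capacity are all illustrative assumptions.

```python
from collections import OrderedDict


def count_misses(trace, capacity, bypass=frozenset()):
    """Count misses of a toy fully associative LRU cache over an address trace.

    Addresses in `bypass` are fetched but never installed in the cache,
    so they cannot evict other lines (a simplified model of cache bypassing).
    """
    cache = OrderedDict()  # cached addresses, ordered oldest -> newest
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)  # hit: mark as most recently used
            continue
        misses += 1
        if addr in bypass:
            continue  # bypassed data is not installed
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used line
        cache[addr] = True
    return misses


# Hypothetical trace: 4 reused ("hot") lines interleaved with one-shot lines.
hot = ["h0", "h1", "h2", "h3"]
trace = []
for i in range(100):
    trace.append(hot[i % 4])
    trace.append(f"s{i}")  # streaming address, touched exactly once
stream = {a for a in trace if a.startswith("s")}

baseline = count_misses(trace, capacity=4)                 # stream pollutes the cache
bypassed = count_misses(trace, capacity=4, bypass=stream)  # stream is bypassed
print(baseline, bypassed)  # prints "200 104"
```

In this toy setting, every access misses without bypassing because the streaming lines keep evicting the reused ones. With bypassing, only the 100 streaming accesses and the 4 cold misses on the reused lines remain. Real caches are set-associative and the choice of what to bypass must be predicted, so the numbers here are illustrative only.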

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. 54 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1335
Keyword
Performance, Optimization, Prefetching, multicore, memory hierarchy
National Category
Computer Science
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-272095
ISBN: 978-91-554-9450-6
Public defence
2016-03-21, ITC/2446, Informationsteknologiskt centrum, Lägerhyddsvägen 2, Uppsala, 13:00 (English)
Available from: 2016-02-25. Created: 2016-01-11. Last updated: 2016-04-18. Bibliographically approved.

Authors: Khan, Muneeb; Hagersten, Erik
