Profiling Methods for Memory Centric Software Performance Analysis
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (UART)
2012 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

To reduce latency and increase bandwidth to memory, modern microprocessors are often designed with deep memory hierarchies including several levels of caches. For such microprocessors, both the latency and the bandwidth to off-chip memory are typically about two orders of magnitude worse than the latency and bandwidth to the fastest on-chip cache. Consequently, the performance of many applications is largely determined by how well they utilize the caches and bandwidths in the memory hierarchy. For such applications, there are two principal approaches to improve performance: optimize the memory hierarchy and optimize the software. In both cases, it is important to both qualitatively and quantitatively understand how the software utilizes and interacts with the resources (e.g., cache and bandwidths) in the memory hierarchy.

This thesis presents several novel profiling methods for memory-centric software performance analysis. The goal of these profiling methods is to provide general, high-level, quantitative information describing how the profiled applications utilize the resources in the memory hierarchy, and thereby help software and hardware developers identify opportunities for memory-related hardware and software optimizations. For such techniques to be broadly applicable, the data collection should have minimal impact on the profiled application and should not depend on custom hardware or operating system extensions. Furthermore, the resulting profiling information should be accurate and easy to interpret.

While several use cases are presented, the main focus of this thesis is the design and evaluation of the core profiling methods. These core profiling methods measure and/or estimate how high-level performance metrics, such as miss and fetch ratio, off-chip bandwidth demand, and execution rate, are affected by the amount of resources the profiled applications receive. This thesis shows that such high-level profiling information can be accurately obtained with very little impact on the profiled applications and without requiring costly simulations or custom hardware support.
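
As an illustration of what "miss ratio as a function of cache size" means, the following minimal sketch computes a miss ratio curve for a fully associative LRU cache from exact stack distances. This is the textbook stack-distance approach, not the statistical methods developed in the thesis; the function names and the toy trace are assumptions made for this example only.

```python
from collections import OrderedDict


def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first touch)."""
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for line in trace:
        if line in stack:
            keys = list(stack)
            # Distinct cache lines touched since the previous access to `line`.
            distances.append(len(keys) - 1 - keys.index(line))
            stack.move_to_end(line)
        else:
            distances.append(None)  # cold miss: never seen before
            stack[line] = True
    return distances


def miss_ratio_curve(trace, cache_sizes):
    """Miss ratio of a fully associative LRU cache for each size (in lines)."""
    distances = stack_distances(trace)
    curve = {}
    for size in cache_sizes:
        # An access hits only if fewer than `size` distinct lines intervened.
        misses = sum(1 for d in distances if d is None or d >= size)
        curve[size] = misses / len(distances)
    return curve


# Hypothetical trace of cache-line identifiers, for illustration only.
toy_trace = ["A", "B", "C", "A", "B", "C"]
toy_curve = miss_ratio_curve(toy_trace, [1, 2, 3])
```

A single pass over the trace yields the miss ratio for every cache size at once, which is why curves of this kind are a convenient way to describe how an application responds to the amount of cache it receives.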

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2012. p. 51
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1000
National Category
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-182594
ISBN: 978-91-554-8541-2 (print)
OAI: oai:DiVA.org:uu-182594
DiVA, id: diva2:560132
Public defence
2012-12-21, Room 2446, Polacksbacken, Lägerhyddsvägen 2, Uppsala, 13:00 (English)
Opponent
Supervisors
Projects
UPMARC
Available from: 2012-11-29 Created: 2012-10-11 Last updated: 2018-01-12. Bibliographically approved
List of papers
1. StatStack: Efficient modeling of LRU caches
2010 (English). In: Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2010, Piscataway, NJ: IEEE, 2010, p. 55-65. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Piscataway, NJ: IEEE, 2010
National Category
Identifiers
urn:nbn:se:uu:diva-136247 (URN)
10.1109/ISPASS.2010.5452069 (DOI)
978-1-4244-6023-6 (ISBN)
Projects
CoDeR-MP, UPMARC
Available from: 2010-04-19 Created: 2010-12-10 Last updated: 2018-01-12. Bibliographically approved
2. Fast modeling of shared caches in multicore systems
2011 (English). In: Proc. 6th International Conference on High Performance and Embedded Architectures and Compilers, New York: ACM Press, 2011, p. 147-157. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2011
National Category
Identifiers
urn:nbn:se:uu:diva-146757 (URN)
10.1145/1944862.1944885 (DOI)
978-1-4503-0241-8 (ISBN)
Projects
CoDeR-MP, UPMARC
Available from: 2011-02-20 Created: 2011-02-20 Last updated: 2018-01-12. Bibliographically approved
3. Cache Pirating: Measuring the Curse of the Shared Cache
2011 (English). In: Proc. 40th International Conference on Parallel Processing, IEEE Computer Society, 2011, p. 165-175. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE Computer Society, 2011
National Category
Identifiers
urn:nbn:se:uu:diva-181254 (URN)
10.1109/ICPP.2011.15 (DOI)
978-1-4577-1336-1 (ISBN)
Conference
ICPP 2011
Projects
UPMARC, CoDeR-MP
Available from: 2011-10-17 Created: 2012-09-20 Last updated: 2018-12-14. Bibliographically approved
4. Quantitative Characterization of Memory Contention
2012 (English). Report (Other academic)
Abstract [en]

On multicore processors, co-executing applications compete for shared resources, such as cache capacity and memory bandwidth. This leads to suboptimal resource allocation and can cause substantial performance loss, which makes it important to effectively manage these shared resources. This, however, requires insights into how the applications are impacted by such resource sharing.

While there are several methods to analyze the performance impact of cache contention, less attention has been paid to general, quantitative methods for analyzing the impact of contention for memory bandwidth. To this end we introduce the Bandwidth Bandit, a general, quantitative, profiling method for analyzing the performance impact of contention for memory bandwidth on multicore machines.

The profiling data captured by the Bandwidth Bandit is presented in a bandwidth graph. This graph accurately captures the measured application's performance as a function of its available memory bandwidth, and enables us to determine how much the application suffers when its available bandwidth is reduced. To demonstrate the value of this data, we present a case study in which we use the bandwidth graph to analyze the performance impact of memory contention when co-running multiple instances of a single-threaded application.
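
To make the idea of reading a bandwidth graph concrete, here is a minimal sketch that treats the graph as a sorted list of (available bandwidth, performance) samples and estimates the slowdown at a reduced bandwidth by linear interpolation. The sample values, units, and function names are hypothetical and are not taken from the report; the report does not say how the graph is interpolated, so linear interpolation is an assumption here.

```python
from bisect import bisect_left


def performance_at(graph, bandwidth):
    """Linearly interpolate performance at `bandwidth`.

    `graph` is a list of (bandwidth, performance) pairs sorted by bandwidth.
    Values outside the measured range are clamped to the nearest sample.
    """
    xs = [b for b, _ in graph]
    if bandwidth <= xs[0]:
        return graph[0][1]
    if bandwidth >= xs[-1]:
        return graph[-1][1]
    i = bisect_left(xs, bandwidth)
    (x0, y0), (x1, y1) = graph[i - 1], graph[i]
    return y0 + (y1 - y0) * (bandwidth - x0) / (x1 - x0)


def slowdown(graph, bandwidth):
    """Slowdown relative to the sample with the highest available bandwidth."""
    return graph[-1][1] / performance_at(graph, bandwidth)


# Hypothetical samples: (available bandwidth in GB/s, instructions per cycle).
toy_graph = [(2.0, 0.5), (4.0, 0.8), (8.0, 1.0)]
toy_slowdown = slowdown(toy_graph, 3.0)
```

With a graph like this, the question "how much does the application suffer if a co-runner takes part of its bandwidth?" becomes a lookup at the reduced bandwidth value.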

Place, publisher, year, edition, pages
Uppsala: Uppsala universitet, 2012. p. 10
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2012-029
National Category
Research subject
Computer and Systems Science
Identifiers
urn:nbn:se:uu:diva-182445 (URN)
Available from: 2013-03-28 Created: 2012-10-10 Last updated: 2013-03-28. Bibliographically approved
5. A Profiling Method for Analyzing Scalability Bottlenecks on Multicores
2012 (English). Report (Other academic)
Publisher
p. 12
National Category
Identifiers
urn:nbn:se:uu:diva-182453 (URN)
Available from: 2012-10-10 Created: 2012-10-10 Last updated: 2018-06-28. Bibliographically approved

Open Access in DiVA

fulltext (2116 kB), 631 downloads
File information
File name: FULLTEXT01.pdf. File size: 2116 kB. Checksum: SHA-512
b1ab3a63cd757f97238cb55a2a6f8d3e8ca26ebc909bbf2daa2a4d91ac5b67f28a68b0368e37aa46c75232276c61a498b2579e92d469a9bbaf9cdcec35bfca98
Type: fulltext. Mimetype: application/pdf

Authority records

Eklöv, David
