Uppsala University Publications (uu.se)
1 - 17 of 17
  • 1.
    Eklöv, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    A Profiling Method for Analyzing Scalability Bottlenecks on Multicores. 2012. Report (Other academic)
  • 2.
    Eklöv, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Efficient methods for application performance analysis. 2011. Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    To reduce latency and increase bandwidth to memory, modern microprocessors are designed with deep memory hierarchies including several levels of caches. For such microprocessors, the service time for fetching data from off-chip memory is about two orders of magnitude longer than fetching data from the level-one cache. Consequently, the performance of applications is largely determined by how well they utilize the caches in the memory hierarchy, captured by their miss ratio curves. However, efficiently obtaining an application's miss ratio curve and interpreting its performance implications is hard. This task becomes even more challenging when analyzing application performance on multicore processors where several applications/threads share caches and memory bandwidths. To accomplish this, we need powerful techniques that capture applications' cache utilization and provide intuitive performance metrics.

    In this thesis we present three techniques for analyzing application performance, StatStack, StatCC and Cache Pirating. Our main focus is on providing memory hierarchy related performance metrics such as miss ratio, fetch ratio and bandwidth demand, but also execution rate. These techniques are based on profiling information, requiring both runtime data collection and post processing. For such techniques to be broadly applicable the data collection has to have minimal impact on the profiled application, allow profiling of unmodified binaries, and not depend on custom hardware and/or operating system extensions. Furthermore, the information provided has to be accurate and easy to interpret by programmers, the runtime environment and compilers.

    StatStack estimates an application's miss ratio curve, StatCC estimates the miss ratios of co-running applications sharing the last-level cache, and Cache Pirating measures any desired performance metric available through hardware performance counters as a function of cache size. We have experimentally shown that our methods are both efficient and accurate. The runtime information required by StatStack and StatCC can be collected with an average runtime overhead of 40%. The Cache Pirating method measures the desired performance metrics with an average runtime overhead of 5%.

  • 3.
    Eklöv, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Profiling Methods for Memory Centric Software Performance Analysis. 2012. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    To reduce latency and increase bandwidth to memory, modern microprocessors are often designed with deep memory hierarchies including several levels of caches. For such microprocessors, both the latency and the bandwidth to off-chip memory are typically about two orders of magnitude worse than the latency and bandwidth to the fastest on-chip cache. Consequently, the performance of many applications is largely determined by how well they utilize the caches and bandwidths in the memory hierarchy. For such applications, there are two principal approaches to improve performance: optimize the memory hierarchy and optimize the software. In both cases, it is important to both qualitatively and quantitatively understand how the software utilizes and interacts with the resources (e.g., cache and bandwidths) in the memory hierarchy.

    This thesis presents several novel profiling methods for memory-centric software performance analysis. The goal of these profiling methods is to provide general, high-level, quantitative information describing how the profiled applications utilize the resources in the memory hierarchy, and thereby help software and hardware developers identify opportunities for memory related hardware and software optimizations. For such techniques to be broadly applicable the data collection should have minimal impact on the profiled application, while not being dependent on custom hardware and/or operating system extensions. Furthermore, the resulting profiling information should be accurate and easy to interpret.

    While several use cases are presented, the main focus of this thesis is the design and evaluation of the core profiling methods. These core profiling methods measure and/or estimate how high-level performance metrics, such as miss and fetch ratios, off-chip bandwidth demand, and execution rate, are affected by the amount of resources the profiled applications receive. This thesis shows that such high-level profiling information can be accurately obtained with very little impact on the profiled applications and without requiring costly simulations or custom hardware support.

    List of papers
    1. StatStack: Efficient modeling of LRU caches
    2010 (English). In: Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2010, Piscataway, NJ: IEEE, 2010, p. 55-65. Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    Piscataway, NJ: IEEE, 2010
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-136247 (URN), 10.1109/ISPASS.2010.5452069 (DOI), 978-1-4244-6023-6 (ISBN)
    Projects
    Coder-mp, UPMARC
    Available from: 2010-04-19. Created: 2010-12-10. Last updated: 2018-01-12. Bibliographically approved
    2. Fast modeling of shared caches in multicore systems
    2011 (English). In: Proc. 6th International Conference on High Performance and Embedded Architectures and Compilers, New York: ACM Press, 2011, p. 147-157. Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    New York: ACM Press, 2011
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-146757 (URN), 10.1145/1944862.1944885 (DOI), 978-1-4503-0241-8 (ISBN)
    Projects
    Coder-mp, UPMARC
    Available from: 2011-02-20. Created: 2011-02-20. Last updated: 2018-01-12. Bibliographically approved
    3. Cache Pirating: Measuring the Curse of the Shared Cache
    2011 (English). In: Proc. 40th International Conference on Parallel Processing, IEEE Computer Society, 2011, p. 165-175. Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    IEEE Computer Society, 2011
    National Category
    Computer Engineering
    Identifiers
    urn:nbn:se:uu:diva-181254 (URN), 10.1109/ICPP.2011.15 (DOI), 978-1-4577-1336-1 (ISBN)
    Conference
    ICPP 2011
    Projects
    UPMARC, CoDeR-MP
    Available from: 2011-10-17. Created: 2012-09-20. Last updated: 2018-12-14. Bibliographically approved
    4. Quantitative Characterization of Memory Contention
    2012 (English). Report (Other academic)
    Abstract [en]

    On multicore processors, co-executing applications compete for shared resources, such as cache capacity and memory bandwidth. This leads to suboptimal resource allocation and can cause substantial performance loss, which makes it important to effectively manage these shared resources. This, however, requires insights into how the applications are impacted by such resource sharing.

    While there are several methods to analyze the performance impact of cache contention, less attention has been paid to general, quantitative methods for analyzing the impact of contention for memory bandwidth. To this end we introduce the Bandwidth Bandit, a general, quantitative, profiling method for analyzing the performance impact of contention for memory bandwidth on multicore machines.

    The profiling data captured by the Bandwidth Bandit is presented in a bandwidth graph. This graph accurately captures the measured application's performance as a function of its available memory bandwidth, and enables us to determine how much the application suffers when its available bandwidth is reduced. To demonstrate the value of this data, we present a case study in which we use the bandwidth graph to analyze the performance impact of memory contention when co-running multiple instances of a single-threaded application.

    Place, publisher, year, edition, pages
    Uppsala: Uppsala universitet, 2012. p. 10
    Series
    Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2012-029
    National Category
    Computer Systems
    Research subject
    Computer Systems Sciences
    Identifiers
    urn:nbn:se:uu:diva-182445 (URN)
    Available from: 2013-03-28. Created: 2012-10-10. Last updated: 2013-03-28. Bibliographically approved
    5. A Profiling Method for Analyzing Scalability Bottlenecks on Multicores
    2012 (English). Report (Other academic)
    Publisher
    p. 12
    National Category
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-182453 (URN)
    Available from: 2012-10-10. Created: 2012-10-10. Last updated: 2018-06-28. Bibliographically approved
  • 4.
    Eklöv, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Fast modeling of shared caches in multicore systems. 2011. In: Proc. 6th International Conference on High Performance and Embedded Architectures and Compilers, New York: ACM Press, 2011, p. 147-157. Conference paper (Refereed)
  • 5.
    Eklöv, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    StatCC: a statistical cache contention model. 2010. In: Proc. 19th International Conference on Parallel Architectures and Compilation Techniques, New York: ACM Press, 2010, p. 551-552. Conference paper (Refereed)
  • 6.
    Eklöv, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    StatStack: Efficient modeling of LRU caches. 2010. In: Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2010, Piscataway, NJ: IEEE, 2010, p. 55-65. Conference paper (Refereed)
  • 7.
    Eklöv, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Nikoleris, Nikkos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hägersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Bandwidth bandit: Quantitative characterization of memory contention. 2012. In: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, 2012, p. 457-458. Conference paper (Refereed)
    Abstract [en]

    Applications that are co-scheduled on a multi-core compete for shared resources, such as cache capacity and memory bandwidth. The performance degradation resulting from this contention can be substantial, which makes it important to effectively manage these shared resources. This, however, requires quantitative insight into how applications are impacted by such contention. In this paper we present a quantitative method to measure applications' sensitivities to different degrees of contention for off-chip memory bandwidth on real hardware. We then use the data captured with our profiling method to estimate the throughput of a set of co-running instances of a single threaded application.

  • 8.
    Eklöv, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Nikoleris, Nikos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Bandwidth Bandit: Quantitative Characterization of Memory Contention. 2013. In: Proc. 11th International Symposium on Code Generation and Optimization: CGO 2013, IEEE Computer Society, 2013, p. 99-108. Conference paper (Refereed)
    Abstract [en]

    On multicore processors, co-executing applications compete for shared resources, such as cache capacity and memory bandwidth. This leads to suboptimal resource allocation and can cause substantial performance loss, which makes it important to effectively manage these shared resources. This, however, requires insights into how the applications are impacted by such resource sharing. While there are several methods to analyze the performance impact of cache contention, less attention has been paid to general, quantitative methods for analyzing the impact of contention for memory bandwidth. To this end we introduce the Bandwidth Bandit, a general, quantitative, profiling method for analyzing the performance impact of contention for memory bandwidth on multicore machines. The profiling data captured by the Bandwidth Bandit is presented in a bandwidth graph. This graph accurately captures the measured application's performance as a function of its available memory bandwidth, and enables us to determine how much the application suffers when its available bandwidth is reduced. To demonstrate the value of this data, we present a case study in which we use the bandwidth graph to analyze the performance impact of memory contention when co-running multiple instances of a single-threaded application.

  • 9.
    Eklöv, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Nikoleris, Nikos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Cache Pirating: Measuring the curse of the shared cache. 2011. Report (Other academic)
  • 10.
    Eklöv, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Nikoleris, Nikos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Cache Pirating: Measuring the Curse of the Shared Cache. 2011. In: Proc. 40th International Conference on Parallel Processing, IEEE Computer Society, 2011, p. 165-175. Conference paper (Refereed)
  • 11.
    Eklöv, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Nikoleris, Nikos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hägersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Quantitative Characterization of Memory Contention. 2012. Report (Other academic)
    Abstract [en]

    On multicore processors, co-executing applications compete for shared resources, such as cache capacity and memory bandwidth. This leads to suboptimal resource allocation and can cause substantial performance loss, which makes it important to effectively manage these shared resources. This, however, requires insights into how the applications are impacted by such resource sharing.

    While there are several methods to analyze the performance impact of cache contention, less attention has been paid to general, quantitative methods for analyzing the impact of contention for memory bandwidth. To this end we introduce the Bandwidth Bandit, a general, quantitative, profiling method for analyzing the performance impact of contention for memory bandwidth on multicore machines.

    The profiling data captured by the Bandwidth Bandit is presented in a bandwidth graph. This graph accurately captures the measured application's performance as a function of its available memory bandwidth, and enables us to determine how much the application suffers when its available bandwidth is reduced. To demonstrate the value of this data, we present a case study in which we use the bandwidth graph to analyze the performance impact of memory contention when co-running multiple instances of a single-threaded application.

  • 12.
    Eklöv, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Nikoleris, Nikos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    A software based profiling method for obtaining speedup stacks on commodity multi-cores. 2014. In: 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS): ISPASS 2014, IEEE Computer Society, 2014, p. 148-157. Conference paper (Refereed)
    Abstract [en]

    A key goodness metric of multi-threaded programs is how their execution times scale when increasing the number of threads. However, there are several bottlenecks that can limit the scalability of a multi-threaded program, e.g., contention for shared cache capacity and off-chip memory bandwidth; and synchronization overheads. In order to improve the scalability of a multi-threaded program, it is vital to be able to quantify how the program is impacted by these scalability bottlenecks. We present a software profiling method for obtaining speedup stacks. A speedup stack reports how much each scalability bottleneck limits the scalability of a multi-threaded program. It thereby quantifies how much its scalability can be improved by eliminating a given bottleneck. A software developer can use this information to determine what optimizations are most likely to improve scalability, while a computer architect can use it to analyze the resource demands of emerging workloads. The proposed method profiles the program on real commodity multi-cores (i.e., no simulations required) using existing performance counters. Consequently, the obtained speedup stacks accurately account for all idiosyncrasies of the machine on which the program is profiled. While the main contribution of this paper is the profiling method to obtain speedup stacks, we present several examples of how speedup stacks can be used to analyze the resource requirements of multi-threaded programs. Furthermore, we discuss how their scalability can be improved by both software developers and computer architects.

  • 13.
    Hagersten, Erik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Eklöv, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Efficient cache modeling with sparse data. 2010. In: Processor and System-on-Chip Simulation, New York: Springer, 2010, p. 193-209. Chapter in book (Refereed)
  • 14.
    Nikoleris, Nikos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Eklöv, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Extending statistical cache models to support detailed pipeline simulators. 2014. In: 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE Computer Society, 2014, p. 86-95. Conference paper (Refereed)
    Abstract [en]

    Simulators are widely used in computer architecture research. While detailed cycle-accurate simulations provide useful insights, studies using modern workloads typically require days or weeks. Evaluating many design points only exacerbates the simulation overhead. Recent works propose methods with good accuracy that reduce the simulation overhead either by sampling the execution (e.g., SMARTS and SimPoint) or by using fast analytical models of the simulated designs (e.g., Interval Simulation). While these techniques significantly reduce the simulation overhead, modeling processor components with large state, such as the last-level cache, requires costly simulation to warm them up. Statistical simulation methods, such as SMARTS, report that the warm-up overhead accounts for 99% of the simulation overhead, while only 1% of the time is spent simulating the target design. This paper proposes WarmSim, a method that eliminates the need to warm up the cache. WarmSim builds on top of a statistical cache modeling technique and extends it to accurately model not only the miss ratio but also the outcome of every cache request. WarmSim takes as input an application's memory reuse information, which is hardware independent. Therefore, different cache configurations can be simulated using the same input data. We demonstrate that this approach can be used to estimate the CPI of the SPEC CPU2006 benchmarks with an average error of 1.77%, reducing the overhead compared to a simulation with a 10M instruction warm-up by a factor of 50x.

  • 15.
    Sandberg, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Eklöv, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    A Software Technique for Reducing Cache Pollution. 2010. In: Proc. 3rd Swedish Workshop on Multi-Core Computing, Göteborg, Sweden: Chalmers University of Technology, 2010, p. 59-62. Conference paper (Other academic)
    Abstract [en]

    Contention for shared cache resources has been recognized as a major bottleneck for multicores—especially for mixed workloads of independent applications. While most modern processors implement instructions to manage caches, these instructions are largely unused due to a lack of understanding of how to best leverage them.

    We propose an automatic, low-overhead, method to reduce cache contention by finding instructions that are prone to cache trashing and a method to automatically disable caching for such instructions. Practical experiments demonstrate that our software-only method can improve application performance up to 35% on x86 multicore hardware.

  • 16.
    Sandberg, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Eklöv, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses. 2010. In: Proc. International Conference for High Performance Computing, Networking, Storage and Analysis: SC 2010, Piscataway, NJ: IEEE, 2010, p. 11-. Conference paper (Refereed)
    Abstract [en]

    Contention for shared cache resources has been recognized as a major bottleneck for multicores—especially for mixed workloads of independent applications. While most modern processors implement instructions to manage caches, these instructions are largely unused due to a lack of understanding of how to best leverage them. This paper introduces a classification of applications into four cache usage categories. We discuss how applications from different categories affect each other's performance indirectly through cache sharing and devise a scheme to optimize such sharing. We also propose a low-overhead method to automatically find the best per-instruction cache management policy. We demonstrate how the indirect cache-sharing effects of mixed workloads can be tamed by automatically altering some instructions to better manage cache resources. Practical experiments demonstrate that our software-only method can improve application performance up to 35% on x86 multicore hardware.

  • 17.
    Sembrant, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Eklöv, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Efficient software-based online phase classification. 2011. In: International Symposium on Workload Characterization (IISWC'11), IEEE Computer Society, 2011, p. 104-115. Conference paper (Refereed)
    Abstract [en]

    Many programs exhibit execution phases with time-varying behavior. Phase detection has been used extensively to find short and representative simulation points, used to quickly get representative simulation results for long-running applications. Several schemes for hardware-assisted phase detection have also been proposed to guide various forms of optimizations and hardware configurations. This paper explores the feasibility of low overhead phase detection at runtime based entirely on existing features found in modern processors. If successful, such a technology would be useful for cache management, frequency adjustments, runtime scheduling and profiling techniques. The paper evaluates several existing and new alternatives for efficient runtime data collection and online phase detection. ScarPhase (Sample-based Classification and Analysis for Runtime Phases), a new online phase detection library, is presented. It makes extensive use of the new hardware counter features, introduces a new phase classification heuristic and suggests a way to dynamically adjust the sample rate. ScarPhase exhibits runtime overhead below 2%.
