Publications from Uppsala University (uu.se)
Black-Schaffer, David, Professor (ORCID: orcid.org/0000-0001-5375-4058)
Publications (10 of 69)
Nematallah, A., Park, C. H. & Black-Schaffer, D. (2023). Exploring the Latency Sensitivity of Cache Replacement Policies [Letter to the editor]. IEEE Computer Architecture Letters, 22(2), 93-96
Exploring the Latency Sensitivity of Cache Replacement Policies
2023 (English) In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 22, no. 2, pp. 93-96. Journal article, Letter (Refereed). Published.
Abstract [en]

With DRAM latencies increasing relative to CPU speeds, the performance of caches has become more important. This has led to increasingly sophisticated replacement policies that require complex calculations to update their replacement metadata, which often require multiple cycles. To minimize the negative impact of these metadata updates, architects have focused on policies that incur as little update latency as possible through a combination of reducing the policies’ precision and using parallel hardware. In this work we investigate whether these tradeoffs to reduce cache metadata update latency are needed. Specifically, we look at the performance and energy impact of increasing the latency of cache replacement policy updates. We find that even dramatic increases in replacement policy update latency have very limited effect. This indicates that designers have far more freedom to increase policy complexity and latency than previously assumed.
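
The experiment behind this letter can be approximated in a few lines. Below is a minimal sketch (not the paper's simulator) of a set-associative LRU cache in which replacement-metadata updates take effect only after a configurable number of subsequent accesses, mimicking multi-cycle policy updates; the cache geometry, trace, and delay values are illustrative.

```python
# Toy set-associative cache where LRU recency updates are applied only after
# `update_delay` further accesses, mimicking multi-cycle metadata updates.
import random
from collections import deque

class DelayedLRUCache:
    def __init__(self, num_sets=64, ways=8, update_delay=0):
        self.ways = ways
        self.update_delay = update_delay
        self.sets = [[] for _ in range(num_sets)]   # per set: LRU at index 0
        self.pending = deque()                      # deferred recency updates
        self.hits = self.misses = 0

    def access(self, addr, block=64):
        tag = addr // block
        s = tag % len(self.sets)
        if tag in self.sets[s]:
            self.hits += 1                          # hit: recency update deferred
        else:
            self.misses += 1
            if len(self.sets[s]) >= self.ways:
                self.sets[s].pop(0)                 # evict current LRU
            self.sets[s].append(tag)                # insert at MRU
        self.pending.append((s, tag))
        while len(self.pending) > self.update_delay:
            ps, ptag = self.pending.popleft()       # update finally takes effect
            if ptag in self.sets[ps]:
                self.sets[ps].remove(ptag)
                self.sets[ps].append(ptag)          # move to MRU

random.seed(0)
hot = [i * 64 for i in range(400)]                  # fits in 64 sets * 8 ways
trace = [random.choice(hot) if random.random() < 0.9
         else random.randrange(1 << 24) * 64 for _ in range(200_000)]
for delay in (0, 8, 64):                            # accesses of update latency
    c = DelayedLRUCache(update_delay=delay)
    for a in trace:
        c.access(a)
    print(f"update_delay={delay:2d}: hit rate {c.hits / len(trace):.3f}")
```

Even large delay values barely move the hit rate in this toy setup, which is the shape of the result the letter reports.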

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Cache Replacement Policies, Computer Architecture, High-Performance Computing
National subject category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-508114 (URN)
10.1109/lca.2023.3296251 (DOI)
001049956900003 (ISI)
Funder
Knut och Alice Wallenbergs Stiftelse, 2015.0153
EU, Horizon 2020, 715283
Vetenskapsrådet, 2019-02429
Available from: 2023-07-20. Created: 2023-07-20. Last updated: 2024-04-05. Bibliographically reviewed.
Borgström, G., Rohner, C. & Black-Schaffer, D. (2023). Faster Functional Warming with Cache Merging. In: PROCEEDINGS OF SYSTEM ENGINEERING FOR CONSTRAINED EMBEDDED SYSTEMS, DRONESE AND RAPIDO 2023. Paper presented at Conference on Drone Systems Engineering (DroneSE) / Conference on Rapid Simulation and Performance Evaluation - Methods and Tools (RAPIDO) / Workshop on System Engineering for Constrained Embedded Systems / HiPEAC Conference, JAN 16-18, 2023, Toulouse, FRANCE (pp. 39-47). Association for Computing Machinery (ACM)
Faster Functional Warming with Cache Merging
2023 (English) In: PROCEEDINGS OF SYSTEM ENGINEERING FOR CONSTRAINED EMBEDDED SYSTEMS, DRONESE AND RAPIDO 2023, Association for Computing Machinery (ACM), 2023, pp. 39-47. Conference paper, Published paper (Refereed).
Abstract [en]

SMARTS-like sampled hardware simulation techniques achieve good accuracy by simulating many small portions of an application in detail. However, while this reduces the simulation time, it results in extensive cache warming times, as each of the many simulation points requires warming the whole memory hierarchy. Adaptive Cache Warming reduces this time by iteratively increasing warming to achieve sufficient accuracy. Unfortunately, each increase requires that the previous warming be redone, nearly doubling the total warming. We address re-warming by developing a technique to merge the cache states from the previous and additional warming iterations. We demonstrate our merging approach on a multi-level LRU cache hierarchy and evaluate and address the introduced errors. Our experiments show that Cache Merging delivers an average speedup of 1.44x, 1.84x, and 1.87x for 128kB, 2MB, and 8MB L2 caches, respectively, (vs. a 2x theoretical maximum speedup) with 95-percentile absolute IPC errors of only 0.029, 0.015, and 0.006, respectively. These results demonstrate that Cache Merging yields significantly higher simulation speed with minimal losses.
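
A minimal sketch of the merging idea, assuming per-set LRU stacks: blocks touched in the more recent warming window keep their recency order, and blocks seen only in the earlier, additional window fill the remaining ways. This is an illustrative approximation, not the paper's exact algorithm.

```python
# Merge two LRU stacks for one cache set (index 0 = MRU). Blocks from the
# recent warming window are by definition younger than anything seen only in
# the earlier window, so they keep their order; earlier-window blocks fill
# the leftover ways in their own recency order.

def merge_set(recent_ways, earlier_ways, assoc):
    merged = list(recent_ways)
    for tag in earlier_ways:
        if len(merged) >= assoc:
            break
        if tag not in merged:               # a block may appear in both windows
            merged.append(tag)
    return merged

recent  = ["A", "B"]                        # state after warming the recent window
earlier = ["C", "A", "D", "E"]              # state after the earlier, added window
print(merge_set(recent, earlier, assoc=4))  # ['A', 'B', 'C', 'D']
```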

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
functional warming, cache warming, cache merging
National subject category
Computer Sciences; Computer Engineering
Identifiers
urn:nbn:se:uu:diva-519733 (URN)
10.1145/3579170.3579256 (DOI)
001106628800005 (ISI)
979-8-4007-0045-3 (ISBN)
Conference
Conference on Drone Systems Engineering (DroneSE) / Conference on Rapid Simulation and Performance Evaluation - Methods and Tools (RAPIDO) / Workshop on System Engineering for Constrained Embedded Systems / HiPEAC Conference, JAN 16-18, 2023, Toulouse, FRANCE
Available from: 2024-01-09. Created: 2024-01-09. Last updated: 2024-01-09. Bibliographically reviewed.
Haddadi, A., Black-Schaffer, D. & Park, C. H. (2023). Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of Swapping. In: The International Symposium on Memory Systems (MEMSYS '23). Paper presented at The International Symposium on Memory Systems (MEMSYS '23), October 2–5, 2023, Alexandria, VA, USA. Association for Computing Machinery (ACM)
Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of Swapping
2023 (English) In: The International Symposium on Memory Systems (MEMSYS '23), Association for Computing Machinery (ACM), 2023. Conference paper, Published paper (Refereed).
Abstract [en]

Graph workloads are critical in many areas. Unfortunately, graph sizes have been increasing faster than DRAM capacity. As a result, large-scale graph processing necessarily falls back to virtual memory paging, resulting in tremendous performance losses.

In this work we investigate how we can get the best possible performance on commodity systems from graphs that cannot fit into DRAM by understanding, and adjusting, how the virtual memory system and the graph characteristics interact. To do so, we first characterize the graph applications, system, and SSD behavior as a function of how much of the graph fits in DRAM. From this analysis we see that for multiple graph types, the system fails to fully utilize the bandwidth of the SSDs due to a lack of parallel page-in requests.

We use this insight to motivate overcommitting CPU threads for graph processing. This allows us to significantly increase the number of parallel page-in requests for several graph types and recover much of the performance lost to paging. We show that overcommitting threads generally improves performance across algorithms and graph types. However, we identify one graph that suffers from overcommitting, leading to the recommendation that overcommitting threads is generally good for performance, though certain graph inputs may be hurt by it.
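
A minimal sketch of the overcommitting idea, using blocking file reads as a stand-in for the major page faults a larger-than-DRAM graph would take. The path, file size, and thread counts are illustrative, and a sparse file keeps the sketch cheap; on a real dataset these reads would queue at the SSD, which is exactly the parallelism being recovered.

```python
# Overcommit threads so that while some block on "page-ins" (file reads here),
# others keep the SSD queue full. Blocking reads release the GIL, so plain
# threads suffice. Illustrative sketch, not the paper's harness.
import os, time
from concurrent.futures import ThreadPoolExecutor

PAGE = 4096
PATH = "/tmp/big_graph.bin"                   # hypothetical backing file

def make_file(num_pages=50_000):
    with open(PATH, "wb") as f:
        f.truncate(num_pages * PAGE)          # sparse file: cheap to create
    return num_pages

def touch_pages(fd, pages):
    for p in pages:
        os.pread(fd, PAGE, p * PAGE)          # blocks like a page-in

def run(num_threads, num_pages):
    fd = os.open(PATH, os.O_RDONLY)
    chunks = [range(i, num_pages, num_threads) for i in range(num_threads)]
    t0 = time.perf_counter()
    with ThreadPoolExecutor(num_threads) as pool:
        list(pool.map(lambda c: touch_pages(fd, c), chunks))
    os.close(fd)
    return time.perf_counter() - t0

if __name__ == "__main__":
    n = make_file()
    cores = os.cpu_count() or 1
    for threads in (cores, 4 * cores, 16 * cores):   # 1x, 4x, 16x overcommit
        print(f"{threads:4d} threads: {run(threads, n):6.2f} s")
```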

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
graph processing, virtual memory, swapping, SSD, commodity system, thread overcommitting, characterization, operating system
National subject category
Computer Systems
Research subject
Computer Science; Computer and Systems Sciences
Identifiers
urn:nbn:se:uu:diva-515500 (URN)
10.1145/3631882.3631884 (DOI)
Conference
The International Symposium on Memory Systems (MEMSYS '23), October 2–5, 2023, Alexandria, VA, USA
Funder
Knut och Alice Wallenbergs Stiftelse, 2015.0153
EU, Horizon 2020, 715283
Vetenskapsrådet, 2019-02429
Note
Funder: Electronics and Telecommunications Research Institute (ETRI), grant number 23ZS1300
Available from: 2023-11-03. Created: 2023-11-03. Last updated: 2023-11-08.
Hassan, M., Park, C. H. & Black-Schaffer, D. (2023). Protean: Resource-efficient Instruction Prefetching. In: The International Symposium on Memory Systems (MEMSYS '23). Paper presented at The International Symposium on Memory Systems (MEMSYS '23), October 2–5, 2023, Alexandria, VA, USA. Association for Computing Machinery (ACM)
Protean: Resource-efficient Instruction Prefetching
2023 (English) In: The International Symposium on Memory Systems (MEMSYS '23), Association for Computing Machinery (ACM), 2023. Conference paper, Published paper (Refereed).
Abstract [en]

Increases in code footprint and control flow complexity have made low-latency instruction fetch challenging. Dedicated Instruction Prefetchers (DIPs) can provide performance gains (up to 5%) for a subset of applications that are poorly served by today’s ubiquitous Fetch-Directed Instruction Prefetching (FDIP). However, DIPs incur the significant overhead of in-core metadata storage (for all workloads) and energy and performance loss from excess prefetches (for many workloads), leading to 11% of workloads actually losing performance. This work addresses how to provide the benefits of a DIP without its costs when the DIP cannot provide a benefit.

Our key insight is that workloads that benefit from DIPs can tolerate increased Branch Target Buffer (BTB) misses. This allows us to dynamically re-purpose the existing BTB storage between the BTB and the DIP. We train a simple performance-counter-based decision tree to select the optimal configuration at runtime, which allows us to achieve different energy/performance optimization goals. As a result, we pay essentially no area overhead when a DIP is needed, and can use the larger BTB when it is beneficial, or even power it off when not needed.

We look at our impact on two groups of benchmarks: those where the right configuration choice can improve performance or energy and those where the wrong choice could hurt them. For the benchmarks with improvement potential, when optimizing for performance, we are able to obtain 86% of the oracle potential, and when optimizing for energy, 98% of the potential, both while avoiding essentially all performance and energy losses on the remaining benchmarks. This demonstrates that our technique is able to dynamically adapt to different performance/energy goals and obtain essentially all of the potential gains of DIP without the overheads they experience today.
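
A minimal sketch of the runtime selection step: a hand-written stand-in for the trained decision tree, choosing how the shared storage is used for the next interval from sampled performance counters. The counter names and thresholds are invented for illustration; the paper trains the actual tree.

```python
# Pick a configuration for the shared SRAM (BTB vs. DIP) each interval.
# Thresholds and counter names are hypothetical stand-ins for the trained tree.

def choose_config(l1i_mpki, btb_mpki, dip_useful_ratio):
    if l1i_mpki < 1.0:
        return "power_off_extra_ways"    # front end already fast: save energy
    if dip_useful_ratio > 0.6 and btb_mpki < 5.0:
        return "repurpose_btb_as_dip"    # DIP helps and BTB tolerates misses
    return "large_btb"                   # default: give all storage to the BTB

# Counters sampled over the last interval (hypothetical values).
print(choose_config(l1i_mpki=4.2, btb_mpki=2.1, dip_useful_ratio=0.75))
```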

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
National subject category
Computer Systems
Research subject
Computer Science; Computer and Systems Sciences
Identifiers
urn:nbn:se:uu:diva-515499 (URN)
10.1145/3631882.3631904 (DOI)
Conference
The International Symposium on Memory Systems (MEMSYS '23), October 2–5, 2023, Alexandria, VA, USA
Funder
Knut och Alice Wallenbergs Stiftelse, 2015.0153
EU, Horizon 2020, 715283
Vetenskapsrådet, 2019-02429
Note
Funder: Electronics and Telecommunications Research Institute (ETRI), grant number 23ZS1300
Available from: 2023-11-03. Created: 2023-11-03. Last updated: 2024-04-02.
Kumar, R., Alipour, M. & Black-Schaffer, D. (2022). Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores. ACM Transactions on Architecture and Code Optimization (TACO), 19(2), Article ID 25.
Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores
2022 (English) In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 19, no. 2, article id 25. Journal article (Refereed). Published.
Abstract [en]

Exploiting memory-level parallelism (MLP) is crucial to hide long memory and last-level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver poor energy efficiency due to their complex and energy-hungry hardware. This work revisits slice-out-of-order (sOoO) cores as an energy-efficient alternative for MLP exploitation. sOoO cores achieve energy efficiency by constructing and executing slices of MLP-generating instructions out of order only with respect to the rest of the instructions; the slices and the remaining instructions, by themselves, execute in order. However, we observe that existing sOoO cores miss significant MLP opportunities due to their dependence-oblivious in-order slice execution, which causes dependent slices to frequently block MLP generation. To boost MLP generation, we introduce Freeway, an sOoO core based on a new dependence-aware slice execution policy that tracks dependent slices and keeps them from blocking subsequent independent slices and MLP extraction. The proposed core incurs minimal area and power overheads, yet approaches the MLP benefits of fully OoO cores. Our evaluation shows that Freeway delivers 12% better performance than the state-of-the-art sOoO core and is within 7% of the MLP limits of full OoO execution.
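
A minimal sketch of the dependence-aware policy: rather than letting a dependent slice block at the head of the slice queue (as in-order slice execution does), it is parked until its producer completes, so independent slices behind it keep issuing and generating MLP. Slices and dependencies are toy data, not Freeway's microarchitecture.

```python
# Compare in-order slice issue (head-of-queue blocking) with a
# dependence-aware policy that parks dependent slices. Toy model only.
from collections import deque

class Slice:
    def __init__(self, name, depends_on=None):
        self.name, self.depends_on = name, depends_on

def issue(slices, completed):
    in_order, issued_io = deque(slices), []
    while in_order:
        s = in_order[0]
        if s.depends_on and s.depends_on not in completed:
            break                              # in-order: head blocks everyone
        issued_io.append(in_order.popleft().name)

    dep_aware, issued_da, parked = deque(slices), [], []
    while dep_aware:
        s = dep_aware.popleft()
        if s.depends_on and s.depends_on not in completed:
            parked.append(s.name)              # park it; don't block the queue
        else:
            issued_da.append(s.name)
    return issued_io, issued_da, parked

slices = [Slice("S1"), Slice("S2", depends_on="S1"), Slice("S3"), Slice("S4")]
io, da, parked = issue(slices, completed=set())
print("in-order issue:        ", io)       # ['S1'], then stalls behind S2
print("dependence-aware issue:", da)       # ['S1', 'S3', 'S4']
print("parked until S1 done:  ", parked)   # ['S2']
```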

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
Keywords
Microarchitecture, memory level parallelism, instruction scheduling
National subject category
Computer Engineering; Computer Sciences
Identifiers
urn:nbn:se:uu:diva-473200 (URN)
10.1145/3506704 (DOI)
000775454600010 (ISI)
Funder
Knut och Alice Wallenbergs Stiftelse
EU, Horizon 2020, 715283
Available from: 2022-04-26. Created: 2022-04-26. Last updated: 2024-01-15. Bibliographically reviewed.
Park, C. H., Vougioukas, I., Sandberg, A. & Black-Schaffer, D. (2022). Every Walk's a Hit: Making Page Walks Single-Access Cache Hits. In: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 28 – March 4, 2022, Lausanne, Switzerland. Paper presented at 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 28-March 4 2022, Lausanne. Association for Computing Machinery (ACM)
Every Walk's a Hit: Making Page Walks Single-Access Cache Hits
2022 (English) In: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 28 – March 4, 2022, Lausanne, Switzerland, Association for Computing Machinery (ACM), 2022. Conference paper, Published paper (Refereed).
Abstract [en]

As memory capacity has outstripped TLB coverage, large data applications suffer from frequent page table walks. We investigate two complementary techniques for addressing this cost: reducing the number of accesses required and reducing the latency of each access. The first approach is accomplished by opportunistically "flattening" the page table: merging two levels of traditional 4 KB page table nodes into a single 2 MB node, thereby reducing the table's depth and the number of indirections required to traverse it. The second is accomplished by biasing the cache replacement algorithm to keep page table entries during periods of high TLB miss rates, as these periods also see high data miss rates and are therefore more likely to benefit from having the smaller page table in the cache than to suffer from increased data cache misses.

We evaluate these approaches for both native and virtualized systems and across a range of realistic memory fragmentation scenarios, describe the limited changes needed in our kernel implementation and hardware design, identify and address challenges related to self-referencing page tables and kernel memory allocation, and compare results across server and mobile systems using both academic and industrial simulators for robustness.

We find that flattening does reduce the number of accesses required on a page walk (to 1.0), but its performance impact (+2.3%) is small due to Page Walker Caches (already 1.5 accesses). Prioritizing caching has a larger effect (+6.8%), and the combination improves performance by +9.2%. Flattening is more effective on virtualized systems (4.4 to 2.8 accesses, +7.1% performance), due to 2D page walks. By combining the two techniques we demonstrate a state-of-the-art +14.0% performance gain and -8.7% dynamic cache energy and -4.7% dynamic DRAM energy for virtualized execution with very simple hardware and software changes.
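
The flattening arithmetic can be shown with plain bit slicing, assuming a standard 48-bit x86-64 virtual address: merging two adjacent levels of 512-entry (4 KB) page-table nodes yields one 2 MB node indexed by 18 bits, so a 4-level walk becomes a 3-level walk. This is a sketch of the address math only, not the kernel or hardware changes.

```python
# Split a 48-bit virtual address into per-level page-table indices.
# `levels` lists each level's index width in bits, root first; the low
# 12 bits remain the page offset.

def walk_indices(vaddr, levels):
    assert sum(levels) == 36               # 48-bit VA minus 12-bit page offset
    idx, shift = [], 48
    for bits in levels:
        shift -= bits
        idx.append((vaddr >> shift) & ((1 << bits) - 1))
    return idx

va = 0x0000_7F12_3456_7ABC
print("4-level walk:", walk_indices(va, [9, 9, 9, 9]))   # 4 indirections
print("flattened:   ", walk_indices(va, [9, 9, 18]))     # merged leaf levels: 3
```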

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
Keywords
Flattened page table, page table cache prioritization
National subject category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:uu:diva-466738 (URN)
10.1145/3503222.3507718 (DOI)
000810486300010 (ISI)
Conference
27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 28-March 4 2022, Lausanne
Funder
Knut och Alice Wallenbergs Stiftelse, 2015.0153
EU, Horizon 2020, 715283
Available from: 2022-02-01. Created: 2022-02-01. Last updated: 2024-01-15. Bibliographically reviewed.
Borgström, G., Rohner, C. & Black-Schaffer, D. (2022). Faster Functional Warming with Cache Merging.
Faster Functional Warming with Cache Merging
2022 (English) Report (Other academic)
Abstract [en]

SMARTS-like sampled hardware simulation techniques achieve good accuracy by simulating many small portions of an application in detail. However, while this reduces the detailed simulation time, it results in extensive cache warming times, as each of the many simulation points requires warming the whole memory hierarchy. Adaptive Cache Warming reduces this time by iteratively increasing warming until achieving sufficient accuracy. Unfortunately, each time the warming increases, the previous warming must be redone, nearly doubling the required warming.

We address re-warming by developing a technique to merge the cache states from the previous and additional warming iterations. We demonstrate our merging approach on a multi-level LRU cache hierarchy and evaluate and address the introduced errors. By removing warming redundancy, we expect an ideal 2× warming speedup when using our Cache Merging solution together with Adaptive Cache Warming. Experiments show that Cache Merging delivers an average speedup of 1.44×, 1.84×, and 1.87× for 128kB, 2MB, and 8MB L2 caches, respectively, with 95-percentile absolute IPC errors of only 0.029, 0.015, and 0.006, respectively. These results demonstrate that Cache Merging yields significantly higher simulation speed with minimal losses.
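
A quick check of the ideal 2× bound, assuming (as one plausible schedule) that each adaptive iteration doubles the warming length and, without merging, redoes all prior warming:

$$W_{\text{no merge}} = \sum_{i=0}^{k} 2^{i}w = (2^{k+1}-1)\,w \approx 2 \cdot 2^{k}w = 2\,W_{\text{final}}$$

so eliminating the redone warming can at best halve total warming time, which is why the measured 1.44×, 1.84×, and 1.87× speedups are reported against a 2× theoretical maximum.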

Publisher
p. 22
Keywords
functional warming, cache warming, cache merging
National subject category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:uu:diva-484367 (URN)
2022-007 (ISRN)
Funder
Knut och Alice Wallenbergs Stiftelse, 2015.0153
EU, Horizon 2020, 715283
National Supercomputer Centre (NSC), Sweden, 2021/22-435
Swedish National Infrastructure for Computing (SNIC), 2021/23-626
Vetenskapsrådet, 2018-05973
Available from: 2022-09-10. Created: 2022-09-10. Last updated: 2022-12-16. Bibliographically reviewed.
Hassan, M., Park, C. H. & Black-Schaffer, D. (2021). A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006. ACM Transactions on Architecture and Code Optimization (TACO), 18(2), Article ID 24.
A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006
2021 (English) In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 18, no. 2, article id 24. Journal article (Refereed). Published.
Abstract [en]

The SPEC CPU Benchmarks are used extensively for evaluating and comparing improvements to computer systems. This ubiquity makes characterization critical for researchers to understand the bottlenecks the benchmarks do and do not expose and where new designs should and should not be expected to show impact. However, in characterization there is a tradeoff between accuracy and reusability: The more precisely we characterize a benchmark's performance on a given system, the less usable it is across different micro-architectures and varying memory configurations. For SPEC, most existing characterizations include system-specific effects (e.g., via performance counters) and/or only look at aggregate behavior (e.g., averages over the full application execution). While such approaches simplify characterization, they make it difficult to separate the applications' intrinsic behavior from the system-specific effects and/or lose the diverse phase-based behaviors. In this work we focus on characterizing the applications' intrinsic memory behavior by isolating it from micro-architectural configuration specifics. We do this by providing a simplified generic system model that evaluates the applications' memory behavior across multiple cache sizes, with and without prefetching, and over time. The resulting characterization can be reused across a range of systems to understand application behavior and allows us to see how frequently different behaviors occur. We use this approach to compare the SPEC 2006 and 2017 suites, providing insight into their memory system behavior beyond previous system-specific and/or aggregate results. We demonstrate the ability to use this characterization in different contexts by showing a portion of the SPEC 2017 benchmark suite that could benefit from giga-scale caches, despite aggregate results indicating otherwise.
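
The reusable part of this characterization can be illustrated with classic LRU stack distances: compute a reuse histogram once from an address trace, then read off the miss ratio for any cache size from that single histogram. This is a minimal sketch on a synthetic trace; the paper's generic system model additionally covers prefetching and time-based behavior.

```python
# Mattson-style stack distances: one pass over the trace yields a histogram
# from which LRU miss ratios for arbitrary cache sizes can be derived.
# O(N*M) list implementation for clarity, not speed.
from collections import Counter

def stack_distances(trace, block=64):
    stack, hist = [], Counter()
    for addr in trace:
        line = addr // block
        if line in stack:
            hist[stack.index(line)] += 1   # distance from the MRU end
            stack.remove(line)
        else:
            hist["inf"] += 1               # cold miss
        stack.insert(0, line)
    return hist

def miss_ratio(hist, capacity_lines):
    total = sum(hist.values())
    misses = hist["inf"] + sum(n for d, n in hist.items()
                               if d != "inf" and d >= capacity_lines)
    return misses / total

trace = [(i * 64) % (1 << 16) for i in range(50_000)]    # 64 kB streaming loop
hist = stack_distances(trace)
for kb in (32, 64, 128):                   # reuse one histogram for many sizes
    print(f"{kb:4d} kB LRU cache: miss ratio {miss_ratio(hist, kb*1024//64):.3f}")
```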

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
Memory systems, cache sensitivity, prefetcher sensitivity, benchmark characterization, workload characterization, memory system characterization
National subject category
Computer Systems; Computer Engineering
Identifiers
urn:nbn:se:uu:diva-442105 (URN)
10.1145/3446200 (DOI)
000631098200008 (ISI)
Funder
EU, European Research Council, 715283
Knut och Alice Wallenbergs Stiftelse, 2015.0153
Available from: 2021-05-10. Created: 2021-05-10. Last updated: 2024-04-02. Bibliographically reviewed.
Alves, R., Kaxiras, S. & Black-Schaffer, D. (2021). Early Address Prediction: Efficient Pipeline Prefetch and Reuse. ACM Transactions on Architecture and Code Optimization (TACO), 18(3), Article ID 39.
Early Address Prediction: Efficient Pipeline Prefetch and Reuse
2021 (English) In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 18, no. 3, article id 39. Journal article (Refereed). Published.
Abstract [en]

Achieving low load-to-use latency with low energy and storage overheads is critical for performance. Existing techniques either prefetch into the pipeline (via address prediction and validation) or provide data reuse in the pipeline (via register sharing or L0 caches). These techniques provide a range of tradeoffs between latency, reuse, and overhead. In this work, we present a pipeline prefetching technique that achieves state-of-the-art performance and data reuse without additional data storage, data movement, or validation overheads by adding address tags to the register file. Our addition of register file tags allows us to forward (reuse) load data from the register file with no additional data movement, keep the data alive in the register file beyond the instruction's lifetime to increase temporal reuse, and coalesce prefetch requests to achieve spatial reuse. Further, we show that we can use the existing memory order violation detection hardware to validate prefetches and data forwards without additional overhead. Our design achieves the performance of existing pipeline prefetching while also forwarding 32% of the loads from the register file (compared to 15% in state-of-the-art register sharing), delivering a 16% reduction in L1 dynamic energy (1.6% total processor energy), with an area overhead of less than 0.5%.
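
A minimal sketch of the register-file-tagging mechanism: each register holding a loaded value also remembers the address the value came from, so a later load to the same address can be forwarded from the register file instead of accessing the L1. Address prediction and the memory-order-based validation are omitted; all names are illustrative.

```python
# Toy register file with address tags: loads first check whether some live
# register already mirrors the requested address and forward from it if so.

class TaggedRegFile:
    def __init__(self, num_regs=32):
        self.value = [0] * num_regs
        self.addr_tag = [None] * num_regs   # address each value was loaded from

    def load(self, dst, addr, memory):
        for r, tag in enumerate(self.addr_tag):
            if tag == addr:
                self.value[dst] = self.value[r]   # forward: no L1 access
                self.addr_tag[dst] = addr
                return "forwarded"
        self.value[dst] = memory.get(addr, 0)     # normal cache/memory access
        self.addr_tag[dst] = addr
        return "from_cache"

    def write(self, dst, value):
        self.value[dst] = value
        self.addr_tag[dst] = None           # value no longer mirrors memory

mem = {0x1000: 42}
rf = TaggedRegFile()
print(rf.load(dst=1, addr=0x1000, memory=mem))   # from_cache
print(rf.load(dst=2, addr=0x1000, memory=mem))   # forwarded (reuse in the RF)
```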

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
Pipeline prefetching, first level cache, energy efficient computing, address prediction, register sharing
National subject category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-452936 (URN)
10.1145/3458883 (DOI)
000668433900015 (ISI)
Available from: 2021-09-13. Created: 2021-09-13. Last updated: 2024-01-15. Bibliographically reviewed.
Hassan, M., Park, C. H. & Black-Schaffer, D. (2020). Architecturally-independent and time-based characterization of SPEC CPU 2017. In: 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Paper presented at 2020 IEEE International Symposium on Performance Analysis of Systems and Software, Boston, August 23-25, 2020 (pp. 107-109).
Architecturally-independent and time-based characterization of SPEC CPU 2017
2020 (English) In: 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2020, pp. 107-109. Conference paper, Published paper (Refereed).
Abstract [en]

Characterizing the memory behavior of SPEC CPU benchmarks is critical to analyzing bottlenecks in their execution. Unfortunately, most prior characterizations are tied to a particular system (e.g., via performance counters or fixed configurations) and miss important time-based behavior (e.g., by reporting only averages over the execution). While performance counters are accurate for that particular system, the results are less accurate for different micro-architectures and configurations. Most importantly, aggregate statistics (e.g., averages over the full execution) miss important time-based information that reveals transient phases with significant impact on the execution. This work focuses on a micro-architecturally independent, time-based characterization and analysis of the memory system behavior of SPEC CPU 2017. By collecting micro-architecturally independent and time-based information, we provide reusable data for various memory configurations.
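
Why time-based characterization matters can be shown in a few lines: a windowed view of a synthetic trace exposes a transient large-footprint phase that the whole-execution aggregate hides. Window length and trace are illustrative.

```python
# Per-window memory footprint vs. a single aggregate number. The burst in
# phase B dominates the aggregate, while per-window data shows that most of
# the run actually has a tiny footprint.

def windowed_footprints(trace, window=10_000, block=64):
    out = []
    for start in range(0, len(trace), window):
        lines = {a // block for a in trace[start:start + window]}
        out.append(len(lines) * block / 1024)    # footprint in kB
    return out

phase_a = [(i % 512) * 64 for i in range(40_000)]   # small hot loop
phase_b = [i * 64 for i in range(20_000)]           # transient large sweep
trace = phase_a + phase_b + phase_a

agg_kb = len({a // 64 for a in trace}) * 64 / 1024
print(f"aggregate footprint: {agg_kb:.0f} kB")      # dominated by the burst
print("per-window footprint (kB):",
      [round(kb) for kb in windowed_footprints(trace)])
```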

Series
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
National subject category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-417174 (URN)
10.1109/ISPASS48437.2020.00021 (DOI)
000637280800011 (ISI)
978-1-7281-4798-7 (ISBN)
978-1-7281-4799-4 (ISBN)
Conference
2020 IEEE International Symposium on Performance Analysis of Systems and Software, Boston, August 23-25, 2020
Available from: 2020-08-14. Created: 2020-08-14. Last updated: 2024-04-02. Bibliographically reviewed.
Projects
Proactive management of memory hierarchies [2014-05480_VR]; Uppsala universitet
Application-specific Coherence for Concurrent Acceleration of Managed Language [2019-04275_VR]; Uppsala universitet
Organisations
Identifiers
ORCID iD: orcid.org/0000-0001-5375-4058
