
Publications from Uppsala University (uu.se)
Black-Schaffer, David, Professor (ORCID iD: orcid.org/0000-0001-5375-4058)
Publications (10 of 72)
Norlinder, J., Osterlund, E., Black-Schaffer, D. & Wrigstad, T. (2024). Mark-Scavenge: Waiting for Trash to Take Itself Out. Proceedings of the ACM on Programming Languages, 8(OOPSLA2), Article ID 351.
Mark-Scavenge: Waiting for Trash to Take Itself Out
2024 (English) In: Proceedings of the ACM on Programming Languages, E-ISSN 2475-1421, Vol. 8, no OOPSLA2, article id 351. Article in journal (Refereed). Published
Abstract [en]

Moving garbage collectors (GCs) typically free memory by evacuating live objects in order to reclaim contiguous memory regions. Evacuation is typically done either during tracing (scavenging), or after tracing when identification of live objects is complete (mark-evacuate). Scavenging typically requires more memory (memory for all objects to be moved), but performs less work in sparse memory areas (single pass). This makes it attractive for collecting young objects. Mark-evacuate typically requires less memory and performs less work in memory areas with dense object clusters, by focusing relocation around sparse regions, making it attractive for collecting old objects. Mark-evacuate also completes identification of live objects faster, making it attractive for concurrent GCs that can reclaim memory immediately after identification of live objects finishes (as opposed to when evacuation finishes), at the expense of more work compared to scavenging, for young objects.

We propose an alternative approach for concurrent GCs to combine the benefits of scavenging with the benefits of mark-evacuate, for young objects. The approach is based on the observation that by the time young objects are relocated by a concurrent GC, they are likely to already be unreachable. By performing relocation lazily, most of the relocations in the defragmentation phase of mark-evacuate can typically be eliminated. Similar to scavenging, objects are relocated during tracing with the proposed approach. However, instead of relocating all objects that are live in the current GC cycle, it lazily relocates profitable sparse object clusters that survived from the previous GC cycle. This turns the memory headroom that concurrent GCs typically "waste" in order to safely avoid running out of memory before GC finishes, into an asset used to eliminate much of the relocation work, which constitutes a significant portion of the GC work.

We call this technique mark-scavenge and implement it on top of ZGC in OpenJDK in a collector we call MS-ZGC. We perform a performance evaluation that compares MS-ZGC against ZGC. The most striking result is (up to) a 91% reduction in relocation of dead objects (depending on machine-dependent factors).
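The core idea lends itself to a small illustration. The C++ sketch below is not MS-ZGC code; the region size, sparseness threshold, and survival flag are invented for illustration. It contrasts classic scavenging (copy every live byte found during tracing) with the mark-scavenge policy of only evacuating sparse clusters that already survived a previous cycle, letting the rest "take itself out".

```cpp
// Toy contrast between scavenging (copy all live bytes during tracing) and
// mark-scavenge (only evacuate sparse clusters that survived the previous cycle).
// Region size, occupancies, the survival flag, and the 25% threshold are invented.
#include <cstdio>
#include <vector>

struct Region {
    int live_bytes = 0;         // bytes found live by the current mark
    int capacity = 1 << 20;     // assumed 1 MiB region
    bool survived_prev_cycle = false;
};

int main() {
    std::vector<Region> heap(8);
    const int occupancy_pct[8] = {5, 90, 10, 70, 3, 95, 15, 60};  // toy live fractions
    for (int i = 0; i < 8; ++i) {
        heap[i].live_bytes = heap[i].capacity / 100 * occupancy_pct[i];
        heap[i].survived_prev_cycle = (i % 2 == 0);               // toy stand-in
    }

    const double sparse_threshold = 0.25;   // relocate only if < 25% live (assumed)
    long long scavenge_bytes = 0, mark_scavenge_bytes = 0;
    for (const Region& r : heap) {
        scavenge_bytes += r.live_bytes;     // scavenging copies every live object now
        double live_frac = double(r.live_bytes) / r.capacity;
        // Mark-scavenge defers: only profitable sparse clusters that already
        // survived one cycle are evacuated; the rest wait (and often die).
        if (r.survived_prev_cycle && live_frac < sparse_threshold)
            mark_scavenge_bytes += r.live_bytes;
    }
    std::printf("bytes copied: scavenge=%lld  mark-scavenge=%lld\n",
                scavenge_bytes, mark_scavenge_bytes);
    return 0;
}
```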

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
concurrent, garbage collection, mark-evacuate, scavenging
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-546755 (URN); 10.1145/3689791 (DOI); 001360845100009
Funder
Swedish Research Council, 2020-05346; Swedish Foundation for Strategic Research, SM19-0059; Swedish Research Council, 2022-06725; Swedish Foundation for Strategic Research; Swedish Research Council
Available from: 2025-01-15. Created: 2025-01-15. Last updated: 2025-01-15. Bibliographically approved
Norlinder, J., Yang, A. M., Black-Schaffer, D. & Wrigstad, T. (2024). Mutator-Driven Object Placement using Load Barriers. In: Christoph M. Kirsch (Ed.), MPLR 2024: Proceedings of the 21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes: . Paper presented at 21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes. Association for Computing Machinery (ACM)
Mutator-Driven Object Placement using Load Barriers
2024 (English) In: MPLR 2024: Proceedings of the 21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes / [ed] Christoph M. Kirsch, Association for Computing Machinery (ACM), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Object placement impacts cache utilisation, which is itself critical for performance. Managed languages offer fewer tools than unmanaged languages for controlling object placement due to their abstract view of memory. On the other hand, managed languages often have garbage collectors (GCs) that move objects as part of defragmentation. In the context of OpenJDK, the Hot-Cold Objects Segregation GC (HCSGC) added locality improvement on top of ZGC by piggybacking on its loaded-value-barrier-based design. In addition to the open problem of tuning HCSGC, we identify a contradiction in two of its design goals and propose LR, which addresses both problems. We implement LR on top of ZGC and compare it with the GCs in OpenJDK and with the best-performing HCSGC configuration using DaCapo, JGraphT and SPECjbb2015. While using fewer resources, LR outperforms HCSGC in 18 configurations, matches its performance in 17, and regresses in 3.
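As a rough illustration of the mutator-driven placement idea (a toy sketch, not ZGC, HCSGC, or LR code; the object layout and the hot flag are invented), a load barrier can record which objects the mutator actually touches so that a later collector pass can group them:

```cpp
// Toy load barrier that records mutator accesses so a later collector pass can
// co-locate the "hot" objects. Not ZGC/HCSGC/LR code; Obj and the hot flag are invented.
#include <cstdio>
#include <vector>

struct Obj {
    int payload = 0;
    bool hot = false;   // set by the load barrier, consumed by the collector
};

// Every reference load goes through the barrier, which leaves a placement hint.
inline Obj* load_ref(Obj* ref) {
    if (ref) ref->hot = true;
    return ref;
}

int main() {
    std::vector<Obj> heap(6);
    const int touched[] = {1, 3, 4};            // objects the mutator actually uses
    for (int i : touched) load_ref(&heap[i])->payload += 1;

    // Collector pass: copy hot objects first so they end up adjacent in to-space,
    // improving spatial locality; cold objects follow.
    std::vector<Obj> to_space;
    for (const Obj& o : heap) if (o.hot)  to_space.push_back(o);
    for (const Obj& o : heap) if (!o.hot) to_space.push_back(o);

    for (std::size_t i = 0; i < to_space.size(); ++i)
        std::printf("to-space slot %zu: hot=%d\n", i, int(to_space[i].hot));
    return 0;
}
```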

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-538495 (URN); 10.1145/3679007.3685060 (DOI)
Conference
21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes
Projects
JVM ReCo: Deploying Memory Management Research in the Mainstream
Funder
Swedish Research Council, 2020-05346; Swedish Foundation for Strategic Research, SM19-0059
Available from: 2024-09-16 Created: 2024-09-16 Last updated: 2024-09-16
Norlinder, J., Yang, A. M., Black-Schaffer, D. & Wrigstad, T. (2024). Mutator-Driven Object Placement using Load Barriers. In: Ertl, M. A. & Kirsch, C. M. (Eds.), PROCEEDINGS OF THE 21ST ACM SIGPLAN INTERNATIONAL CONFERENCE ON MANAGED PROGRAMMING LANGUAGES AND RUNTIMES, MPLR 2024: . Paper presented at 21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes (MPLR), September 19, 2024, Vienna, Austria (pp. 14-27). Association for Computing Machinery (ACM)
Mutator-Driven Object Placement using Load Barriers
2024 (English) In: PROCEEDINGS OF THE 21ST ACM SIGPLAN INTERNATIONAL CONFERENCE ON MANAGED PROGRAMMING LANGUAGES AND RUNTIMES, MPLR 2024 / [ed] Ertl, M. A. & Kirsch, C. M., Association for Computing Machinery (ACM), 2024, p. 14-27. Conference paper, Published paper (Refereed)
Abstract [en]

Object placement impacts cache utilisation, which is itself critical for performance. Managed languages offer fewer tools than unmanaged languages for controlling object placement due to their abstract view of memory. On the other hand, managed languages often have garbage collectors (GCs) that move objects as part of defragmentation. In the context of OpenJDK, the Hot-Cold Objects Segregation GC (HCSGC) added locality improvement on top of ZGC by piggybacking on its loaded-value-barrier-based design. In addition to the open problem of tuning HCSGC, we identify a contradiction in two of its design goals and propose LR, which addresses both problems. We implement LR on top of ZGC and compare it with the GCs in OpenJDK and with the best-performing HCSGC configuration using DaCapo, JGraphT and SPECjbb2015. While using fewer resources, LR outperforms HCSGC in 18 configurations, matches its performance in 17, and regresses in 3.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
garbage collection, locality, cache performance
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:uu:diva-541900 (URN); 10.1145/3679007.3685060 (DOI); 001321523000003; 979-8-4007-1118-3 (ISBN)
Conference
21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes (MPLR), September 19, 2024, Vienna, Austria
Funder
Swedish Research Council, 2020-05346; Swedish Foundation for Strategic Research, SM19-0059
Available from: 2024-11-07. Created: 2024-11-07. Last updated: 2024-11-07. Bibliographically approved
Nematallah, A., Park, C. H. & Black-Schaffer, D. (2023). Exploring the Latency Sensitivity of Cache Replacement Policies [Letter to the editor]. IEEE Computer Architecture Letters, 22(2), 93-96
Exploring the Latency Sensitivity of Cache Replacement Policies
2023 (English) In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 22, no 2, p. 93-96. Article in journal, Letter (Refereed). Published
Abstract [en]

With DRAM latencies increasing relative to CPU speeds, the performance of caches has become more important. This has led to increasingly sophisticated replacement policies that require complex calculations to update their replacement metadata, which often require multiple cycles. To minimize the negative impact of these metadata updates, architects have focused on policies that incur as little update latency as possible through a combination of reducing the policies’ precision and using parallel hardware. In this work we investigate whether these tradeoffs to reduce cache metadata update latency are needed. Specifically, we look at the performance and energy impact of increasing the latency of cache replacement policy updates. We find that even dramatic increases in replacement policy update latency have very limited effect. This indicates that designers have far more freedom to increase policy complexity and latency than previously assumed.
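The experiment the letter describes can be mimicked with a toy model. The C++ sketch below is not the paper's simulator; the trace, set size, and delay values are assumptions. It applies LRU recency updates to a cache set only after a configurable delay, which is the knob whose performance impact the paper measures:

```cpp
// Toy LRU set whose recency updates are applied only after a configurable delay,
// mimicking slower replacement-metadata updates. Trace, ways, and delays are assumed.
#include <algorithm>
#include <cstdio>
#include <deque>
#include <list>
#include <vector>

struct LruSet {
    std::size_t ways;
    std::size_t delay;                 // metadata-update latency, in accesses
    std::list<int> order;              // front = MRU, back = LRU victim
    std::deque<int> pending;           // accesses whose recency update is still in flight

    LruSet(std::size_t w, std::size_t d) : ways(w), delay(d) {}

    bool access(int tag) {
        bool hit = std::find(order.begin(), order.end(), tag) != order.end();
        if (!hit) {
            if (order.size() == ways) order.pop_back();   // evict current LRU
            order.push_back(tag);                         // insert; recency not yet updated
        }
        pending.push_back(tag);
        if (pending.size() > delay) {                     // delayed metadata update retires
            int t = pending.front(); pending.pop_front();
            auto it = std::find(order.begin(), order.end(), t);
            if (it != order.end()) { order.erase(it); order.push_front(t); }
        }
        return hit;
    }
};

int main() {
    std::vector<int> trace;
    for (int rep = 0; rep < 100; ++rep)
        for (int addr = 0; addr < 10; ++addr) trace.push_back(addr);  // toy looping trace

    const std::size_t delays[] = {0, 4, 16};
    for (std::size_t d : delays) {
        LruSet set(8, d);
        int hits = 0;
        for (int t : trace) hits += set.access(t);
        std::printf("update delay %2zu accesses -> %d hits / %zu accesses\n",
                    d, hits, trace.size());
    }
    return 0;
}
```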

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Cache Replacement Policies, Computer Architecture, High-Performance Computing
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-508114 (URN); 10.1109/lca.2023.3296251 (DOI); 001049956900003
Funder
Knut and Alice Wallenberg Foundation, 2015.0153; EU, Horizon 2020, 715283; Swedish Research Council, 2019-02429
Available from: 2023-07-20. Created: 2023-07-20. Last updated: 2024-04-05. Bibliographically approved
Borgström, G., Rohner, C. & Black-Schaffer, D. (2023). Faster Functional Warming with Cache Merging. In: PROCEEDINGS OF SYSTEM ENGINEERING FOR CONSTRAINED EMBEDDED SYSTEMS, DRONESE AND RAPIDO 2023: . Paper presented at Conference on Drone Systems Engineering (DroneSE) / Conference on Rapid Simulation and Performance Evaluation - Methods and Tools (RAPIDO) / Workshop on System Engineering for Constrained Embedded Systems / HiPEAC Conference, JAN 16-18, 2023, Toulouse, FRANCE (pp. 39-47). Association for Computing Machinery (ACM)
Faster Functional Warming with Cache Merging
2023 (English) In: PROCEEDINGS OF SYSTEM ENGINEERING FOR CONSTRAINED EMBEDDED SYSTEMS, DRONESE AND RAPIDO 2023, Association for Computing Machinery (ACM), 2023, p. 39-47. Conference paper, Published paper (Refereed)
Abstract [en]

Smarts-like sampled hardware simulation techniques achieve good accuracy by simulating many small portions of an application in detail. However, while this reduces the simulation time, it results in extensive cache warming times, as each of the many simulation points requires warming the whole memory hierarchy. Adaptive Cache Warming reduces this time by iteratively increasing warming to achieve sufficient accuracy. Unfortunately, each increase requires that the previous warming be redone, nearly doubling the total warming. We address re-warming by developing a technique to merge the cache states from the previous and additional warming iterations. We demonstrate our merging approach on a multi-level LRU cache hierarchy and evaluate and address the introduced errors. Our experiments show that Cache Merging delivers an average speedup of 1.44x, 1.84x, and 1.87x for 128kB, 2MB, and 8MB L2 caches, respectively (vs. a 2x theoretical maximum speedup), with 95-percentile absolute IPC errors of only 0.029, 0.015, and 0.006, respectively. These results demonstrate that Cache Merging yields significantly higher simulation speed with minimal losses.
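The merging step can be illustrated for a single LRU set. The sketch below is an assumption-laden toy, not the paper's implementation: it gives recency priority to the state from the most recent warming interval and fills the remaining ways with blocks from the additional, earlier warming that are not already present:

```cpp
// Toy merge of two LRU set states: the most recent warming keeps recency priority,
// and blocks from the earlier (additional) warming fill the remaining ways.
// Tags, associativity, and the example sets are invented.
#include <algorithm>
#include <cstdio>
#include <vector>

using Set = std::vector<int>;   // tags ordered MRU -> LRU

Set merge_lru_sets(const Set& recent, const Set& older, std::size_t ways) {
    Set merged = recent;                          // newer accesses stay most recent
    for (int tag : older) {
        if (merged.size() == ways) break;
        if (std::find(merged.begin(), merged.end(), tag) == merged.end())
            merged.push_back(tag);                // older blocks slot in below
    }
    return merged;
}

int main() {
    Set recent = {7, 3, 9};        // state after the latest (closest) warming interval
    Set older  = {3, 1, 5, 8, 2};  // state after the additional, earlier warming
    for (int tag : merge_lru_sets(recent, older, 4))
        std::printf("%d ", tag);   // prints: 7 3 9 1
    std::printf("\n");
    return 0;
}
```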

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
functional warming, cache warming, cache merging
National Category
Computer Sciences; Computer Engineering
Identifiers
urn:nbn:se:uu:diva-519733 (URN); 10.1145/3579170.3579256 (DOI); 001106628800005; 979-8-4007-0045-3 (ISBN)
Conference
Conference on Drone Systems Engineering (DroneSE) / Conference on Rapid Simulation and Performance Evaluation - Methods and Tools (RAPIDO) / Workshop on System Engineering for Constrained Embedded Systems / HiPEAC Conference, JAN 16-18, 2023, Toulouse, FRANCE
Available from: 2024-01-09. Created: 2024-01-09. Last updated: 2024-01-09. Bibliographically approved
Haddadi, A., Black-Schaffer, D. & Park, C. H. (2023). Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of Swapping. In: The International Symposium on Memory Systems (MEMSYS '23): . Paper presented at The International Symposium on Memory Systems (MEMSYS '23), October 2–5, 2023, Alexandria, VA, USA (pp. 1-11). Association for Computing Machinery (ACM), Article ID 2.
Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of Swapping
2023 (English) In: The International Symposium on Memory Systems (MEMSYS '23), Association for Computing Machinery (ACM), 2023, p. 1-11, article id 2. Conference paper, Published paper (Refereed)
Abstract [en]

Graph workloads are critical in many areas. Unfortunately, graph sizes have been increasing faster than DRAM capacity. As a result, large-scale graph processing necessarily falls back to virtual memory paging, resulting in tremendous performance losses.

In this work we investigate how we can get the best possible performance on commodity systems from graphs that cannot fit into DRAM by understanding, and adjusting, how the virtual memory system and the graph characteristics interact. To do so, we first characterize the graph applications, system, and SSD behavior as a function of how much of the graph fits in DRAM. From this analysis we see that for multiple graph types, the system fails to fully utilize the bandwidth of the SSDs due to a lack of parallel page-in requests.

We use this insight to motivate overcommitting CPU threads for graph processing. This allows us to significantly increase the number of parallel page-in requests for several graph types and recover much of the performance lost to paging. We show that overcommitting threads generally improves performance across algorithms and graph types. However, we identify one graph that suffers from overcommitting, leading to the recommendation that overcommitting threads is generally good for performance, though certain graph inputs may be hurt by it.
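The mechanism behind the recommendation is simply that more runnable threads allow more page faults to be outstanding at once. The sketch below is a stand-in graph kernel, not the paper's framework; the overcommit factor of 4 and the toy edge list are assumptions:

```cpp
// Toy demonstration of overcommitting threads: more threads than cores so that
// more page-in requests can be outstanding while others block on the SSD.
// The edge list, kernel, and overcommit factor of 4 are invented stand-ins.
#include <atomic>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

std::atomic<long> edges_processed{0};

void worker(const std::vector<int>& edges, std::size_t begin, std::size_t end) {
    long local = 0;
    for (std::size_t i = begin; i < end; ++i)
        local += edges[i] & 1;     // touching edges[i] may fault and block on the SSD
    edges_processed += local;
}

int main() {
    std::vector<int> edges(1 << 20, 1);                 // toy edge list
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 4;                          // fall back if unknown
    unsigned threads = 4 * cores;                       // overcommit factor (assumed)

    std::vector<std::thread> pool;
    std::size_t chunk = edges.size() / threads;
    for (unsigned t = 0; t < threads; ++t) {
        std::size_t b = t * chunk;
        std::size_t e = (t + 1 == threads) ? edges.size() : b + chunk;
        pool.emplace_back(worker, std::cref(edges), b, e);
    }
    for (std::thread& th : pool) th.join();
    std::printf("cores=%u threads=%u edges processed=%ld\n",
                cores, threads, edges_processed.load());
    return 0;
}
```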

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
graph processing, virtual memory, swapping, SSD, commodity system, thread overcommitting, characterization, operating system
National Category
Computer Systems
Research subject
Computer Science; Computer Systems Sciences
Identifiers
urn:nbn:se:uu:diva-515500 (URN); 10.1145/3631882.3631884 (DOI); 001209675300002; 9798400716447 (ISBN)
Conference
The International Symposium on Memory Systems (MEMSYS '23), October 2–5, 2023, Alexandria, VA, USA
Funder
Knut and Alice Wallenberg Foundation, 2015.0153; EU, Horizon 2020, 715283; Swedish Research Council, 2019-02429
Note
Funder: Electronics and Telecommunications Research Institute (ETRI). Grant number: 23ZS1300
Available from: 2023-11-03. Created: 2023-11-03. Last updated: 2024-07-09. Bibliographically approved
Hassan, M., Park, C. H. & Black-Schaffer, D. (2023). Protean: Resource-efficient Instruction Prefetching. In: The International Symposium on Memory Systems (MEMSYS '23): . Paper presented at The International Symposium on Memory Systems (MEMSYS '23), October 2–5, 2023, Alexandria, VA, USA (pp. 1-13). Association for Computing Machinery (ACM), Article ID 22.
Protean: Resource-efficient Instruction Prefetching
2023 (English) In: The International Symposium on Memory Systems (MEMSYS '23), Association for Computing Machinery (ACM), 2023, p. 1-13, article id 22. Conference paper, Published paper (Refereed)
Abstract [en]

Increases in code footprint and control flow complexity have made low-latency instruction fetch challenging. Dedicated Instruction Prefetchers (DIPs) can provide performance gains (up to 5%) for a subset of applications that are poorly served by today’s ubiquitous Fetch-Directed Instruction Prefetching (FDIP). However, DIPs incur the significant overhead of in-core metadata storage (for all workloads) and energy and performance loss from excess prefetches (for many workloads), leading to 11% of workloads actually losing performance. This work addresses how to provide the benefits of a DIP without its costs when the DIP cannot provide a benefit.

Our key insight is that workloads that benefit from DIPs can tolerate increased Branch Target Buffer (BTB) misses. This allows us to dynamically re-purpose the existing BTB storage between the BTB and the DIP. We train a simple performance counter based decision tree to select the optimal configuration at runtime, which allows us to achieve different energy/performance optimization goals. As a result, we pay essentially no area overhead when a DIP is needed, and can use the larger BTB when it is beneficial, or even power it off when not needed.

We look at our impact on two groups of benchmarks: those where the right configuration choice can improve performance or energy and those where the wrong choice could hurt them. For the benchmarks with improvement potential, when optimizing for performance, we are able to obtain 86% of the oracle potential, and when optimizing for energy, 98% of the potential, both while avoiding essentially all performance and energy losses on the remaining benchmarks. This demonstrates that our technique is able to dynamically adapt to different performance/energy goals and obtain essentially all of the potential gains of DIP without the overheads they experience today.
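The runtime decision step can be sketched as a small classifier over performance counters. The thresholds and counter set below are invented stand-ins for the paper's trained decision tree; only the idea of choosing between a larger BTB, a DIP, or powering the storage off is taken from the abstract:

```cpp
// Hand-written stand-in for a trained decision tree that picks how shared
// front-end storage is used: larger BTB, dedicated instruction prefetcher (DIP),
// or powered off. Counters and thresholds are invented for illustration.
#include <cstdio>

enum class Config { LargeBTB, InstructionPrefetcher, PowerOff };

struct Counters {
    double btb_mpki;        // BTB misses per kilo-instruction
    double l1i_mpki;        // instruction-cache misses per kilo-instruction
    double fdip_coverage;   // fraction of I-cache misses already covered by FDIP
};

Config choose_config(const Counters& c) {
    if (c.l1i_mpki < 1.0 && c.btb_mpki < 1.0)
        return Config::PowerOff;                  // front end already well served
    if (c.l1i_mpki > 5.0 && c.fdip_coverage < 0.8)
        return Config::InstructionPrefetcher;     // a DIP is likely to help
    return Config::LargeBTB;                      // otherwise spend the storage on the BTB
}

int main() {
    const Counters samples[] = {{0.2, 0.3, 0.90}, {2.0, 8.0, 0.50}, {6.0, 2.0, 0.95}};
    const char* names[] = {"LargeBTB", "InstructionPrefetcher", "PowerOff"};
    for (const Counters& c : samples)
        std::printf("btb=%.1f l1i=%.1f cov=%.2f -> %s\n",
                    c.btb_mpki, c.l1i_mpki, c.fdip_coverage,
                    names[static_cast<int>(choose_config(c))]);
    return 0;
}
```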

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
National Category
Computer Systems
Research subject
Computer Science; Computer Systems Sciences
Identifiers
urn:nbn:se:uu:diva-515499 (URN); 10.1145/3631882.3631904 (DOI); 001209675300022; 9798400716447 (ISBN)
Conference
The International Symposium on Memory Systems (MEMSYS '23), October 2–5, 2023, Alexandria, VA, USA
Funder
Knut and Alice Wallenberg Foundation, 2015.0153; EU, Horizon 2020, 715283; Swedish Research Council, 2019-02429
Note
Funder: Electronics and Telecommunications Research Institute (ETRI). Grant number: 23ZS1300
Available from: 2023-11-03. Created: 2023-11-03. Last updated: 2024-07-09. Bibliographically approved
Kumar, R., Alipour, M. & Black-Schaffer, D. (2022). Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores. ACM Transactions on Architecture and Code Optimization (TACO), 19(2), Article ID 25.
Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores
2022 (English) In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 19, no 2, article id 25. Article in journal (Refereed). Published
Abstract [en]

Exploiting memory-level parallelism (MLP) is crucial to hide long memory and last-level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver poor energy efficiency due to their complex and energy-hungry hardware. This work revisits slice-out-of-order (sOoO) cores as an energy-efficient alternative for MLP exploitation. sOoO cores achieve energy efficiency by constructing and executing slices of MLP-generating instructions out-of-order only with respect to the rest of instructions; the slices and the remaining instructions, by themselves, execute in-order. However, we observe that existing sOoO cores miss significant MLP opportunities due to their dependence-oblivious in-order slice execution, which causes dependent slices to frequently block MLP generation. To boost MLP generation, we introduce Freeway, a sOoO core based on a new dependence-aware slice execution policy that tracks dependent slices and keeps them from blocking subsequent independent slices and MLP extraction. The proposed core incurs minimal area and power overheads, yet approaches the MLP benefits of fully OoO cores. Our evaluation shows that Freeway delivers 12% better performance than the state-of-the-art sOoO core and is within 7% of the MLP limits of full OoO execution.
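The dependence-aware policy can be illustrated with toy queues. The sketch below is not the Freeway microarchitecture; the slice IDs, the yield-queue name, and instant completion are simplifying assumptions. Its point is that a slice waiting on an unfinished producer is parked rather than allowed to block younger, independent slices:

```cpp
// Toy scheduler illustrating dependence-aware slice execution: a slice whose
// producer has not finished is parked in a yield queue so younger, independent
// slices keep issuing. Slice IDs and instant completion are simplifications.
#include <cstdio>
#include <deque>
#include <set>

struct Slice {
    int id;
    int depends_on;   // id of the producing slice, or -1 if independent
};

int main() {
    std::deque<Slice> slice_queue = {{0, -1}, {1, 3}, {2, -1}, {3, -1}};
    std::deque<Slice> yield_queue;          // dependent slices wait here
    std::set<int> completed;

    while (!slice_queue.empty() || !yield_queue.empty()) {
        // Wake a parked slice once its producer has completed.
        if (!yield_queue.empty() && completed.count(yield_queue.front().depends_on)) {
            Slice s = yield_queue.front(); yield_queue.pop_front();
            std::printf("issue (woken) slice %d\n", s.id);
            completed.insert(s.id);
            continue;
        }
        if (slice_queue.empty()) continue;  // only parked slices left; wait for wakeups
        Slice s = slice_queue.front(); slice_queue.pop_front();
        if (s.depends_on != -1 && !completed.count(s.depends_on)) {
            std::printf("park slice %d (waits on slice %d)\n", s.id, s.depends_on);
            yield_queue.push_back(s);       // do not block younger slices behind it
        } else {
            std::printf("issue slice %d\n", s.id);
            completed.insert(s.id);
        }
    }
    return 0;
}
```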

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
Keywords
Microarchitecture, memory level parallelism, instruction scheduling
National Category
Computer Engineering; Computer Sciences
Identifiers
urn:nbn:se:uu:diva-473200 (URN); 10.1145/3506704 (DOI); 000775454600010
Funder
Knut and Alice Wallenberg Foundation; EU, Horizon 2020, 715283
Available from: 2022-04-26. Created: 2022-04-26. Last updated: 2024-01-15. Bibliographically approved
Park, C. H., Vougioukas, I., Sandberg, A. & Black-Schaffer, D. (2022). Every Walk's a Hit: Making Page Walks Single-Access Cache Hits. In: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 28 – March 4, 2022, Lausanne, Switzerland: . Paper presented at 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 28-March 4 2022, Lausanne. Association for Computing Machinery (ACM)
Every Walk's a Hit: Making Page Walks Single-Access Cache Hits
2022 (English) In: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 28 – March 4, 2022, Lausanne, Switzerland, Association for Computing Machinery (ACM), 2022. Conference paper, Published paper (Refereed)
Abstract [en]

As memory capacity has outstripped TLB coverage, large data applications suffer from frequent page table walks. We investigate two complementary techniques for addressing this cost: reducing the number of accesses required and reducing the latency of each access. The first approach is accomplished by opportunistically "flattening" the page table: merging two levels of traditional 4 KB page table nodes into a single 2 MB node, thereby reducing the table's depth and the number of indirections required to traverse it. The second is accomplished by biasing the cache replacement algorithm to keep page table entries during periods of high TLB miss rates, as these periods also see high data miss rates and are therefore more likely to benefit from having the smaller page table in the cache than to suffer from increased data cache misses.

We evaluate these approaches for both native and virtualized systems and across a range of realistic memory fragmentation scenarios, describe the limited changes needed in our kernel implementation and hardware design, identify and address challenges related to self-referencing page tables and kernel memory allocation, and compare results across server and mobile systems using both academic and industrial simulators for robustness.

We find that flattening does reduce the number of accesses required on a page walk (to 1.0), but its performance impact (+2.3%) is small due to Page Walker Caches (already 1.5 accesses). Prioritizing caching has a larger effect (+6.8%), and the combination improves performance by +9.2%. Flattening is more effective on virtualized systems (4.4 to 2.8 accesses, +7.1% performance), due to 2D page walks. By combining the two techniques we demonstrate a state-of-the-art +14.0% performance gain and -8.7% dynamic cache energy and -4.7% dynamic DRAM energy for virtualized execution with very simple hardware and software changes.
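The access-count argument for flattening can be made concrete with a worked example. The sketch below is not the kernel or hardware implementation and ignores page walker caches; it only shows how merging two adjacent 4 KB levels into one 2 MB node removes one indirection per walk:

```cpp
// Worked example of why flattening shortens page walks: a conventional x86-64
// walk chases four levels (PGD -> PUD -> PMD -> PTE), while merging two adjacent
// 4 KB levels into a single 2 MB node removes one indirection. Page walker
// caches are ignored; the virtual address is arbitrary.
#include <cstdint>
#include <cstdio>

int walk_accesses(bool flattened) {
    // Conventional: 4 pointer chases per TLB miss. Flattened: the two merged
    // levels are indexed with one 18-bit index instead of two 9-bit indexes.
    return flattened ? 3 : 4;
}

int main() {
    const std::uint64_t vaddr = 0x00007f1234567000ULL;
    unsigned pgd = (vaddr >> 39) & 0x1ff;       // 9-bit index per 4 KB level
    unsigned pud = (vaddr >> 30) & 0x1ff;
    unsigned pmd = (vaddr >> 21) & 0x1ff;
    unsigned pte = (vaddr >> 12) & 0x1ff;
    unsigned merged = (vaddr >> 12) & 0x3ffff;  // 18-bit index into the 2 MB node

    std::printf("indexes: pgd=%u pud=%u pmd=%u pte=%u (merged pmd+pte=%u)\n",
                pgd, pud, pmd, pte, merged);
    std::printf("accesses per TLB miss: conventional=%d flattened=%d\n",
                walk_accesses(false), walk_accesses(true));
    return 0;
}
```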

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
Keywords
Flattened page table, page table cache prioritization
National Category
Computer Sciences
Research subject
Computer Systems Sciences
Identifiers
urn:nbn:se:uu:diva-466738 (URN); 10.1145/3503222.3507718 (DOI); 000810486300010
Conference
27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 28-March 4 2022, Lausanne
Funder
Knut and Alice Wallenberg Foundation, 2015.0153; EU, Horizon 2020, 715283
Available from: 2022-02-01. Created: 2022-02-01. Last updated: 2024-01-15. Bibliographically approved
Borgström, G., Rohner, C. & Black-Schaffer, D. (2022). Faster Functional Warming with Cache Merging.
Faster Functional Warming with Cache Merging
2022 (English). Report (Other academic)
Abstract [en]

Smarts-like sampled hardware simulation techniques achieve good accuracy by simulating many small portions of an application in detail. However, while this reduces the detailed simulation time, it results in extensive cache warming times, as each of the many simulation points requires warming the whole memory hierarchy. Adaptive Cache Warming reduces this time by iteratively increasing warming until achieving sufficient accuracy. Unfortunately, each time the warming increases, the previous warming must be redone, nearly doubling the required warming.

We address re-warming by developing a technique to merge the cache states from the previous and additional warming iterations. We demonstrate our merging approach on a multi-level LRU cache hierarchy and evaluate and address the introduced errors. By removing warming redundancy, we expect an ideal 2× warming speedup when using our Cache Merging solution together with Adaptive Cache Warming. Experiments show that Cache Merging delivers an average speedup of 1.44×, 1.84×, and 1.87× for 128kB, 2MB, and 8MB L2 caches, respectively, with 95-percentile absolute IPC errors of only 0.029, 0.015, and 0.006, respectively. These results demonstrate that Cache Merging yields significantly higher simulation speed with minimal losses.

Publisher
p. 22
Keywords
functional warming, cache warming, cache merging
National Category
Computer Sciences
Research subject
Computer Systems Sciences
Identifiers
urn:nbn:se:uu:diva-484367 (URN); 2022-007 (ISRN)
Funder
Knut and Alice Wallenberg Foundation, 2015.0153; EU, Horizon 2020, 715283; National Supercomputer Centre (NSC), Sweden, 2021/22-435; Swedish National Infrastructure for Computing (SNIC), 2021/23-626; Swedish Research Council, 2018-05973
Available from: 2022-09-10. Created: 2022-09-10. Last updated: 2022-12-16. Bibliographically approved
Projects
Proactive Memory Hierarchy Management [2014-05480_VR]; Uppsala University
Application-specific Coherence for Concurrent Acceleration of Managed Language [2019-04275_VR]; Uppsala University
Revisiting branch prediction: Can we use broader contextual information to find better patterns? [2024-04443_VR]; Uppsala University
Identifiers
ORCID iD: orcid.org/0000-0001-5375-4058
