Sustainable computing needs energy-efficient software. This paper explores the potential of leveraging the nature of software written in managed languages: increasing energy efficiency by changing a program’s memory management strategy without altering a single line of code. To this end, we perform comprehensive energy profiling of 35 Java applications across four benchmarks. In many cases, we find that it is possible to save energy by replacing the default G1 collector with another without sacrificing performance. Furthermore, potential energy savings can be even higher if performance regressions are permitted. Inspired by these results, we study what the most energy-efficient GCs are to help developers prune the search space for energy profiling at a low cost. Finally, we show that machine learning can be successfully applied to the problem of finding an energy-efficient GC configuration for an application, reducing the cost even further.
Hardware transactional memory emerged to make parallel programming more accessible. However, the performance pitfall of this technique is squashing speculatively executed instructions and re-executing them in case of aborts, ultimately resorting to serialization in case of repeated conflicts. A significant fraction of aborts occurs due to conflicts (concurrent reads and writes to the same memory location performed by different threads). Our proposal aims to reduce conflict aborts by reducing the window of time during which transactional regions can suffer conflicts. We achieve this by using software prefetching instructions inserted automatically at compile-time. Through these prefetch instructions, we intend to bring the necessary data for each transaction from the main memory to the cache before the transaction itself starts to execute, thus converting the otherwise long latency cache misses into hits during the execution of the transaction. The obtained results show that our approach decreases the number of aborts by 30% on average and improves performance by up to 19% and 10% for two out of the eight evaluated benchmarks. We provide insights into when our technique is beneficial given certain characteristics of the transactional regions, the advantages and disadvantages of our approach, and finally, discuss potential solutions to overcome some of its limitations.
The goal of this work is to explore the opportunistic scheduling of a fully concurrent GC, namely ZGC. The main idea is that scheduling GC threads while the CPU is idle allows for upscaling server workloads at a higher average CPU utilization since latency starts degrading at a higher CPU utilization. We showed that opportunistic scheduling can allow SPECjbb2015 to process up to 15% more requests under 25ms.
The growing concern for energy efficiency in the Information and Communication Technology (ICT) sector has prompted the exploration of resource management techniques. While hardware architectures, such as single-ISA asymmetric multicore processors (AMP), offer potential energy savings, there is still untapped potential for software optimizations. This paper aims to bridge this gap by investigating the scheduling of garbage collection (GC) activities on a heterogeneous architecture with both performance cores (“p-cores”) and energy cores (“e-cores”) to achieve energy savings.
Our study focuses on the concurrent ZGC collector in the context of Java Virtual Machines (JVM), as the energy aspect is not well studied in the context of latency-sensitive Java workloads. By comparing the energy efficiency, performance, latency, and memory utilization of executing GC on p-cores versus e-cores, we present compelling findings.
We demonstrate that scheduling GC work on e-cores overall leads to approximately 3% energy savings without performance and mean latency degradation while requiring no additional effort from developers. Overall energy reduction can increase to 5.3±0.0225% by tuning the number of e-cores (still not changing the program!).
Our findings highlight the practicality and benefits of scheduling GC on e-cores, showcasing the potential for energy savings in heterogeneous architectures running Java workloads while meeting critical latency requirements. Our research contributes to the ongoing efforts toward achieving a more sustainable and efficient ICT sector.
This paper explores automatic heap sizing where developers let the frequency of GC expressed as a target overhead of the application's CPU utilisation, control the size of the heap, as opposed to the other way around. Given enough headroom and spare CPU, a concurrent garbage collector should be able to keep up with the application's allocation rate, and neither the frequency nor duration of GC should impact throughput and latency. Because of the inverse relationship between time spent performing garbage collection and the minimal size of the heap, this enables trading memory for computation and conversely, neutral to an application's performance. We describe our proposal for automatically adjusting the size of a program's heap based on the CPU overhead of GC. We show how our idea can be relatively easily integrated into ZGC, a concurrent collector in OpenJDK, and study the impact of our approach on memory requirements, throughput, latency, and energy.