Efficient inter-core power and thermal balancing for multicore processors
2013 (English)In: Computing, ISSN 0010-485X, E-ISSN 1436-5057, Vol. 95, no 7, 537-566 p.Article in journal (Refereed) Published
Nowadays the market is dominated by processor architectures that employ multiple cores per chip. These architectures have different behavior depending on the applications running on the processor (parallel, multiprogrammed, sequential), but all happen to meet what is called the power and temperature wall. For future technologies (less than 22 nm) and a fixed die size, it is still uncertain the percentage of processor that can be simultaneously powered on. Power saving and power budget mechanisms can be useful to precisely control the amount of power been dissipated by the processor. After an initial analysis we discover that legacy power saving techniques work properly for matching a power budget in thread-independent and multi-programmed workloads, but not in parallel workloads. When running parallel shared-memory applications sacrificing some performance in a single core (thread) in order to be more energy-efficient can unintentionally delay the rest of cores (threads) due to synchronization points (locks/barriers), having a negative impact on global performance. In order to solve this problem we propose power token balancing (PTB) aimed at accurately matching an external power constraint by balancing the power consumed among the different cores. Experimental results show that PTB matches more accurately a predefined power budget (50 % of the original peak power) than other mechanisms like DVFS. The total energy consumed over the budget is reduced to only 8 % for a 16-core CMP with only a 3 % energy increase (overhead). We also introduce a novel mechanism named "Nitro". Nitro will overclock the core that enters a critical section (delimited by locks) in order to free the lock as soon as possible. Experimental results have shown that Nitro is able to reduce the execution time of lock-intensive applications in more than 4 % by overclocking the frequency by 15 % in selected program phases over a period of time that represents a 22 % of the total execution time. We conclude the work with an analysis of the thermal effects of PTB in different CMP configurations using realistic power numbers and heatsink/fan configurations. Results show how PTB not only balances temperature between the different cores, reducing temperature gradient and increasing signal reliability, but also allows a reduction of 28-30 % of both average and peak temperatures for the studied benchmarks when a peak power budget of 50 % is exceeded.
Place, publisher, year, edition, pages
2013. Vol. 95, no 7, 537-566 p.
Power consumption, Power budget, Power tokens, Chip multiprocessor
IdentifiersURN: urn:nbn:se:uu:diva-204781DOI: 10.1007/s00607-012-0236-6ISI: 000321072200001OAI: oai:DiVA.org:uu-204781DiVA: diva2:641123