Uppsala University Publications (uu.se)
Carlson, Trevor E.
Publications (10 of 16)
Alipour, M., Carlson, T. E., Black-Schaffer, D. & Kaxiras, S. (2019). Maximizing limited resources: A limit-based study and taxonomy of out-of-order commit. Journal of Signal Processing Systems, 91(3-4), 379-397
2019 (English). In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 91, no. 3-4, pp. 379-397. Journal article (peer-reviewed). Published.
Abstract [en]

Out-of-order execution is essential for high-performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is typically limited by the requirement of visibly sequential, atomic instruction execution (in other words, in-order instruction commit). While in-order commit has a number of advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, physical registers) until they are released in program order. In contrast, out-of-order commit can release some resources much earlier, yielding improved performance and/or lower resource requirements. Non-speculative out-of-order commit is limited in terms of correctness by the conditions described in the work of Bell and Lipasti (2004). In this paper we revisit out-of-order commit by examining the potential performance benefits of lifting these conditions one by one and in combination, for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to demonstrate the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs.
Through this analysis of the potential of out-of-order commit, we learn that: a) there is significant untapped potential for aggressive variants of out-of-order commit; b) it is important to optimize the out-of-order commit depth for a balanced design, as smaller cores benefit from reduced depth while larger cores continue to benefit from deeper designs; c) the focus on implementing only a subset of the out-of-order commit conditions could lead to efficient implementations; d) the benefits of out-of-order commit increase with higher memory latency and in conjunction with prefetching; e) out-of-order commit exposes additional parallelism in the memory hierarchy.
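
As an illustration only (not the paper's simulator or any of its evaluated configurations), the resource argument in this abstract can be sketched with a toy model: a reorder buffer (ROB) that accepts one instruction per cycle and frees an entry either strictly in program order or, in an idealized oracle mode, as soon as the instruction has finished. The function name `simulate`, the one-dispatch-per-cycle rule, and the latencies are assumptions made for this sketch, and the correctness conditions of Bell and Lipasti are ignored entirely.

```python
def simulate(latencies, rob_size, in_order):
    """Return the cycle in which the last instruction leaves the ROB.

    Toy model (all assumptions): one instruction is dispatched per cycle
    when a ROB entry is free, and an instruction dispatched in cycle c
    with latency l may commit from cycle c + l onwards.
    """
    rob = []        # completion cycles of in-flight instructions, program order
    cycle = 0
    next_instr = 0  # index of the next instruction to dispatch
    retired = 0
    total = len(latencies)

    while retired < total:
        cycle += 1

        if in_order:
            # In-order commit: only the oldest entry may leave the ROB.
            while rob and rob[0] <= cycle:
                rob.pop(0)
                retired += 1
        else:
            # Idealized out-of-order commit: any finished entry is freed.
            still_running = [done for done in rob if done > cycle]
            retired += len(rob) - len(still_running)
            rob = still_running

        # Dispatch at most one new instruction if an entry is available.
        if next_instr < total and len(rob) < rob_size:
            rob.append(cycle + latencies[next_instr])
            next_instr += 1

    return cycle


if __name__ == "__main__":
    # One long-latency instruction (e.g. a cache miss) followed by short ones.
    lat = [10, 1, 1, 1, 1, 1, 1, 1]
    print("in-order, 4-entry ROB:    ", simulate(lat, 4, in_order=True))
    print("out-of-order, 4-entry ROB:", simulate(lat, 4, in_order=False))
    print("in-order, 8-entry ROB:    ", simulate(lat, 8, in_order=True))
```

In this toy setting the 4-entry in-order model needs 15 cycles, because the long-latency instruction at the head blocks the full ROB, while the idealized out-of-order model needs 11. An 8-entry in-order ROB also reaches 11, which mirrors the abstract's point that earlier release can be traded for performance or for lower resource requirements.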

HSV category
Identifiers
urn:nbn:se:uu:diva-365899 (URN), 10.1007/s11265-018-1369-4 (DOI), 000459428200012 ()
Available from: 2018-04-26. Created: 2018-11-14. Last updated: 2019-03-21. Bibliographically approved.
Ceballos, G., Sembrant, A., Carlson, T. E. & Black-Schaffer, D. (2018). Behind the Scenes: Memory Analysis of Graphical Workloads on Tile-based GPUs. In: Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2018. Paper presented at ISPASS 2018, April 2–4, Belfast, UK (pp. 1-11). IEEE Computer Society
2018 (English). In: Proc. International Symposium on Performance Analysis of Systems and Software: ISPASS 2018, IEEE Computer Society, 2018, pp. 1-11. Conference paper, published paper (peer-reviewed).
Place, publisher, year, edition, pages
IEEE Computer Society, 2018
HSV category
Identifiers
urn:nbn:se:uu:diva-361214 (URN), 10.1109/ISPASS.2018.00009 (DOI), 978-1-5386-5010-3 (ISBN)
Conference
ISPASS 2018, April 2–4, Belfast, UK
Projects
UPMARC
Available from: 2018-09-21. Created: 2018-09-21. Last updated: 2018-11-16. Bibliographically approved.
Nikoleris, N., Hagersten, E. & Carlson, T. E. (2018). Delorean: Virtualized Directed Profiling for Cache Modeling in Sampled Simulation.
2018 (English). Report (other academic).
Abstract [en]

Current practice for accurate and efficient simulation (e.g., SMARTS and SimPoint) makes use of sampling to significantly reduce the time needed to evaluate new research ideas. By evaluating a small but representative portion of the original application, sampling can allow for both fast and accurate performance analysis. However, as cache sizes of modern architectures grow, simulation time is dominated by warming microarchitectural state and not by detailed simulation, reducing overall simulation efficiency. While checkpoints can significantly reduce cache warming, improving efficiency, they limit the flexibility of the system under evaluation, requiring new checkpoints for software updates (such as changes to the compiler and compiler flags) and many types of hardware modifications. An ideal solution would allow for accurate cache modeling for each simulation run without the need to generate rigid checkpointing data a priori.

Enabling this new direction for fast and flexible simulation requires a combination of (1) a methodology that allows for hardware and software flexibility and (2) the ability to quickly and accurately model arbitrarily sized caches. Current approaches that rely on checkpointing or statistical cache modeling require rigid, up-front state to be collected, which must then be amortized over a large number of simulation runs. These earlier methodologies are insufficient for our goal of improved flexibility. In contrast, our proposed methodology, Delorean, outlines a unique solution to this problem. The Delorean simulation methodology enables both flexibility and accuracy by quickly generating a targeted cache model for the next detailed region on the fly without the need for up-front simulation or modeling. More specifically, we propose a new, more accurate statistical cache modeling method that takes advantage of hardware virtualization to precisely determine the memory regions accessed and to minimize the time needed for data collection while maintaining accuracy.

Delorean uses a multi-pass approach to understand the memory regions accessed by the next, upcoming detailed region. Our methodology collects the entire set of key memory accesses and, through fast virtualization techniques, progressively scans larger, earlier regions to learn more about these key accesses in an efficient way. Using these techniques, we demonstrate that Delorean allows for the fast evaluation of systems and their software through the generation of accurate cache models on the fly. Delorean outperforms previous proposals by an order of magnitude, with a simulation speed of 150 MIPS and a similar average CPI error (below 4%).
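
To make the idea of modeling arbitrarily sized caches from one set of collected accesses more concrete, here is a minimal sketch of classic LRU stack-distance analysis. This is not Delorean's statistical model, nor its virtualization-based data collection; it is a swapped-in textbook technique for a fully associative LRU cache, and the names `stack_distances` and `miss_count` as well as the sample trace are hypothetical.

```python
def stack_distances(trace):
    """Return the LRU stack distance of every access in the trace.

    The stack distance is the number of distinct cache lines touched
    since the previous access to the same line; None marks a first
    (cold) access.
    """
    stack = []      # most recently used line is at the end
    distances = []
    for line in trace:
        if line in stack:
            idx = stack.index(line)
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)
        stack.append(line)
    return distances


def miss_count(distances, cache_lines):
    """Count misses for a fully associative LRU cache of the given size.

    An access hits only if fewer than cache_lines distinct lines were
    touched since its previous use.
    """
    return sum(1 for d in distances if d is None or d >= cache_lines)


if __name__ == "__main__":
    trace = list("ABCABDA")            # hypothetical cache-line identifiers
    dists = stack_distances(trace)
    for size in (1, 2, 3, 4):
        print(size, "lines:", miss_count(dists, size), "misses")
```

The design point this illustrates is that the distances are computed once and then reused for every cache size: for the sample trace, a 3-line cache incurs 4 misses and a 2-line cache incurs 7, without re-simulating the trace per configuration.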

Publisher
p. 12
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-369320 (URN)
Available from: 2018-12-12. Created: 2018-12-12. Last updated: 2019-01-08. Bibliographically approved.
Krzywda, J., Ali-Eldin, A., Carlson, T. E., Östberg, P.-O. & Elmroth, E. (2018). Power-performance tradeoffs in data center servers: DVFS, CPU pinning, horizontal, and vertical scaling. Future Generation Computer Systems, 81, 114-128
2018 (English). In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 81, pp. 114-128. Journal article (peer-reviewed). Published.
Abstract [en]

Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, horizontal scaling, and vertical scaling are four techniques that have been proposed as actuators to control the performance and energy consumption of data center servers. This work investigates the utility of these four actuators and quantifies the power-performance tradeoffs associated with them. Using replicas of the German Wikipedia running on our local testbed, we perform a set of experiments to quantify the influence of DVFS, vertical and horizontal scaling, and CPU pinning on end-to-end response time (average and tail), throughput, and power consumption with different workloads. Results of the experiments show that DVFS rarely reduces the power consumption of underloaded servers by more than 5%, but it can be used to limit the maximal power consumption of a saturated server by up to 20% (at a cost of performance degradation). CPU pinning reduces the power consumption of an underloaded server (by up to 7%) at the cost of performance degradation, which can be limited by choosing an appropriate CPU pinning scheme. Horizontal and vertical scaling improve both the average and tail response time, but the improvement is not proportional to the amount of resources added. The load balancing strategy has a big impact on the tail response time of horizontally scaled applications.

Place, publisher, year, edition, pages
Elsevier Science BV, 2018
Keywords
Power-performance tradeoffs, Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, Horizontal scaling, Vertical scaling
HSV category
Identifiers
urn:nbn:se:uu:diva-345707 (URN), 10.1016/j.future.2017.10.044 (DOI), 000423652200010 ()
Funder
Swedish Research Council, C0590801; eSSENCE - An eScience Collaboration; EU, FP7, Seventh Framework Programme, 610711, 610490; EU, Horizon 2020, 732667
Available from: 2018-03-14. Created: 2018-03-14. Last updated: 2018-03-14. Bibliographically approved.
Sembrant, A., Carlson, T. E., Hagersten, E. & Black-Schaffer, D. (2017). A graphics tracing framework for exploring CPU+GPU memory systems. In: Proc. 20th International Symposium on Workload Characterization. Paper presented at IISWC 2017, October 1–3, Seattle, WA (pp. 54-65). IEEE
2017 (English). In: Proc. 20th International Symposium on Workload Characterization, IEEE, 2017, pp. 54-65. Conference paper, published paper (peer-reviewed).
Place, publisher, year, edition, pages
IEEE, 2017
HSV category
Identifiers
urn:nbn:se:uu:diva-357055 (URN), 10.1109/IISWC.2017.8167756 (DOI), 000428206700006 (), 978-1-5386-1233-0 (ISBN)
Conference
IISWC 2017, October 1–3, Seattle, WA
Available from: 2017-12-07. Created: 2018-08-17. Last updated: 2018-09-24. Bibliographically approved.
Alipour, M., Carlson, T. E. & Kaxiras, S. (2017). A Taxonomy of Out-of-Order Instruction Commit. In: 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Paper presented at 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Santa Rosa, CA, USA (pp. 135-136). Los Alamitos: IEEE Computer Society
2017 (English). In: 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Los Alamitos: IEEE Computer Society, 2017, pp. 135-136. Conference paper, published paper (peer-reviewed).
Abstract [en]

While in-order instruction commit has its advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, registers) until they are released in program order. In contrast, out-of-order commit releases resources much earlier, yielding improved performance without the need for additional hardware resources. In this paper, we revisit out-of-order commit from a different perspective, not by proposing another hardware technique, but by introducing a taxonomy and evaluating three different micro-architectures that have this technique enabled. We show how smaller processors can benefit from simple out-of-order commit strategies, but that larger, aggressive cores require more aggressive strategies to improve performance.

Place, publisher, year, edition, pages
Los Alamitos: IEEE Computer Society, 2017
HSV category
Identifiers
urn:nbn:se:uu:diva-352938 (URN), 10.1109/ISPASS.2017.7975283 (DOI), 000426905600020 (), 978-1-5386-3890-3 (ISBN), 978-1-5386-3891-0 (ISBN), 978-1-5386-3889-7 (ISBN)
Conference
2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Santa Rosa, CA, USA
Available from: 2018-06-12. Created: 2018-06-12. Last updated: 2018-06-12. Bibliographically approved.
Ceballos, G., Sembrant, A., Carlson, T. E. & Black-Schaffer, D. (2017). Analyzing Graphics Workloads on Tile-based GPUs. In: Proc. 20th International Symposium on Workload Characterization. Paper presented at IISWC 2017, October 1–3, Seattle, WA (pp. 108-109). IEEE
2017 (English). In: Proc. 20th International Symposium on Workload Characterization, IEEE, 2017, pp. 108-109. Conference paper, published paper (peer-reviewed).
Place, publisher, year, edition, pages
IEEE, 2017
HSV category
Identifiers
urn:nbn:se:uu:diva-335559 (URN), 10.1109/IISWC.2017.8167761 (DOI), 000428206700011 (), 978-1-5386-1233-0 (ISBN)
Conference
IISWC 2017, October 1–3, Seattle, WA
Projects
UPMARC
Funder
Swedish Foundation for Strategic Research, FFL12-0051
Available from: 2017-12-06. Created: 2017-12-06. Last updated: 2018-11-15. Bibliographically approved.
Tran, K.-A., Carlson, T. E., Koukos, K., Själander, M., Spiliopoulos, V., Kaxiras, S. & Jimborean, A. (2017). Clairvoyance: Look-ahead compile-time scheduling. In: Proc. 15th International Symposium on Code Generation and Optimization. Paper presented at CGO 2017, February 4–8, Austin, TX (pp. 171-184). Piscataway, NJ: IEEE Press
2017 (English). In: Proc. 15th International Symposium on Code Generation and Optimization, Piscataway, NJ: IEEE Press, 2017, pp. 171-184. Conference paper, published paper (peer-reviewed).
Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Press, 2017
HSV category
Identifiers
urn:nbn:se:uu:diva-316480 (URN), 000402548700015 (), 978-1-5090-4931-8 (ISBN)
Conference
CGO 2017, February 4–8, Austin, TX
Projects
UPMARC
Funder
Swedish Research Council, 2010-4741
Available from: 2017-02-04. Created: 2017-03-01. Last updated: 2018-04-26. Bibliographically approved.
Alipour, M., Carlson, T. E. & Kaxiras, S. (2017). Exploring the performance limits of out-of-order commit. In: Proc. 14th Computing Frontiers Conference. Paper presented at CF 2017, May 15–17, Siena, Italy (pp. 211-220). New York: ACM Press
2017 (English). In: Proc. 14th Computing Frontiers Conference, New York: ACM Press, 2017, pp. 211-220. Conference paper, published paper (peer-reviewed).
Place, publisher, year, edition, pages
New York: ACM Press, 2017
HSV category
Identifiers
urn:nbn:se:uu:diva-334601 (URN), 10.1145/3075564.3075581 (DOI), 978-1-4503-4487-6 (ISBN)
Conference
CF 2017, May 15–17, Siena, Italy
Projects
UPMARC
Available from: 2017-05-15. Created: 2017-11-24. Last updated: 2018-01-13. Bibliographically approved.
Ros, A., Carlson, T. E., Alipour, M. & Kaxiras, S. (2017). Non-speculative load-load reordering in TSO. In: Proc. 44th International Symposium on Computer Architecture. Paper presented at ISCA 2017, June 24–28, Toronto, Canada (pp. 187-200). New York: ACM Press
2017 (English). In: Proc. 44th International Symposium on Computer Architecture, New York: ACM Press, 2017, pp. 187-200. Conference paper, published paper (peer-reviewed).
Abstract [en]

In Total Store Order memory consistency (TSO), loads can be speculatively reordered to improve performance. If a load-load reordering is seen by other cores, speculative loads must be squashed and re-executed. In architectures with an unordered interconnection network and directory coherence, this has been the established view for decades. We show, for the first time, that it is not necessary to squash and re-execute speculatively reordered loads in TSO when their reordering is seen. Instead, the reordering can be hidden from other cores by the coherence protocol. The implication is that we can irrevocably bind speculative loads. This allows us to commit reordered loads out-of-order without having to wait (for the loads to become non-speculative) or without having to checkpoint committed state (and roll back if needed), just to ensure correctness in the rare case of some core seeing the reordering. We show that by exposing a reordering to the coherence layer and by appropriately modifying a typical directory protocol we can successfully hide load-load reordering without perceptible performance cost and without deadlock. Our solution is cost-effective and increases the performance of out-of-order commit by a sizable margin, compared to the base case where memory operations are not allowed to commit if the consistency model could be violated.

Place, publisher, year, edition, pages
New York: ACM Press, 2017
Keywords
Cache coherence, memory consistency, TSO, load reordering, out-of-order commit
HSV category
Identifiers
urn:nbn:se:uu:diva-323468 (URN), 10.1145/3079856.3080220 (DOI), 000426483300015 (), 978-1-4503-4892-8 (ISBN)
Conference
ISCA 2017, June 24–28, Toronto, Canada
Projects
UPMARC
Funder
Swedish Research Council, 621-2012-5332
Available from: 2017-06-24. Created: 2017-06-07. Last updated: 2018-06-08. Bibliographically approved.