DOI:10.1145/3581784.3607091
research-article
Open access

DPS: Adaptive Power Management for Overprovisioned Systems

Published: 11 November 2023

Abstract

Maximizing performance under a power budget is essential for HPC systems and has inspired the development of many power management frameworks. These fall broadly into two groups: model-based and stateless. Model-based frameworks use machine learning to achieve good performance under a power budget, but they depend heavily on the quality of the learned model and of the data used to train it. Stateless frameworks are more robust and require no training, but generally deliver lower performance. In this paper, we propose a new framework that requires no model but does track a small amount of state in the form of recent power dynamics. We implement this idea and test it on a public cloud running both Spark and HPC jobs. We find that when total power demand is low, our framework matches the performance of prior work, but when power demand is high, it achieves a mean performance improvement of 8%, with no reliance on a learned model.

Published In

SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2023
1428 pages
ISBN:9798400701092
DOI:10.1145/3581784
This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. power-efficient design and power-management strategies
  2. resource management
  3. job scheduling
  4. system interoperations and energy-aware techniques for large-scale systems

Qualifiers

  • Research-article

Funding Sources

  • Army Research Office
  • National Science Foundation CCF
  • National Science Foundation PPoSS
  • National Science Foundation CNS

Conference

SC '23

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
