research-article

Public Access

Automatic Database Management System Tuning Through Large-scale Machine Learning

Authors:

Geoffrey J. Gordon,

Bohan ZhangAuthors Info & Claims

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

Pages 1009 - 1024

https://doi.org/10.1145/3035918.3064029

Published: 09 May 2017 Publication History

Abstract

Database management system (DBMS) configuration tuning is an essential aspect of any data-intensive application effort. But this is historically a difficult task because DBMSs have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use for caches and how often data is written to storage. The problem with these knobs is that they are not standardized (i.e., two DBMSs use a different name for the same knob), not independent (i.e., changing one knob can impact others), and not universal (i.e., what works for one application may be sub-optimal for another). Worse, information about the effects of the knobs typically comes only from (expensive) experience.

To overcome these challenges, we present an automated approach that leverages past experience and collects new information to tune DBMS configurations: we use a combination of supervised and unsupervised machine learning methods to (1) select the most impactful knobs, (2) map unseen database workloads to previous workloads from which we can transfer experience, and (3) recommend knob settings. We implemented our techniques in a new tool called OtterTune and tested it on two DBMSs. Our evaluation shows that OtterTune recommends configurations that are as good as or better than ones generated by existing tools or a human expert.

References

[1]

MySQL -- InnoDB startup options and system variables. http: //dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html.

[2]

MySQL Tuning Primer Script. https://launchpad.net/mysql-tuning-primer.

[3]

OLTPBenchmark.com. http://oltpbenchmark.com.

[4]

PostgreSQL Configuration Wizard. http://pgfoundry.org/projects/pgtune/.

[5]

scikit-learn Documentation -- Factor Analysis. http://scikit-learn.org/stable/modules/generated/sklearn. decomposition.FactorAnalysis.html.

[6]

scikit-learn Documentation -- KMeans. http://scikit-learn. org/stable/modules/generated/sklearn.cluster.KMeans.html.

[7]

S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in SQL databases. In VLDB, 2000.

Digital Library

[8]

S. Agrawal, V. Narasayya, and B. Yang. Integrating vertical and horizontal partitioning into automated physical database design. In SIGMOD, 2004.

Digital Library

[9]

J. C. Barrett, D. G. Clayton, P. Concannon, B. Akolkar, J. D. Cooper, H. A. Erlich, C. Julier, G. Morahan, J. Nerup, C. Nierras, et al. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nature genetics, 41(6):703--707, 2009.

[10]

P. Belknap, B. Dageville, K. Dias, and K. Yagoub. Self-tuning for SQL performance in Oracle Database 11g. In ICDE, pages 1694--1700, 2009.

Digital Library

[11]

P. Bernstein, M. Brodie, S. Ceri, D. DeWitt, M. Franklin, H. Garcia-Molina, J. Gray, J. Held, J. Hellerstein, H. Jagadish, et al. The asilomar report on database research. SIGMOD record, 27(4):74--80, 1998.

Digital Library

[12]

P. Boncz, T. Neumann, and O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. 2014.

[13]

K. P. Brown, M. J. Carey, and M. Livny. Goal-oriented buffer management revisited. In SIGMOD, pages 353--364, 1996.

Digital Library

[14]

G. Casella and R. L. Berger. Statistical Inference. Duxbury advanced series in statistics and decision sciences. Duxbury Press, 2002.

[15]

S. Chaudhuri and V. Narasayya. Autoadmin "what-if" index analysis utility. SIGMOD Rec., 27(2):367--378, 1998.

Digital Library

[16]

S. Chaudhuri and V. Narasayya. Self-tuning database systems: a decade of progress. In VLDB, pages 3--14, 2007.

Digital Library

[17]

S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for microsoft SQL server. In VLDB, pages 146--155, 1997.

Digital Library

[18]

B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In SoCC, pages 143--154, 2010.

Digital Library

[19]

C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: a workload-drive approach to database replication and partitioning. In VLDB, 2010.

Digital Library

[20]

E. Danna and L. Perron. Structured vs. unstructured large neighborhood search: A case study on job-shop scheduling problems with earliness and tardiness costs. In Principles and Practice of Constraint Programming, volume 2833, pages 817--821, 2003.

[21]

B. Debnath, D. Lilja, and M. Mokbel. SARD: A statistical approach for ranking database tuning parameters. In ICDEW, pages 11--18, 2008.

Digital Library

[22]

K. Dias, M. Ramacher, U. Shaft, V. Venkataramani, and G. Wood. Automatic performance diagnosis and tuning in oracle. In CIdR, 2005.

[23]

D. E. Difallah, A. Pavlo, C. Curino, and P. Cudre-Mauroux. OLTP-Bench: an extensible testbed for benchmarking relational databases. In VLDB, pages 277--288, 2013.

Digital Library

[24]

S. Duan, V. Thummala, and S. Babu. Tuning database configuration parameters with iTuned. VLDB, 2:1246--1257, August 2009.

Digital Library

[25]

D. Dworin. Data science revealed: A data-driven glimpse into the burgeoning new field. Dec. 2011.

[26]

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407--499, 2004.

[27]

F. Focacci, F. Laburthe, and A. Lodi. Handbook of Metaheuristics, chapter Local Search and Constraint Programming. Springer, 2003.

[28]

H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman. Index selection for olap. In ICDE, pages 208--219, 1997.

Digital Library

[29]

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.

[30]

A. Jain, M. Murty, and P. Flynn. Data clustering: A review. volume 31, pages 264--323, 1999.

Digital Library

[31]

A. K. Jain and R. C. Dubes. Algorithms for clustering data. Prentice-Hall, Inc., 1988.

Digital Library

[32]

A. Krause and C. S. Ong. Contextual gaussian process bandit optimization. In NIPS, pages 2447--2455, 2011.

Digital Library

[33]

S. Kumar. Oracle Database 10g: The self-managing database, Nov. 2003. White Paper.

[34]

E. Kwan, S. Lightstone, A. Storm, and L. Wu. Automatic configuration for IBM DB2 universal database. Technical report, IBM, jan 2002.

[35]

D. Laney. 3-D data management: Controlling data volume, velocity and variety. Feb. 2001.

[36]

M. Linster. Best practices for becoming an exceptional postgres dba. http://www.enterprisedb.com/ best-practices-becoming-exceptional-postgres-dba, Aug. 2014.

[37]

D. Narayanan, E. Thereska, and A. Ailamaki. Continuous resource monitoring for self-predicting DBMS. In MASCOTS, pages 239--248, 2005.

Digital Library

[38]

A. Pavlo, E. P. Jones, and S. Zdonik. On predictive modeling for optimizing transaction execution in parallel OLTP systems. VLDB, 5:85--96, October 2011.

Digital Library

[39]

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825--2830, 2011.

Digital Library

[40]

D. T. Pham, S. S. Dimov, and C. D. Nguyen. Selection of k in k-means clustering. In IMechE, volume 219, 2005.

[41]

J. Rao, C. Zhang, N. Megiddo, and G. Lohman. Automating physical database design in a parallel database. In SIGMOD, pages 558--569, 2002.

Digital Library

[42]

C. E. Rasmussen and C. K. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.

Digital Library

[43]

A. Rosenberg. Improving query performance in data warehouses. Business Intelligence Journal, 11, Jan. 2006.

[44]

A. A. Soror, U. F. Minhas, A. Aboulnaga, K. Salem, P. Kokosielis, and S. Kamath. Automatic virtual machine configuration for database workloads. In SIGMOD, pages 953--966, 2008.

Digital Library

[45]

N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimizatioin in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, 2010.

Digital Library

[46]

M. Stonebraker, S. Madden, and P. Dubey. Intel "big data" science and technology center vision and execution plan. SIGMOD Rec., 42(1):44--49, May 2013.

Digital Library

[47]

A. J. Storm, C. Garcia-Arellano, S. S. Lightstone, Y. Diao, and M. Surendra. Adaptive self-tuning memory in DB2. In VLDB, pages 1081--1092, 2006.

Digital Library

[48]

C. Sugar. Techniques for clustering and classification with applications to medical problems. PhD thesis, Stanford University, 1998.

[49]

D. G. Sullivan, M. I. Seltzer, and A. Pfeffer. Using probabilistic reasoning to automate software tuning. SIGMETRICS, pages 404--405, 2004.

Digital Library

[50]

M. A. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. CoRR, abs/1603.04467, 2016.

[51]

The Transaction Processing Council. TPC-C Benchmark (Revision 5.9.0). http://www.tpc.org/tpcc/spec/tpcc_current.pdf, June 2007.

[52]

The Transaction Processing Council. TPC-H Benchmark (Revision 2.16.0). http://www.tpc.org/tpch/spec/tpch2.16.0.pdf, December 2013.

[53]

W. Tian, P. Martin, and W. Powley. Techniques for automatically sizing multiple buffer pools in DB2. In CASCON, pages 294--302, 2003.

Digital Library

[54]

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58:267--288, 1996.

[55]

R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 63:411--423, 2001.

[56]

R. J. Tibshirani, A. Rinaldo, R. Tibshirani, and L. Wasserman. Uniform asymptotic inference and the bootstrap after model selection. arXiv preprint arXiv:1506.06266, 2015.

[57]

R. J. Tibshirani, J. Taylor, R. Lockhart, and R. Tibshirani. Exact post-selection inference for sequential regression procedures. arXiv preprint arXiv:1401.3889, 2014.

[58]

D. N. Tran, P. C. Huynh, Y. C. Tay, and A. K. H. Tung. A new approach to dynamic self-tuning of database buffers. Trans. Storage, 4(1):3:1--3:25, May 2008.

Digital Library

[59]

G. Valentin, M. Zuliani, D. Zilio, G. Lohman, and A. Skelley. DB2 advisor: an optimizer smart enough to recommend its own indexes. In ICDE, pages 101--110, 2000.

Digital Library

[60]

G. Weikum, C. Hasse, A. Mönkeberg, and P. Zabback. The COMFORT automatic tuning project. Information Systems, 19(5):381--432, July 1994.

Digital Library

[61]

B. Xi, Z. Liu, M. Raghavachari, C. H. Xia, and L. Zhang. A smart hill-climbing algorithm for application server configuration. In WWW, pages 287--296, 2004.

Digital Library

[62]

K. Yagoub, P. Belknap, B. Dageville, K. Dias, S. Joshi, and H. Yu. Oracle's sql performance analyzer. IEEE Data Engineering Bulletin, 31(1), 2008.

[63]

D. Y. Yoon, N. Niu, and B. Mozafari. DBSherlock: a performance diagnostic tool for transactional databases. In SIGMOD, pages 1599--1614, 2016.

Digital Library

[64]

C. Zhang, A. Kumar, and C. Ré. Materialization optimizations for feature selection workloads. In SIGMOD, pages 265--276, 2014.

Digital Library

[65]

D. C. Zilio. Physical Database Design Decision Algorithms and Concurrent Reorganization for Parallel Database Systems. PhD thesis, University of Toronto, 1998.

Digital Library

Cited By

Lim WMa LZhang WButrovich MArch SPavlo A(2024)Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management SystemsProceedings of the VLDB Endowment10.14778/3681954.368203017:11(3680-3693)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682030
Yu GWu ZKossmann FLi TMarkakis MNgom AMadden SKraska T(2024)Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRADProceedings of the VLDB Endowment10.14778/3681954.368202617:11(3629-3643)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682026
Lyu CFan QGuyard PDiao Y(2024)A Spark Optimizer for Adaptive, Fine-Grained Parameter TuningProceedings of the VLDB Endowment10.14778/3681954.368202117:11(3565-3579)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682021
Show More Cited By

Index Terms

Automatic Database Management System Tuning Through Large-scale Machine Learning
1. Information systems
  1. Data management systems
    1. Database administration
      1. Autonomous database administration
      2. Database utilities and tools
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory

Recommendations

A demonstration of the ottertune automatic database management system tuning service

Database management systems (DBMSs) have a plethora of tunable knobs that control almost everything in the system. The performance of a DBMS is highly dependent on these configuration knobs, however, getting this tuning right is hard. Many organizations ...
An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

Configuration tuning is vital to optimize the performance of database management system (DBMS). It becomes more tedious and urgent for cloud databases (CDB) due to the diverse database instances and query workloads, which make the database administrator ...
Policy-Based Coordination of Best-Practice Oriented Autonomic Database Tuning
COMPUTATIONWORLD '09: Proceedings of the 2009 Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns

In the last decades databases have been growing rapidly in scale and complexity. High performance, availability and further service level agreements need to be satisfied under any circumstances to please customers. Achieved database management system’s (...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

May 2017

1810 pages

ISBN:9781450341974

DOI:10.1145/3035918

General Chairs:
Rada Chirkova
North Carolina State University, USA
,
Jun Yang
Duke University, USA
,
Program Chair:
Dan Suciu
University of Washington, USA

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2017

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

SIGMOD/PODS'17

Sponsor:

SIGMOD

SIGMOD/PODS'17: International Conference on Management of Data

May 14 - 19, 2017

Illinois, Chicago, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

310
Total Citations
View Citations
9,271
Total Downloads

Downloads (Last 12 months)1,731
Downloads (Last 6 weeks)183

Reflects downloads up to 21 Sep 2024

Other Metrics

View Author Metrics

Citations

Cited By

Lim WMa LZhang WButrovich MArch SPavlo A(2024)Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management SystemsProceedings of the VLDB Endowment10.14778/3681954.368203017:11(3680-3693)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682030
Yu GWu ZKossmann FLi TMarkakis MNgom AMadden SKraska T(2024)Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRADProceedings of the VLDB Endowment10.14778/3681954.368202617:11(3629-3643)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682026
Lyu CFan QGuyard PDiao Y(2024)A Spark Optimizer for Adaptive, Fine-Grained Parameter TuningProceedings of the VLDB Endowment10.14778/3681954.368202117:11(3565-3579)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682021
Zhang WLim WButrovich MPavlo A(2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
https://doi.org/10.14778/3681954.3682007
Lao JWang YLi YWang JZhang YCheng ZChen WTang MWang J(2024)GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian OptimizationProceedings of the VLDB Endowment10.14778/3659437.365944917:8(1939-1952)Online publication date: 1-Apr-2024
https://dl.acm.org/doi/10.14778/3659437.3659449
Wu CAmiri MQin HMehta BMarcus RLoo B(2024)Towards Full Stack Adaptivity in Permissioned BlockchainsProceedings of the VLDB Endowment10.14778/3641204.364121617:5(1073-1080)Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.14778/3641204.3641216
Zhu RWeng LWei WWu DPeng JWang YDing BLian DZheng BZhou J(2024)PilotScope: Steering Databases with Machine Learning DriversProceedings of the VLDB Endowment10.14778/3641204.364120917:5(980-993)Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.14778/3641204.3641209
Mai GHe ZYu GChen ZChen P(2024)CTuner: Automatic NoSQL Database Tuning with Causal Reinforcement LearningProceedings of the 15th Asia-Pacific Symposium on Internetware10.1145/3671016.3674809(269-278)Online publication date: 24-Jul-2024
https://dl.acm.org/doi/10.1145/3671016.3674809
Zhan YXi RLiao JFan SHou M(2024)KnobTune: A Dynamic Database Configuration Tuning Strategy Leveraging Historical Workload SimilaritiesProceedings of the International Conference on Computing, Machine Learning and Data Science10.1145/3661725.3661734(1-8)Online publication date: 12-Apr-2024
https://dl.acm.org/doi/10.1145/3661725.3661734
Kanellis KFreischuetz JVenkataraman S(2024)Nautilus: A Benchmarking Platform for DBMS Knob TuningProceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning10.1145/3650203.3663336(72-76)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3650203.3663336
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents