skip to main content
10.1145/3035918.3064029acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Public Access

Automatic Database Management System Tuning Through Large-scale Machine Learning

Published: 09 May 2017 Publication History

Abstract

Database management system (DBMS) configuration tuning is an essential aspect of any data-intensive application effort. But this is historically a difficult task because DBMSs have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use for caches and how often data is written to storage. The problem with these knobs is that they are not standardized (i.e., two DBMSs use a different name for the same knob), not independent (i.e., changing one knob can impact others), and not universal (i.e., what works for one application may be sub-optimal for another). Worse, information about the effects of the knobs typically comes only from (expensive) experience.
To overcome these challenges, we present an automated approach that leverages past experience and collects new information to tune DBMS configurations: we use a combination of supervised and unsupervised machine learning methods to (1) select the most impactful knobs, (2) map unseen database workloads to previous workloads from which we can transfer experience, and (3) recommend knob settings. We implemented our techniques in a new tool called OtterTune and tested it on two DBMSs. Our evaluation shows that OtterTune recommends configurations that are as good as or better than ones generated by existing tools or a human expert.

References

[1]
MySQL -- InnoDB startup options and system variables. http: //dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html.
[2]
MySQL Tuning Primer Script. https://launchpad.net/mysql-tuning-primer.
[3]
OLTPBenchmark.com. http://oltpbenchmark.com.
[4]
PostgreSQL Configuration Wizard. http://pgfoundry.org/projects/pgtune/.
[5]
scikit-learn Documentation -- Factor Analysis. http://scikit-learn.org/stable/modules/generated/sklearn. decomposition.FactorAnalysis.html.
[6]
scikit-learn Documentation -- KMeans. http://scikit-learn. org/stable/modules/generated/sklearn.cluster.KMeans.html.
[7]
S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in SQL databases. In VLDB, 2000.
[8]
S. Agrawal, V. Narasayya, and B. Yang. Integrating vertical and horizontal partitioning into automated physical database design. In SIGMOD, 2004.
[9]
J. C. Barrett, D. G. Clayton, P. Concannon, B. Akolkar, J. D. Cooper, H. A. Erlich, C. Julier, G. Morahan, J. Nerup, C. Nierras, et al. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nature genetics, 41(6):703--707, 2009.
[10]
P. Belknap, B. Dageville, K. Dias, and K. Yagoub. Self-tuning for SQL performance in Oracle Database 11g. In ICDE, pages 1694--1700, 2009.
[11]
P. Bernstein, M. Brodie, S. Ceri, D. DeWitt, M. Franklin, H. Garcia-Molina, J. Gray, J. Held, J. Hellerstein, H. Jagadish, et al. The asilomar report on database research. SIGMOD record, 27(4):74--80, 1998.
[12]
P. Boncz, T. Neumann, and O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. 2014.
[13]
K. P. Brown, M. J. Carey, and M. Livny. Goal-oriented buffer management revisited. In SIGMOD, pages 353--364, 1996.
[14]
G. Casella and R. L. Berger. Statistical Inference. Duxbury advanced series in statistics and decision sciences. Duxbury Press, 2002.
[15]
S. Chaudhuri and V. Narasayya. Autoadmin "what-if" index analysis utility. SIGMOD Rec., 27(2):367--378, 1998.
[16]
S. Chaudhuri and V. Narasayya. Self-tuning database systems: a decade of progress. In VLDB, pages 3--14, 2007.
[17]
S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for microsoft SQL server. In VLDB, pages 146--155, 1997.
[18]
B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In SoCC, pages 143--154, 2010.
[19]
C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: a workload-drive approach to database replication and partitioning. In VLDB, 2010.
[20]
E. Danna and L. Perron. Structured vs. unstructured large neighborhood search: A case study on job-shop scheduling problems with earliness and tardiness costs. In Principles and Practice of Constraint Programming, volume 2833, pages 817--821, 2003.
[21]
B. Debnath, D. Lilja, and M. Mokbel. SARD: A statistical approach for ranking database tuning parameters. In ICDEW, pages 11--18, 2008.
[22]
K. Dias, M. Ramacher, U. Shaft, V. Venkataramani, and G. Wood. Automatic performance diagnosis and tuning in oracle. In CIdR, 2005.
[23]
D. E. Difallah, A. Pavlo, C. Curino, and P. Cudre-Mauroux. OLTP-Bench: an extensible testbed for benchmarking relational databases. In VLDB, pages 277--288, 2013.
[24]
S. Duan, V. Thummala, and S. Babu. Tuning database configuration parameters with iTuned. VLDB, 2:1246--1257, August 2009.
[25]
D. Dworin. Data science revealed: A data-driven glimpse into the burgeoning new field. Dec. 2011.
[26]
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407--499, 2004.
[27]
F. Focacci, F. Laburthe, and A. Lodi. Handbook of Metaheuristics, chapter Local Search and Constraint Programming. Springer, 2003.
[28]
H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman. Index selection for olap. In ICDE, pages 208--219, 1997.
[29]
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.
[30]
A. Jain, M. Murty, and P. Flynn. Data clustering: A review. volume 31, pages 264--323, 1999.
[31]
A. K. Jain and R. C. Dubes. Algorithms for clustering data. Prentice-Hall, Inc., 1988.
[32]
A. Krause and C. S. Ong. Contextual gaussian process bandit optimization. In NIPS, pages 2447--2455, 2011.
[33]
S. Kumar. Oracle Database 10g: The self-managing database, Nov. 2003. White Paper.
[34]
E. Kwan, S. Lightstone, A. Storm, and L. Wu. Automatic configuration for IBM DB2 universal database. Technical report, IBM, jan 2002.
[35]
D. Laney. 3-D data management: Controlling data volume, velocity and variety. Feb. 2001.
[36]
M. Linster. Best practices for becoming an exceptional postgres dba. http://www.enterprisedb.com/ best-practices-becoming-exceptional-postgres-dba, Aug. 2014.
[37]
D. Narayanan, E. Thereska, and A. Ailamaki. Continuous resource monitoring for self-predicting DBMS. In MASCOTS, pages 239--248, 2005.
[38]
A. Pavlo, E. P. Jones, and S. Zdonik. On predictive modeling for optimizing transaction execution in parallel OLTP systems. VLDB, 5:85--96, October 2011.
[39]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825--2830, 2011.
[40]
D. T. Pham, S. S. Dimov, and C. D. Nguyen. Selection of k in k-means clustering. In IMechE, volume 219, 2005.
[41]
J. Rao, C. Zhang, N. Megiddo, and G. Lohman. Automating physical database design in a parallel database. In SIGMOD, pages 558--569, 2002.
[42]
C. E. Rasmussen and C. K. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
[43]
A. Rosenberg. Improving query performance in data warehouses. Business Intelligence Journal, 11, Jan. 2006.
[44]
A. A. Soror, U. F. Minhas, A. Aboulnaga, K. Salem, P. Kokosielis, and S. Kamath. Automatic virtual machine configuration for database workloads. In SIGMOD, pages 953--966, 2008.
[45]
N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimizatioin in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, 2010.
[46]
M. Stonebraker, S. Madden, and P. Dubey. Intel "big data" science and technology center vision and execution plan. SIGMOD Rec., 42(1):44--49, May 2013.
[47]
A. J. Storm, C. Garcia-Arellano, S. S. Lightstone, Y. Diao, and M. Surendra. Adaptive self-tuning memory in DB2. In VLDB, pages 1081--1092, 2006.
[48]
C. Sugar. Techniques for clustering and classification with applications to medical problems. PhD thesis, Stanford University, 1998.
[49]
D. G. Sullivan, M. I. Seltzer, and A. Pfeffer. Using probabilistic reasoning to automate software tuning. SIGMETRICS, pages 404--405, 2004.
[50]
M. A. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. CoRR, abs/1603.04467, 2016.
[51]
The Transaction Processing Council. TPC-C Benchmark (Revision 5.9.0). http://www.tpc.org/tpcc/spec/tpcc_current.pdf, June 2007.
[52]
The Transaction Processing Council. TPC-H Benchmark (Revision 2.16.0). http://www.tpc.org/tpch/spec/tpch2.16.0.pdf, December 2013.
[53]
W. Tian, P. Martin, and W. Powley. Techniques for automatically sizing multiple buffer pools in DB2. In CASCON, pages 294--302, 2003.
[54]
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58:267--288, 1996.
[55]
R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 63:411--423, 2001.
[56]
R. J. Tibshirani, A. Rinaldo, R. Tibshirani, and L. Wasserman. Uniform asymptotic inference and the bootstrap after model selection. arXiv preprint arXiv:1506.06266, 2015.
[57]
R. J. Tibshirani, J. Taylor, R. Lockhart, and R. Tibshirani. Exact post-selection inference for sequential regression procedures. arXiv preprint arXiv:1401.3889, 2014.
[58]
D. N. Tran, P. C. Huynh, Y. C. Tay, and A. K. H. Tung. A new approach to dynamic self-tuning of database buffers. Trans. Storage, 4(1):3:1--3:25, May 2008.
[59]
G. Valentin, M. Zuliani, D. Zilio, G. Lohman, and A. Skelley. DB2 advisor: an optimizer smart enough to recommend its own indexes. In ICDE, pages 101--110, 2000.
[60]
G. Weikum, C. Hasse, A. Mönkeberg, and P. Zabback. The COMFORT automatic tuning project. Information Systems, 19(5):381--432, July 1994.
[61]
B. Xi, Z. Liu, M. Raghavachari, C. H. Xia, and L. Zhang. A smart hill-climbing algorithm for application server configuration. In WWW, pages 287--296, 2004.
[62]
K. Yagoub, P. Belknap, B. Dageville, K. Dias, S. Joshi, and H. Yu. Oracle's sql performance analyzer. IEEE Data Engineering Bulletin, 31(1), 2008.
[63]
D. Y. Yoon, N. Niu, and B. Mozafari. DBSherlock: a performance diagnostic tool for transactional databases. In SIGMOD, pages 1599--1614, 2016.
[64]
C. Zhang, A. Kumar, and C. Ré. Materialization optimizations for feature selection workloads. In SIGMOD, pages 265--276, 2014.
[65]
D. C. Zilio. Physical Database Design Decision Algorithms and Concurrent Reorganization for Parallel Database Systems. PhD thesis, University of Toronto, 1998.

Cited By

View all
  • (2024)Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management SystemsProceedings of the VLDB Endowment10.14778/3681954.368203017:11(3680-3693)Online publication date: 1-Jul-2024
  • (2024)Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRADProceedings of the VLDB Endowment10.14778/3681954.368202617:11(3629-3643)Online publication date: 1-Jul-2024
  • (2024)A Spark Optimizer for Adaptive, Fine-Grained Parameter TuningProceedings of the VLDB Endowment10.14778/3681954.368202117:11(3565-3579)Online publication date: 1-Jul-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN:9781450341974
DOI:10.1145/3035918
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. autonomic computing
  2. database management systems
  3. database tuning
  4. machine learning

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS'17
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1,731
  • Downloads (Last 6 weeks)183
Reflects downloads up to 21 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management SystemsProceedings of the VLDB Endowment10.14778/3681954.368203017:11(3680-3693)Online publication date: 1-Jul-2024
  • (2024)Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRADProceedings of the VLDB Endowment10.14778/3681954.368202617:11(3629-3643)Online publication date: 1-Jul-2024
  • (2024)A Spark Optimizer for Adaptive, Fine-Grained Parameter TuningProceedings of the VLDB Endowment10.14778/3681954.368202117:11(3565-3579)Online publication date: 1-Jul-2024
  • (2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
  • (2024)GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian OptimizationProceedings of the VLDB Endowment10.14778/3659437.365944917:8(1939-1952)Online publication date: 1-Apr-2024
  • (2024)Towards Full Stack Adaptivity in Permissioned BlockchainsProceedings of the VLDB Endowment10.14778/3641204.364121617:5(1073-1080)Online publication date: 1-Jan-2024
  • (2024)PilotScope: Steering Databases with Machine Learning DriversProceedings of the VLDB Endowment10.14778/3641204.364120917:5(980-993)Online publication date: 1-Jan-2024
  • (2024)CTuner: Automatic NoSQL Database Tuning with Causal Reinforcement LearningProceedings of the 15th Asia-Pacific Symposium on Internetware10.1145/3671016.3674809(269-278)Online publication date: 24-Jul-2024
  • (2024)KnobTune: A Dynamic Database Configuration Tuning Strategy Leveraging Historical Workload SimilaritiesProceedings of the International Conference on Computing, Machine Learning and Data Science10.1145/3661725.3661734(1-8)Online publication date: 12-Apr-2024
  • (2024)Nautilus: A Benchmarking Platform for DBMS Knob TuningProceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning10.1145/3650203.3663336(72-76)Online publication date: 9-Jun-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media