DOI: 10.1145/3583131.3590364 · GECCO Conference Proceedings · Research article

Hybridizing TPOT with Bayesian Optimization

Published: 12 July 2023

Abstract

The Tree-based Pipeline Optimization Tool (TPOT) automatically constructs and optimizes machine learning pipelines for classification and regression tasks. Pipelines are represented as trees comprising multiple data transformation and machine learning operators, each with a discrete hyper-parameter space, and are optimized with genetic programming. During the evolution process, TPOT evaluates numerous pipelines, which can be challenging when the computing budget is limited. In this study, we integrate TPOT with Bayesian optimization (BO) to extend its ability to search across continuous hyper-parameter spaces, and attempt to improve its performance under a limited computational budget. Multiple hybrid variants are proposed and systematically evaluated, including (a) sequential/periodic use of BO and (b) use of discrete/continuous search spaces for BO. The performance of these variants is assessed using six data sets with up to 20 features and 20,000 samples. Furthermore, an adaptive variant is designed in which the choice of whether to apply TPOT or BO is made automatically in each generation. While the variants did not produce results significantly better than "standard" TPOT, the study uncovered important insights into the behavior and limitations of TPOT itself, which are valuable for designing improved variants.
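As a rough illustration of variant (a), the periodic hand-off between an evolutionary phase and a continuous-refinement phase can be sketched on a toy objective. Everything here is invented for illustration: the objective stands in for cross-validation accuracy over one discrete operator choice plus one continuous hyper-parameter, and the refinement step is a simple local search standing in for a real BO surrogate model; the paper's actual pipelines, operators, and BO machinery are far richer.

```python
import random

# Toy stand-in for pipeline quality: one discrete operator choice and one
# continuous hyper-parameter c, with the optimum at ("pca", c = 0.3).
def score(op, c):
    peak = {"scaler": 0.6, "pca": 0.8, "poly": 0.7}[op]
    return peak - (c - 0.3) ** 2

OPS = ["scaler", "pca", "poly"]

def evolve(pop, n_children=10):
    # GP-style step: mutate the operator choice, perturb the hyper-parameter,
    # then keep the best individuals (elitist truncation selection).
    children = []
    for _ in range(n_children):
        op, c = random.choice(pop)
        if random.random() < 0.3:
            op = random.choice(OPS)
        c = min(1.0, max(0.0, c + random.gauss(0, 0.1)))
        children.append((op, c))
    return sorted(pop + children, key=lambda ind: -score(*ind))[:len(pop)]

def refine(ind, n_trials=20):
    # Stand-in for the BO phase: freeze the discrete structure and search the
    # continuous hyper-parameter locally (real BO would fit a surrogate and
    # maximize an acquisition function instead).
    op, best_c = ind
    for _ in range(n_trials):
        c = min(1.0, max(0.0, best_c + random.gauss(0, 0.05)))
        if score(op, c) > score(op, best_c):
            best_c = c
    return (op, best_c)

random.seed(0)
pop = [(random.choice(OPS), random.random()) for _ in range(8)]
for gen in range(1, 13):
    pop = evolve(pop)
    if gen % 4 == 0:  # periodic hand-off to the continuous-refinement phase
        pop[0] = refine(pop[0])

best = pop[0]
print(best, round(score(*best), 4))
```

Because selection is elitist, the best score never decreases; the refinement phase only sharpens the continuous hyper-parameter of the incumbent, mirroring how the hybrid variants let BO polish structures that genetic programming has already found.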


Cited By

  • Using Bayesian Optimization to Improve Hyperparameter Search in TPOT. In Proceedings of the Genetic and Evolutionary Computation Conference (2024), 340-348. DOI: 10.1145/3638529.3654061. Published online 14 July 2024.
  • A Hierarchical Dissimilarity Metric for Automated Machine Learning Pipelines, and Visualizing Search Behaviour. In Applications of Evolutionary Computation (2024), 115-129. DOI: 10.1007/978-3-031-56855-8_7. Published online 3 March 2024.


    Published In

    GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2023
    1667 pages
ISBN: 9798400701191
DOI: 10.1145/3583131
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


    Funding Sources

    • Honda Research Institute Europe GmbH

    Conference

    GECCO '23

    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

