Parallelization and Auto-scheduling of Data Access Queries in ML Workloads

Bratek, Pawel; Szustak, Lukasz; Zola, Jaroslaw

doi:10.1007/978-3-031-06156-1_43

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13098)

Included in the following conference series: European Conference on Parallel Processing

Abstract

We propose an auto-scheduling mechanism for executing counting queries in machine learning applications. Our approach improves the runtime efficiency of query streams by selecting, in an online manner, the optimal execution strategy for each query. We also discuss how to scale up counting queries in multi-threaded applications.
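The idea of choosing an execution strategy per query can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the two strategies (a full row scan and an intersection of precomputed row-id sets), the names `count_scan`, `count_index` and `AutoScheduler`, and the selection heuristic (a smoothed running time per strategy, with the slower strategy re-timed every few queries).

```python
import time
from collections import defaultdict


def count_scan(rows, query):
    """Hypothetical strategy A: scan all rows and test every condition."""
    return sum(all(row[c] == v for c, v in query.items()) for row in rows)


def build_index(rows):
    """Map each (column, value) pair to the set of row ids containing it."""
    index = defaultdict(set)
    for i, row in enumerate(rows):
        for c, v in enumerate(row):
            index[(c, v)].add(i)
    return index


def count_index(index, query, n_rows):
    """Hypothetical strategy B: intersect the row-id sets of the conditions."""
    sets = [index.get(cv, set()) for cv in query.items()]
    if not sets:  # an empty query matches every row
        return n_rows
    return len(set.intersection(*sets))


class AutoScheduler:
    """Picks a strategy per query from smoothed timings (illustrative only)."""

    def __init__(self, rows, explore_every=10):
        self.rows = rows
        self.index = build_index(rows)
        self.strategies = {
            "scan": lambda q: count_scan(self.rows, q),
            "index": lambda q: count_index(self.index, q, len(self.rows)),
        }
        self.avg_time = {name: None for name in self.strategies}
        self.calls = 0
        self.explore_every = explore_every

    def _pick(self):
        # Time every strategy at least once before comparing them.
        for name, t in self.avg_time.items():
            if t is None:
                return name
        # Periodically re-time the currently slower strategy.
        if self.calls % self.explore_every == 0:
            return max(self.avg_time, key=self.avg_time.get)
        return min(self.avg_time, key=self.avg_time.get)

    def count(self, query):
        self.calls += 1
        name = self._pick()
        start = time.perf_counter()
        result = self.strategies[name](query)
        elapsed = time.perf_counter() - start
        old = self.avg_time[name]
        # Exponential smoothing keeps the estimate responsive to drift.
        self.avg_time[name] = elapsed if old is None else 0.8 * old + 0.2 * elapsed
        return result


rows = [(0, 1, 0), (1, 1, 0), (0, 1, 1), (0, 0, 0)]
scheduler = AutoScheduler(rows)
# A query maps a column index to the required value in that column.
print(scheduler.count({0: 0, 1: 1}))
```

Both strategies return the same count for any query, so the scheduler only affects running time, never the result. A multi-threaded variant would also need to protect the shared timing table, which this sketch does not attempt.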


Acknowledgments

This research was supported by the National Science Centre (Poland) under grant no. UMO-2017/26/D/ST6/00687.

Author information

Authors and Affiliations

Czestochowa University of Technology, Dabrowskiego 69, 42-201, Czestochowa, Poland
Pawel Bratek & Lukasz Szustak
University at Buffalo, Buffalo, NY, 14260, USA
Jaroslaw Zola

Corresponding author

Correspondence to Pawel Bratek.

Editor information

Editors and Affiliations

University of Lisbon, Lisbon, Portugal
Ricardo Chaves
Department of Computer Engineering, CiTIUS, University of Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Dora B. Heras
University of Lisbon, Lisbon, Portugal
Aleksandar Ilic
Koç University, Istanbul, Turkey
Didem Unat
Barcelona Supercomputing Center, Barcelona, Spain
Rosa M. Badia
University of Stirling, Stirling, UK
Andrea Bracciali
Louisiana State University, Baton Rouge, USA
Patrick Diehl
Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL, USA
Anshu Dubey
Ajou University, Suwon, Korea (Republic of)
Oh Sangyoon
Tennessee Technological University, Cookeville, TN, USA
Stephen L. Scott
University of Pisa, Pisa, Italy
Laura Ricci

About this paper

Cite this paper

Bratek, P., Szustak, L., Zola, J. (2022). Parallelization and Auto-scheduling of Data Access Queries in ML Workloads. In: Chaves, R., et al. Euro-Par 2021: Parallel Processing Workshops. Euro-Par 2021. Lecture Notes in Computer Science, vol 13098. Springer, Cham. https://doi.org/10.1007/978-3-031-06156-1_43

DOI: https://doi.org/10.1007/978-3-031-06156-1_43
Published: 09 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06155-4
Online ISBN: 978-3-031-06156-1
eBook Packages: Computer Science, Computer Science (R0)
