skip to main content
article

A comparison of knives for bread slicing

Published: 01 April 2013 Publication History

Abstract

Vertical partitioning is a crucial step in physical database design in row-oriented databases. A number of vertical partitioning algorithms have been proposed over the last three decades for a variety of niche scenarios. In principle, the underlying problem remains the same: decompose a table into one or more vertical partitions. However, it is not clear how good different vertical partitioning algorithms are in comparison to each other. In fact, it is not even clear how to experimentally compare different vertical partitioning algorithms. In this paper, we present an exhaustive experimental study of several vertical partitioning algorithms. We categorize vertical partitioning algorithms along three dimensions. We survey six vertical partitioning algorithms and discuss their pros and cons. We identify the major differences in the use-case settings for different algorithms and describe how to make an apples-to-apples comparison of different vertical partitioning algorithms under the same setting. We propose four metrics to compare vertical partitioning algorithms. We show experimental results from the TPC-H and SSB benchmark and present four key lessons learned: (1) we can do four orders of magnitude less computation and still find the optimal layouts, (2) the benefits of vertical partitioning depend strongly on the database buffer size, (3) HillClimb is the best vertical partitioning algorithm, and (4) vertical partitioning for TPC-H-like benchmarks can improve over column layout by only up to 5%.

References

[1]
S. Agrawal, V. Narasayya, and B. Yang. Integrating Vertical and Horizontal Partitioning Into Automated Physical Database Design. In ACM SIGMOD, pages 359-370, 2004.
[2]
Bonnie++, coker.com.au/bonnie++.
[3]
W. W. Chu and I. T. Ieong. A Transaction-Based Approach to Vertical Partitioning for Relational Database Systems. IEEE Trans. Softw. Eng., 19(8):804-812, 1993.
[4]
D. W. Cornell and P. S. Yu. A Vertical Partitioning Algorithm for Relational Databases. In ICDE, pages 30-35, 1987.
[5]
D. W. Cornell and P. S. Yu. An Effective Approach to Vertical Partitioning for Physical Design of Relational Databases. IEEE Trans. Softw. Eng., 16(2):248-258, 1990.
[6]
M. Grund, J. Krüger, H. Plattner, A. Zeier, P. Cudre-Mauroux, and S. Madden. HYRISE: A Main Memory Hybrid Storage Engine. PVLDB, 4(2):105-116, 2010.
[7]
M. Hammer and B. Niamir. A Heuristic Approach to Attribute Partitioning. In ACM SIGMOD, pages 93-101, 1979.
[8]
R. A. Hankins and J. M. Patel. Data Morphing: An Adaptive, Cache-Conscious Storage Technique. In VLDB, pages 417-428, 2003.
[9]
HBase, hbase.apache.org.
[10]
J. A. Hoffer and D. G. Severance. The Use of Cluster Analysis in Physical Data Base Design. In VLDB, pages 69-86, 1975.
[11]
A. Jindal and J. Dittrich. Relax and let the database do the partitioning online. In BIRTE, pages 65-80, 2011.
[12]
A. Jindal, J.-A. Quiané-Ruiz, and J. Dittrich. Trojan Data Layouts: Right Shoes for a Running Elephant. In ACM SOCC, pages 21:1-21:14, 2011.
[13]
A. Jindal, F. M. Schuhknecht, J. Dittrich, K. Khachatryan, and A. Bunte. How Achaeans Would Construct Columns in Troy. In CIDR, 2013.
[14]
W. T. M. Jr., P. J. Schweitzer, and T. W. White. Problem Decomposition and Data Reorganization by a Clustering Technique. Operations Research, 20(5):993-1009, 1972.
[15]
S. Navathe, S. Ceri, G. Wiederhold, and J. Dou. Vertical Partitioning Algorithms for Database Design. ACM TODS, 9(4):680-710, 1984.
[16]
S. B. Navathe and M. Ra. Vertical Partitioning for Database Design: A Graphical Algorithm. In ACM SIGMOD, pages 440-450, 1989.
[17]
S. Papadomanolakis and A. Ailamaki. AutoPart: Automating Schema Design for Large Scientific Databases Using Data Partitioning. In SSDBM, pages 383-392, 2004.
[18]
V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-Time Query Processing. In ICDE, pages 60-69, 2008.
[19]
Star Schema Benchmark, www.cs.umb.edu/~poneil/StarSchemaB.PDF.
[20]
Vertica, vertica.com.

Cited By

View all
  • (2024)Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning TechniquesJournal of Computer Science and Technology10.1007/s11390-024-3538-139:2(346-368)Online publication date: 1-Mar-2024
  • (2021)Jigsaw: A Data Storage and Query Processing Engine for Irregular Table PartitioningProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457547(898-911)Online publication date: 9-Jun-2021
  • (2018)GridFormationProceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management10.1145/3211954.3211956(1-7)Online publication date: 10-Jun-2018
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 6, Issue 6
April 2013
144 pages

Publisher

VLDB Endowment

Publication History

Published: 01 April 2013
Published in PVLDB Volume 6, Issue 6

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 15 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning TechniquesJournal of Computer Science and Technology10.1007/s11390-024-3538-139:2(346-368)Online publication date: 1-Mar-2024
  • (2021)Jigsaw: A Data Storage and Query Processing Engine for Irregular Table PartitioningProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457547(898-911)Online publication date: 9-Jun-2021
  • (2018)GridFormationProceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management10.1145/3211954.3211956(1-7)Online publication date: 10-Jun-2018
  • (2017)Scalable asynchronous gradient descent optimization for out-of-core modelsProceedings of the VLDB Endowment10.14778/3115404.311540510:10(986-997)Online publication date: 1-Jun-2017
  • (2017)Wide Table Layout Optimization based on Column Ordering and DuplicationProceedings of the 2017 ACM International Conference on Management of Data10.1145/3035918.3035930(299-314)Online publication date: 9-May-2017
  • (2016)Skipping-oriented partitioning for columnar layoutsProceedings of the VLDB Endowment10.14778/3025111.302512310:4(421-432)Online publication date: 1-Nov-2016
  • (2016)Operational analytics data management systemsProceedings of the VLDB Endowment10.14778/3007263.30073199:13(1601-1604)Online publication date: 1-Sep-2016
  • (2015)Vertical partitioning for query processing over raw dataProceedings of the 27th International Conference on Scientific and Statistical Database Management10.1145/2791347.2791369(1-12)Online publication date: 29-Jun-2015
  • (2014)Detecting correlated columns in relational databases with mixed data typesProceedings of the 26th International Conference on Scientific and Statistical Database Management10.1145/2618243.2618251(1-12)Online publication date: 30-Jun-2014
  • (2013)The uncracked pieces in database crackingProceedings of the VLDB Endowment10.14778/2732228.27322297:2(97-108)Online publication date: 1-Oct-2013

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media