skip to main content
research-article

Efficient computation of deletion-robust k-coverage queries

Published: 01 March 2021 Publication History

Abstract

Extracting a controllable subset from a large-scale dataset so that users can fully understand the entire dataset is a significant topic for multicriteria decision making. In recent years, this problem has been widely studied, and various query models have been proposed, such as top-k, skyline, k-regret and k-coverage queries. Among these models, the k-coverage query is an ideal query method; this model has stability, scale invariance and high traversal efficiency. However, current methods including k-coverage queries cannot deal with deleting some points from the dataset while providing an effective solution set efficiently. In this paper, we study the robustness of k-coverage queries in two cases involving the dynamic deletion of data points. The first case is when it is assumed that the whole dataset can be obtained in advance, while the second is when the data points arrive in a stream. For a centralized dataset, we introduce a sieving mechanism and use a precalculated threshold to filter a coreset from the entire dataset. Then, the k-coverage query can be carried out on this small coreset instead of the entire dataset, and we propose a threshold-based k-coverage query algorithm, which greatly accelerates query processing. For a streaming dataset, a special chain structure is adopted. Furthermore, a single-pass streaming algorithm named Robust-Sieving is proposed. Moreover, the coreset-based method is extended to answer the problem. In addition, sampling techniques are adopted to accelerate query processing under these two circumstances. Extensive experiments verify the effectiveness of our proposed Robust-Sieving algorithm and the coreset-based algorithms with or without sampling.

References

[1]
Allenby R B and Slomson A How to count: an introduction to combinatorics 2010 2 Boca Raton Chapman & Hall/CRC
[2]
Badanidiyuru A, Mirzasoleiman B, Karbasi A, Krause A (2014) Streaming submodular maximization: Massive data summarization on the fly. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD), pp 671–680
[3]
Bai M, Xin J, Wang G, Zhang L, Zimmermann R, Yuan Y, and Wu X Discovering the k representative skyline over a sliding window IEEE Trans Knowl Data Eng (TKDE) 2016 28 8 2041-2056
[4]
Börzsöny S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 421–430
[5]
Boykov YY, Jolly M (2001) Interactive graph cuts for optimal boundary & region segmentation of objects in n-d images. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 105–112
[6]
Chan C-Y, Jagadish HV, Tan K-L, Tung AKH, Zhang Z (2006) Finding k-dominant skylines in high dimensional space. In: Proceedings of the international conference on management of data (SIGMOD), pp 503–514
[7]
Chan C-Y, Jagadish HV, Tan K-L, Tung AKH, Zhang Z (2006) On high dimensional skylines. In: Proceedings of the international conference on extending database technology (EDBT), pp 478–495
[8]
Chester S, Thomo A, Venkatesh S, Whitesides S (2014) Computing k-regret minimizing sets. In: Proceedings of the international conference on very large data bases (VLDB), pp 389–400
[9]
Faulkner TK, Brackenbury W, Lall A (2015) k-regret queries with nonlinear utilities. In: Proceedings of the international conference on very large data bases (VLDB), pp 2098–2109
[10]
Feldman M, Karbasi A, Kazemi E (2018) Do less, get more: streaming submodular maximization with subsampling. In: Proceedings of advances in neural information processing systems (NIPS), pp 732–742
[11]
Gomes R, Krause A (2010) Budgeted nonparametric learning from data streams. In: Proceedings of the international conference on international conference on machine learning (ICML), pp 391–398
[12]
Huang X, Zheng J (2019) Deletion-robust k-coverage queries. In: Proceedings of international conference on database systems for advanced applications (DASFAA), pp 215–219
[13]
Huang X, Zheng J (2019) Streaming deletion-robust k-coverage queries. In: Proceedings of the international symposium on spatial and temporal databases (SSTD), pp 170–173
[14]
Ilyas IF, Beskales G, and Soliman MA A survey of top-k query processing techniques in relational database systems ACM Comput Surv (CSUR) 2008 40 4 1-58
[15]
Kazemi E, Zadimoghaddam M, Karbasi A (2018) Scalable deletion-robust submodular maximization: data summarization with privacy and fairness constraints. In: Proceedings of the international conference on machine learning (ICML), pp 2549–2558
[16]
Krause A, Singh A, and Guestrin C Near-optimal sensor placements in gaussian processes: theory, efficient algorithms and empirical studies J Mach Learn Res (JMLR) 2008 9 235-284
[17]
Lee J, You G won, and Hwang S won Personalized top-k skyline queries in high-dimensional space Inf Syst 2009 34 1 45-61
[18]
Lian X, Chen L (2009) Top-k dominating queries in uncertain databases. In: Proceedings of the international conference on extending database technology (EDBT), pp 660–671
[19]
Lin H, Bilmes J (2011) A class of submodular functions for document summarization. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies (HLT), pp 510–520
[20]
Lin X, Yuan Y, Wei W, Lu H (2005) Stabbing the sky: efficient skyline computation over sliding windows. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 502–513
[21]
Lin X, Yuan Y, Zhang Q, Zhang Y (2007) Selecting stars: the k most representative skyline operator. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 86–95
[22]
Lu H, Jensen CS, and Zhang Z Flexible and efficient resolution of skyline query size constraints IEEE Trans Knowl Data Eng (TKDE) 2011 23 7 991-1005
[23]
Magnani M, Assent I, and Mortensen ML Taking the big picture: representative skylines based on significance and diversity VLDB J 2014 23 5 795-815
[24]
Mindolin D, Chomicki J (2009) Discovering relative importance of skyline attributes. In: Proceedings of the international conference on very large data bases (VLDB), pp 610–621
[25]
Mirzasoleiman B, Badanidiyuru A, Karbasi A, Vondrak J, Krause A (2015) Lazier than lazy greedy. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), pp 1812–1818
[26]
Mirzasoleiman B, Karbasi A, Krause A (2017) Deletion-robust submodular maximization: Data summarization with “the right to be forgotten”. In: Proceedings of the international conference on machine learning (ICML), pp 2449–2458
[27]
Nanongkai D, Lall A, Sarma AD, Makino K (2012) Interactive regret minimization. In: Proceedings of the international conference on management of data (SIGMOD), pp 109–120
[28]
Nanongkai D, Sarma AD, Lall A, Lipton RJ, Xu J (2010) Regret-minimizing representative databases. In: Proceedings of the international conference on very large data bases (VLDB), pp 1114–1124
[29]
Nemhauser GL, Wolsey LA, and Fisher ML An analysis of approximations for maximizing submodular set functions–i Math Program 1978 14 1 265-294
[30]
Orlin JB, Schulz AS, Udwani R (2016) Robust monotone submodular function maximization. In: Louveaux Q, Skutella M (eds) Integer programming and combinatorial optimization, pp 312–324
[31]
Papadias D, Tao Y, Fu G, Seeger B (2003) An optimal and progressive algorithm for skyline queries. In: Proceedings of the international conference on management of data (SIGMOD), pp 467–478
[32]
Papadias D, Tao Y, Fu G, and Seeger B Progressive skyline computation in database systems TODS 2005 30 1 41-82
[33]
Peng P, Wong RC-W (2014) Geometry approach for k-regret query. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 772–783
[34]
Qi J, Zuo F, Samet H, and Yao JC K-regret queries using multiplicative utility functions TODS 2018 43 2 10:1-10:41
[35]
Søholm M, Chester S, Assent I (2016) Maximum coverage representative skyline. In: Proceedings of the international conference on extending database technology (EDBT), pp 702–703
[36]
Soliman MA, Ilyas IF, Chang KC (2007) Top-k query processing in uncertain databases. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 896–905
[37]
Tao Y, Ding L, Lin X, Pei J (2009) Distance-based representative skyline. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 892–903
[38]
Wang S, Cheema MA, Zhang Y, Lin X (2015) Selecting representative objects considering coverage and diversity. In: Proceedings of the international ACM workshop on managing and mining enriched geo-spatial data (GeoRich), pp 31–38
[39]
Xie M, Wong RC-W, Lall A (2019) Strongly truthful interactive regret minimization. In: Proceedings of the international conference on management of data (SIGMOD), pp 281–298
[40]
Xie M, Wong RC-W, and Lall A An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query VLDB J 2020 29 147-175
[41]
Xie M, Wong RC-W, Li J, Long C, Lall A (2018) Efficient k-regret query algorithm with restriction-free bound for any dimensionality. In: Proceedings of the international conference on management of data (SIGMOD), pp 959–974
[42]
Yufei T and Dimitris P Maintaining sliding window skylines on data streams IEEE Trans Knowl Data Eng (TKDE) 2006 18 3 377-391
[43]
Zeighami S, Wong RC-W (2016) Minimizing average regret ratio in database. In: Proceedings of the international conference on management of data (SIGMOD), pp 2265–2266

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Knowledge and Information Systems
Knowledge and Information Systems  Volume 63, Issue 3
Mar 2021
221 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 March 2021
Accepted: 30 November 2020
Revision received: 22 November 2020
Received: 29 April 2020

Author Tags

  1. k-coverage queries
  2. Representative skyline
  3. Sieving procedure
  4. Robust-Coreset
  5. Chain structure

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 0
    Total Downloads
  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 15 Sep 2024

Other Metrics

Citations

View Options

View options

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media