DOI: 10.1145/3474717.3484265

STAR: A Cache-based Distributed Warehouse System for Spatial Data Streams

Published: 04 November 2021

Abstract

The proliferation of mobile phones and location-based services has driven explosive growth in spatial data. To enable spatial data analytics, spatial data must be streamed into a data stream warehouse system that provides real-time analytical results over both the most recent and the historical spatial data in the warehouse. Existing data stream warehouse systems are not tailored for spatial data. In this paper, we introduce the STAR (Spatial Data Stream Warehouse) system. STAR is a distributed in-memory data stream warehouse system that provides low-latency, up-to-date analytical results over a fast-arriving spatial data stream. STAR supports queries composed of aggregate functions and ad hoc query constraints over spatial, textual, and temporal data attributes. To process these queries, STAR implements a cache-based mechanism that combines query-based caching (i.e., view materialization) with object-based caching. Extensive experiments over real data sets demonstrate the superior performance of STAR over existing systems.
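
To make the distinction between the two caching styles concrete, the following is a minimal single-process sketch. It is not STAR's implementation, and the abstract does not describe its data structures. The names `SpatialObject`, `CountQuery`, and `ToyWarehouse` are hypothetical, only a COUNT aggregate is shown, and distribution, indexing, and eviction are left out. A registered query is kept as a materialized result that is updated as objects arrive (query-based caching), while any other query is answered by scanning the objects held in memory (object-based caching).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SpatialObject:
    """A streamed record with spatial, temporal, and textual attributes."""
    x: float
    y: float
    timestamp: int
    keywords: frozenset


@dataclass(frozen=True)
class CountQuery:
    """COUNT of objects in a rectangle and time range, optionally with a keyword."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    t_min: int
    t_max: int
    keyword: Optional[str] = None

    def matches(self, obj: SpatialObject) -> bool:
        return (
            self.x_min <= obj.x <= self.x_max
            and self.y_min <= obj.y <= self.y_max
            and self.t_min <= obj.timestamp <= self.t_max
            and (self.keyword is None or self.keyword in obj.keywords)
        )


class ToyWarehouse:
    """Hypothetical toy store combining both caching styles (assumption, not STAR)."""

    def __init__(self) -> None:
        # Query-based cache: materialized COUNT per registered query.
        self.views: Dict[CountQuery, int] = {}
        # Object-based cache: raw objects kept in memory for ad hoc queries.
        self.objects: List[SpatialObject] = []

    def materialize(self, query: CountQuery) -> None:
        """Register a query and compute its result from the cached objects."""
        self.views[query] = sum(1 for o in self.objects if query.matches(o))

    def ingest(self, obj: SpatialObject) -> None:
        """Store a new object and incrementally refresh every affected view."""
        self.objects.append(obj)
        for query in self.views:
            if query.matches(obj):
                self.views[query] += 1

    def count(self, query: CountQuery) -> Tuple[int, str]:
        """Answer from a materialized view if present, else scan the objects."""
        if query in self.views:
            return self.views[query], "view"
        return sum(1 for o in self.objects if query.matches(o)), "objects"
```

In this sketch, a query registered with `materialize` is answered in constant time from `views`, at the cost of extra work on every `ingest`; an unregistered ad hoc query falls back to a scan of `objects`. Which queries are worth materializing is the view selection problem, which this sketch does not address.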

Published In

SIGSPATIAL '21: Proceedings of the 29th International Conference on Advances in Geographic Information Systems
November 2021
700 pages
ISBN: 9781450386647
DOI: 10.1145/3474717
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data stream
  2. distributed system
  3. spatial data
  4. warehouse system

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGSPATIAL '21
Acceptance Rates

Overall Acceptance Rate 220 of 1,116 submissions, 20%
