skip to main content
research-article

Summarizing static and dynamic big graphs

Published: 01 August 2017 Publication History

Abstract

Large-scale, highly-interconnected networks pervade our society and the natural world around us, including the World Wide Web, social networks, knowledge graphs, genome and scientific databases, medical and government records. The massive scale of graph data often surpasses the available computation and storage resources. Besides, users get overwhelmed by the daunting task of understanding and using such graphs due to their sheer volume and complexity. Hence, there is a critical need to summarize large graphs into concise forms that can be more easily visualized, processed, and managed. Graph summarization has indeed attracted a lot of interests from various research communities, such as sociology, physics, chemistry, bioinformatics, and computer science. Different ways of summarizing graphs have been invented that are often complementary to each other. In this tutorial, we discuss algorithmic advances on graph summarization in the context of both classical (e.g., static graphs) and emerging (e.g., dynamic and stream graphs) applications. We emphasize the current challenges and highlight some future research directions.

References

[1]
K. J. Ahn, S. Guha, and A. McGregor, Analyzing Graph Structure via Linear Measurements, SODA, 2012.
[2]
K. J. Ahn, S. Guha, and A. McGregor, Graph Sketches: Sparsification, Spanners, and Subgraphs, PODS, 2012.
[3]
S. Bandyopadhyay, M. Mehta, D. Kuo, M.-K. Sung, R. Chuang, E. J. Jaehnig, B. Bodenmiller, K. Licon, W. Copeland, M. Shales, D. Fiedler, J. Dutkowski, A. Guénolé, H. van Attikum, K. M. Shokat, R. D. Kolodner, W.-K. Huh, R. Aebersold, M.-C. Keogh, N. J. Krogan, and T. Ideker, Rewiring of Genetic Networks in Response to DNA Damage., Science, 330(6009), 1385--9.
[4]
P. Boldi, M. Rosa, M. Santini, and S. Vigna, Layered Label Propagation: a Multiresolution Coordinate-free Ordering for Compressing Social Networks, WWW, 2011.
[5]
P. Boldi and S. Vigna, The Webgraph Framework I: Compression Techniques, WWW, 2004.
[6]
N. R. Brisaboa and S. Ladra and G. Navarro, k2-Trees for Compact Web Graph Representation, SPIRE, 2009.
[7]
G. Buehrer and K. Chellapilla, A Scalable Pattern Mining Approach to Web Graph Compression with Communities, WSDM, 2008.
[8]
C. Chen, C. X. Lin, M. Fredrikson, M. Christodorescu, X. Yan, and J. Han, Mining Graph Patterns Efficiently via Randomized Summaries, VLDB, 2009.
[9]
C. Chen, C. X. Lin, M. Fredrikson, M. Christodorescu, X. Yan, and J. Han, Mining Large Information Networks by Graph Summarization. Link Mining: Models, Algorithms, and Applications. (pp. 475--501), Springer New York, 2010.
[10]
C. Chen, X. Yan, F. Zhu, J. Han, and P. S. Yu, Graph OLAP: Towards Online Analytical Processing on Graphs, ICDM, 2008.
[11]
C. Chen, F. Zhu, X. Yan, J. Han, P. S. Yu, and R. Ramakrishnan, InfoNetOLAP: OLAP and Mining of Information Networks. Link Mining: Models, Algorithms, and Applications. (pp. 411--438), Springer New York, 2010.
[12]
F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan, On Compressing Social Networks, KDD, 2009.
[13]
Y. Choi and W. Szpankowski, Compression of Graphical Structures: Fundamental Limits, Algorithms, and Experiments, Information Theory, IEEE Transactions on, 58(2):620--638, 2012.
[14]
D. J. Cook and L. B. Holder, Substructure Discovery Using Minimum Description Length and Background Knowledge, J. Artif. Int. Res., 1(1), 231--255, 1994.
[15]
G. Cormode and S. Muthukrishnan, Space Efficient Mining of Multigraph Streams, PODS, 2005.
[16]
W. Fan, J. Li, X. Wang, and Y. Wu, Query Preserving Graph Compression, SIGMOD, 2012.
[17]
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang, Graph Distances in the Data Stream Model, SIAM J. Computing, 38(5), 2005.
[18]
W. Han, Y. Miao, K. Li, M. Wu, F. Yang, L. Zhou, V. Prabhakaran, W. Chen, and E. Chen, Chronos: A Graph Engine for Temporal Graph Analysis, EuroSys, 2014.
[19]
U. Kang and C. Faloutsos, Beyond 'Caveman Communities': Hubs and Spokes for Graph Compression and Mining, ICDM, 2011.
[20]
U. Kang, H. Tong, J. Sun, C. Y. Lin, and C. Faloutsos, GBASE: A Scalable and General Graph Management System, KDD, 2011.
[21]
A. Khan and C. Aggarwal, Query-Friendly Compression of Graph Streams, ASONAM, 2016.
[22]
A. Khan and C. Aggarwal, Toward Query-Friendly Compression of Rapid Graph Streams, Social Network Analysis and Mining, Springer, 2017.
[23]
K. U. Khan, W. Nawaz, and Y.-K. Lee, Set-based Unified Approach for Summarization of a Multi-Attributed Graph, WWW, 2016.
[24]
D. Koutra, Summarizing Large-Scale Graph Data, SDM, 2017.
[25]
D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos, Summarizing and Understanding Large Graphs, Stat. Anal. Data Min., 8(3), 183--202, 2015.
[26]
Y. Liu, A. Dighe, T. Safavi, and D. Koutra, A Graph Summarization: A Survey, CoRR, abs/1612.04883, 2016.
[27]
K. LeFevre and E. Terzi, GraSS: Graph Structure Summarization, SDM, 2010.
[28]
C.-T. Li and S.-D. Lin, Egocentric Information Abstraction for Heterogeneous Social Networks, ASONAM, 2009.
[29]
S.-D. Lin, M.-Y. Yeh, and C.-T. Li, Sampling and Summarization for Social Networks, SDM, 2013.
[30]
X. Liu, Y. Tian, Q. He, W.-C. Lee, and J. McPherson, Distributed Graph Summarization, CIKM, 2014.
[31]
A. McGregor, Graph Stream Algorithms: A Survey, SIGMOD Rec., 43(1), 2014.
[32]
H. Maserrat and J. Pei, Neighbor Query Friendly Compression of Social Networks, KDD, 2010.
[33]
H. Maserrat and J. Pei, Community Preserving Lossy Compression of Social Networks, ICDM, 2012.
[34]
S. Navlakha, R. Rastogi, and N. Shrivastava, Graph Summarization with Bounded Error, SIGMOD, 2008.
[35]
M. Purohit, B. A. Prakash, C. Kang, Y. Zhang, and V.S. Subrahmanian, Fast Influence-Based Coarsening For Large Networks, KDD, 2014.
[36]
Q. Qu, S. Liu, C. S. Jensen, F. Zhu, and C. Faloutsos, Interestingness-Driven Diffusion Process Summarization in Dynamic Networks, ECML PKDD, 2014.
[37]
S. Raghavan and H. Garcia-Molina, Representing Web Graphs, ICDE, 2003.
[38]
M. Riondato and D. García-Soriano and F. Bonchi, Graph Summarization with Quality Guarantees, Data Min. Knowl. Discov., 31(2), 314--349, 2017.
[39]
O. Rottenstreich, Y. Kanizo, and I. Keslassy, The Variable-Increment Counting Bloom Filter, IEEE/ACM Trans. Netw., 22(4):10921105, 2014.
[40]
B.-S. Seah, S. S. Bhowmick, C. F. Dewey Jr, and H. Yu, FUSE: A Profit Maximization Approach for Functional Summarization of Biological Networks, BMC Bioinformatics, 13(Suppl 3):S10, 2012.
[41]
B.-S. Seah, S. S. Bhowmick, and C. F. Dewey Jr, DiffNet: Automatic Differential Functional Summarization of dE-MAP Networks., Methods, 69(3), 2014.
[42]
N. Shah, D. Koutra, T. Zou, B. Gallagher, and C. Faloutsos, TimeCrunch: Interpretable Dynamic Graph Summarization, KDD, 2015.
[43]
Z. Shen, K. L. Ma, and T. Eliassi-Rad, Visual Analysis of Large Heterogeneous Social Networks by Semantic and Structural Abstraction, IEEE Transactions on Visualization and Computer Graphics, 12(6), 14271439, 2006.
[44]
L. Shi, S. Sun, Y. Xuan, Y. Su, H. Tong, S. Ma, and Y. Chen, TOPIC: TOward Perfect InfluenCe Graph Summarization, ICDE, 2016.
[45]
N. Tang, Q. Chen, and P. Mitra, Graph Stream Summarization: From Big Bang to Big Crunch, SIGMOD, 2016.
[46]
I. Tsalouchidou, G. D. F. Morales, F. Bonchi, and R. A. Baeza-Yates, Scalable Dynamic Graph Summarization, IEEE Big Data, 2016.
[47]
Y. Tian, R. A. Hankins, and J. M. Patel, Efficient Aggregation for Graph Summarization, SIGMOD, 2008.
[48]
Y. Tian and J. M. Patel, Interactive Graph Summarization. Link Mining: Models, Algorithms, and Applications. (pp. 389--410), Springer New York, 2010.
[49]
H. Toivonen, F. Zhou, A. Hartikainen, and A. Hinkka, Compression of Weighted Graphs, KDD, 2011.
[50]
D. Wang, P. Cui, and W. Zhu, Structural Deep Network Embedding, KDD, 2016.
[51]
Y. Wu, S. Yang, M. Srivatsa, A. Iyengar, and X. Yan, Summarizing Answer Graphs Induced by Keyword Queries, VLDB, 2013.
[52]
N. Zhang, Y. Tian, and J. M. Patel, Discovery-driven Graph Summarization, ICDE, 2010.
[53]
J. Zhang, S. S. Bhowmick, H. H. Nguyen, B. Choi, F. Zhu. DAVINCI: Data-driven Visual Interface Construction for Subgraph Search in Graph Databases, ICDE, 2015.
[54]
P. Zhao, C. Aggarwal, and M. Wang, gSketch: On Query Estimation in Graph Streams, VLDB, 2012.
[55]
P. Zhao, X. Li, D. Xin, and J. Han, Graph Cube: On Warehousing and OLAP Multidimensional Networks, SIGMOD, 2011.
[56]
F. Zhou, S. Malher, and H. Toivonen, Network Simplification with Minimal Loss of Connectivity, ICDM, 2010.

Cited By

View all
  • (2022)Multi-relation Graph SummarizationACM Transactions on Knowledge Discovery from Data10.1145/349456116:5(1-30)Online publication date: 9-Mar-2022
  • (2021)ExpanDrogram: Dynamic Visualization of Big Data Segmentation over TimeJournal of Data and Information Quality10.1145/343477813:2(1-27)Online publication date: 2-Jun-2021
  • (2021)GS4: Graph stream summarization based on both the structure and semanticsThe Journal of Supercomputing10.1007/s11227-020-03290-277:3(2713-2733)Online publication date: 1-Mar-2021
  • Show More Cited By

Index Terms

  1. Summarizing static and dynamic big graphs
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 10, Issue 12
    August 2017
    427 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 August 2017
    Published in PVLDB Volume 10, Issue 12

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)26
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 15 Sep 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)Multi-relation Graph SummarizationACM Transactions on Knowledge Discovery from Data10.1145/349456116:5(1-30)Online publication date: 9-Mar-2022
    • (2021)ExpanDrogram: Dynamic Visualization of Big Data Segmentation over TimeJournal of Data and Information Quality10.1145/343477813:2(1-27)Online publication date: 2-Jun-2021
    • (2021)GS4: Graph stream summarization based on both the structure and semanticsThe Journal of Supercomputing10.1007/s11227-020-03290-277:3(2713-2733)Online publication date: 1-Mar-2021
    • (2019)Summarizing semantic graphsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-018-0528-328:3(295-327)Online publication date: 1-Jun-2019
    • (2018)CIKM 2018 Co-Located Workshops SummaryProceedings of the 27th ACM International Conference on Information and Knowledge Management10.1145/3269206.3274267(2309-2311)Online publication date: 17-Oct-2018
    • (2018)Log(graph)Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243198(1-13)Online publication date: 1-Nov-2018
    • (2018)Efficiently summarizing attributed diffusion networksData Mining and Knowledge Discovery10.1007/s10618-018-0572-z32:5(1251-1274)Online publication date: 1-Sep-2018

    View Options

    Get Access

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media