skip to main content
research-article

Nucleus Decompositions for Identifying Hierarchy of Dense Subgraphs

Published: 03 July 2017 Publication History

Abstract

Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, densest at-least-k subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum” but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense subgraph finding algorithms usually optimize some objective and only find a few such subgraphs without providing any structural relations.
We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which enables discovering overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-core and k-truss decompositions.
We present practical algorithms for nucleus decompositions and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures and outputs dense subgraphs of comparable quality with the state-of-the-art solutions that are dense and have non-trivial sizes. Our algorithms can process real-world graphs with tens of millions of edges in less than an hour. We demonstrate how proposed algorithms can be utilized on a citation network. Our analysis showed that dense units identified by our algorithms correspond to coherent articles on a specific area. Our experiments also show that we can identify dense structures that are lost within larger structures by other methods and find further finer grain structure within dense groups.

References

[1]
A. B. Adcock, B. D. Sullivan, and M. W. Mahoney. 2013. Tree-like structure in large social and information networks. In Proceedings of the IEEE International Conference on Data Mining (ICDM). 1--10.
[2]
J. Ignacio Alvarez-Hamelin, Alain Barrat, and Alessandro Vespignani. 2006. Large scale networks fingerprinting and visualization using the k-core decomposition. In Advances in Neural Information Processing Systems 18. 41--50.
[3]
R. Andersen and K. Chellapilla. 2009. Finding dense subgraphs with size bounds. In Proceedings of the Workshop on Algorithms and Models for the Web-Graph (WAW). 25--37.
[4]
A. Angel, N. Sarkas, N. Koudas, and D. Srivastava. 2012. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. Proc. VLDB Endow. 5, 6 (Feb. 2012), 574--585.
[5]
Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama. 2000. Greedily finding a dense subgraph. J. Algor. 34, 2 (Feb. 2000), 203--221.
[6]
Oana Denisa Balalau, Francesco Bonchi, T.-H. Hubert Chan, Francesco Gullo, and Mauro Sozio. 2015. Finding subgraphs with maximum total density and limited overlap. In Proceedings of the 8th ACM International Conference on Web Search and Data Mining (WSDM’15). 379--388.
[7]
D. J. Beal, R. Cohen, M. J. Burke, and C. L. McLendon. 2003. Cohesion and performance in groups: A meta-analytic clarification of construct relation. J. Appl. Psychol. 88 (2003), 989--1004.
[8]
J. W. Berry, L. K. Fostvedt, D. J. Nordman, C. A. Phillips, C. Seshadhri, and A. G. Wilson. 2014. Why do simple algorithms for triangle enumeration work in the real world? In Proceedings of the 5th Conference on Innovations in Theoretical Computer Science (ITCS’14). ACM, New York, NY, 225--234.
[9]
Sayan Bhattacharya, Monika Henzinger, Danupon Nanongkai, and Charalampos Tsourakakis. 2015. Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In Proceedings of the 47th Annual ACM on Symposium on Theory of Computing (STOC’15). 173--182.
[10]
C. Bron and J. Kerbosch. 1973. Algorithm 457: Finding all cliques of an undirected graph. Commun. ACM 16, 9 (Sep. 1973), 575--577.
[11]
G. Buehrer and K. Chellapilla. 2008. A scalable pattern mining approach to web graph compression with communities. In Proc. of the 2008 International Conference on Web Search and Data Mining (WSDM’08). 95--106.
[12]
M. Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In Proceedings of the 3rd International Workshop on Approximation Algorithms for Combinatorial Optimization (APPROX’00). 84--95.
[13]
N. Chiba and T. Nishizeki. 1985. Arboricity and subgraph listing algorithms. SIAM J. Comput. 14, 1 (Feb. 1985), 210--223.
[14]
J. Cohen. 2008. Trusses: Cohesive subgraphs for social network analysis. National Security Agency Technical Report (2008).
[15]
J. Cohen. 2009. Graph twiddling in a mapreduce world. Comput. Sci. Eng. 11 (2009), 29--41.
[16]
UF Sparse Matrix Collection. University of Florida Sparse Matrix Collection. Retrieved March 2014 from http://www.cise.ufl.edu/research/sparse/matrices/.
[17]
P. Colomer de Simon, M. Serrano, M. G. Beiro, J. I. Alvarez-Hamelin, and M. Boguna. 2013. Deciphering the global organization of clustering in real complex networks. Sci. Rep. 3, 2517 (2013).
[18]
Y. Dourisboure, F. Geraci, and M. Pellegrini. 2007. Extraction and classification of dense communities in the web. In Proceedings of the 16th International Conference on World Wide Web (WWW’07). 461--470.
[19]
Xiaoxi Du, Ruoming Jin, Liang Ding, Victor E. Lee, and John H. Thornton, Jr. 2009. Migration motif: A spatial - temporal pattern mining approach for financial markets. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, New York, NY, 1135--1144.
[20]
A. Epasto, S. Lattanzi, and M. Sozio. 2015. Efficient densest subgraph computation in evolving graphs. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). 300--310.
[21]
P. Erdős and A. Hajnal. 1966. On chromatic number of graphs and set-systems. Acta Math. Hung. 17 (1966), 61--99.
[22]
U. Feige. 2002. Relations between average case complexity and approximation complexity. In Proceedings of the Symposium on Theory of Computing. 534--543.
[23]
D. R. Forsyth. 2010. Group Dynamics. Cengage Learning.
[24]
A. P. Francisco and A. L. Oliveira. 2011. Fully generalized graph cores. In Complex Networks. Vol. 116. 22--34.
[25]
E. Fratkin, B. T. Naughton, D. L. Brutlag, and S. Batzoglou. 2006. MotifCut: Regulatory motifs finding with maximum density subgraphs. In ISMB (Supplement of Bioinformatics) (2006-08-28). 156--157.
[26]
G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. 1989. A fast parametric maximum flow algorithm and applications. SIAM J. Comput. 18, 1 (Feb. 1989), 30--55.
[27]
D. Gibson, R. Kumar, and A. Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In Proc. of the 31st International Conference on Very Large Data Bases (VLDB’05). 721--732.
[28]
A. Gionis, F. Junqueira, V. Leroy, M. Serafini, and I. Weber. 2013. Piggybacking on social networks. Proc. VLDB Endow. 6, 6 (2013), 409--420.
[29]
A. V. Goldberg. 1984. Finding a Maximum Density Subgraph. Technical Report. Berkeley, CA, USA.
[30]
R. Gupta, T. Roughgarden, and C. Seshadhri. 2014. Decompositions of triangle-dense graphs. In Innovations in Theoretical Computer Science (ITCS). 471--482.
[31]
J. Håstad. 1996. Clique is hard to approximate within n(1 − ε). In Acta Mathematica. 627--636.
[32]
H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou. 2005. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics 21, 1 (Jan. 2005), 213--221.
[33]
Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. 2014. Querying k-truss community in large and dynamic graphs. In Proceedings of the ACM SIGMOD International Conf. on Management of Data. 1311--1322.
[34]
L. D. Iasemidis, D.-S. Shiau, W. Chaovalitwongse, J. C. Sackellares, P. M. Pardalos, J. C. Principe, P. R. Carney, A. Prasad, B. Veeramani, and K. Tsakalis. 2003. Adaptive epileptic seizure prediction system. IEEE. Biomed. Eng. 50 (2003), 616--627.
[35]
R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 2009. 3-HOP: A high-compression indexing scheme for reachability query. In Proceedings of the SIGMOD Conference. 813--826.
[36]
S. Khot. 2006. Ruling out PTAS for graph min-bisection, dense k-subgraph, and bipartite clique. SIAM J. Comput. 36, 4 (2006), 1025--1071.
[37]
Samir Khuller and Barna Saha. 2009. On finding dense subgraphs. In Proceedings of the International Colloquium on Automata, Languages and Programming (ICALP). 597--608.
[38]
R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. 1999. Trawling the web for emerging cyber-communities. In Proc. of the Eighth International Conference on World Wide Web (WWW’99). 1481--1493.
[39]
V. E. Lee, N. Ruan, R. Jin, and C. Aggarwal. 2010. A survey of algorithms for dense subgraph discovery. In Managing and Mining Graph Data. Vol. 40.
[40]
Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2008. Statistical properties of community structure in large social and information networks. In Proceedings of the 17th International Conference on World Wide Web (WWW’08). ACM, New York, NY, 695--704.
[41]
D. Lick and A. White. 1970. k-degenerate graphs. Can. J. Math. 22 (1970), 1082--1096.
[42]
D. Matula and L. Beck. 1983. Smallest-last ordering and clustering and graph coloring algorithms. J. ACM 30, 3 (1983), 417--427.
[43]
R. J. Mokken. 1979. Cliques, clubs and clans. Qual. Quant. 13, 2 (1979), 161--173.
[44]
R. A. Rossi, D. F. Gleich, A. H. Gebremedhin, and Md. M. A. Patwary. 2013. A fast parallel maximum clique algorithm for large sparse graphs and temporal strong components. CoRR abs/1302.6256 (2013).
[45]
K. Saito and T. Yamada. 2006. Extracting communities from complex networks by the k-dense method. In Sixth IEEE International Conference on Data Mining Workshops, 2006 (ICDM Workshops 2006). 300--304.
[46]
A. Sala, L. Cao, C. Wilson, R. Zablit, Haitao Zheng, and Ben Y. Zhao. 2010. Measurement-calibrated graph models for social network experiments. In WWW’10. ACM, 861--870.
[47]
A. E. Sariyüce, B. Gedik, G. Jacques-Silva, K. L. Wu, and Ü. V. Çatalyürek. 2013. Streaming algorithms for k-core decomposition. In Proc. VLDB Endow. 433--444.
[48]
A. E. Sariyüce, C. Seshadhri, A. Pinar, and Ü. V. Çatalyürek. 2015. Finding the hierarchy of dense subgraphs using nucleus decompositions. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). 927--937.
[49]
T. Schank and D. Wagner. 2005. Finding, counting and listing all triangles in large graphs, an experimental study. In Experimental and Efficient Algorithms. 606--609.
[50]
S. B. Seidman. 1983. Network structure and minimum degree. Soc. Netw. 5, 3 (1983), 269--287.
[51]
S. B. Seidman and B. Foster. 1978. A graph-theoretic generalization of the clique concept. J. Math. Sociol. (1978).
[52]
C. Seshadhri, A. Pinar, and T. G. Kolda. 2014. Triadic measures on graphs: The power of wedge sampling. Stat. Anal. Data Min. 7, 4 (2014), 294--307.
[53]
SNAP. retrieved March, 2014. Stanford Network Analysis Package. Retrieved March 2014 http://snap.stanford.edu/snap.
[54]
S. Suri and S. Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In WWW’11. 607--614.
[55]
N. Tatti and A. Gionis. 2015. Density-friendly graph decomposition. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1089--1099.
[56]
C. Tsourakakis. 2015. The k-clique densest subgraph problem. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1122--1132.
[57]
C. Tsourakakis, F. Bonchi, A. Gionis, F. Gullo, and M. Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proc. of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13).
[58]
J. Wang and J. Cheng. 2012. Truss decomposition in massive networks. Proc. VLDB Endow. 5, 9 (2012), 812--823.
[59]
N. Wang, J. Zhang, K. L. Tan, and A. K. H. Tung. 2010. On triangulation-based dense neighborhood graph discovery. Proc. VLDB Endow. 4 (2010), 58--68.
[60]
S. Wasserman and K. Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge University Press.
[61]
D. Watts and S. Strogatz. 1998. Collective dynamics of ‘small-world’ networks. Nature 393 (1998), 440--442.
[62]
B. Zhang and S. Horvath. 2005. A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Molec. Biol. 4, 1 (2005), Article 17+.
[63]
Y. Zhang and S. Parthasarathy. 2012. Extracting analyzing and visualizing triangle k-core motifs within networks. In Proc. of the 2012 IEEE 28th International Conference on Data Engineering (ICDE’12). 1049--1060.
[64]
F. Zhao and A. K. H. Tung. 2013. Large scale cohesive subgraphs discovery for social network visual analysis. In Proc. VLDB Endow. 85--96.

Cited By

View all
  • (2024)A Survey on the Densest Subgraph Problem and its VariantsACM Computing Surveys10.1145/365329856:8(1-40)Online publication date: 22-Mar-2024
  • (2024)Parallel Algorithms for Hierarchical Nucleus DecompositionProceedings of the ACM on Management of Data10.1145/36392872:1(1-27)Online publication date: 26-Mar-2024
  • (2024)I/O Efficient Max-Truss Computation in Large Static and Dynamic Graphs2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00249(3217-3229)Online publication date: 13-May-2024
  • Show More Cited By

Index Terms

  1. Nucleus Decompositions for Identifying Hierarchy of Dense Subgraphs

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on the Web
    ACM Transactions on the Web  Volume 11, Issue 3
    August 2017
    209 pages
    ISSN:1559-1131
    EISSN:1559-114X
    DOI:10.1145/3113174
    Issue’s Table of Contents
    © 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 July 2017
    Accepted: 01 January 2017
    Revised: 01 December 2016
    Received: 01 November 2015
    Published in TWEB Volume 11, Issue 3

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. k-core
    2. k-truss
    3. dense subgraph discovery
    4. density hierarchy
    5. graph decomposition
    6. overlapping dense subgraphs

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • wholly owned subsidiary of Lockheed Martin Corporation
    • Laboratory Directed Research and Development (LDRD) Program of Sandia National Laboratories
    • U.S. Department of Energy's National Nuclear Security Administration
    • DARPA GRAPHS program
    • Sandia Corporation
    • DOE Applied Mathematics Research Program

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)20
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 15 Sep 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)A Survey on the Densest Subgraph Problem and its VariantsACM Computing Surveys10.1145/365329856:8(1-40)Online publication date: 22-Mar-2024
    • (2024)Parallel Algorithms for Hierarchical Nucleus DecompositionProceedings of the ACM on Management of Data10.1145/36392872:1(1-27)Online publication date: 26-Mar-2024
    • (2024)I/O Efficient Max-Truss Computation in Large Static and Dynamic Graphs2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00249(3217-3229)Online publication date: 13-May-2024
    • (2023)Parallel Colorful h-Star Core Maintenance in Dynamic GraphsProceedings of the VLDB Endowment10.14778/3603581.360359316:10(2538-2550)Online publication date: 8-Aug-2023
    • (2023)Theoretically and Practically Efficient Parallel Nucleus Decomposition (Abstract)Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing10.1145/3597635.3598024(7-8)Online publication date: 18-Jul-2023
    • (2023)Balanced Clique Computation in Signed Networks: Concepts and AlgorithmsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.322556235:11(11079-11092)Online publication date: 1-Nov-2023
    • (2023)Finding the Maximum $k$-Balanced Biclique on Weighted Bipartite GraphsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.320635135:8(7994-8007)Online publication date: 1-Aug-2023
    • (2022)Theoretically and practically efficient parallel nucleus decompositionProceedings of the VLDB Endowment10.14778/3494124.349414015:3(583-596)Online publication date: 4-Feb-2022
    • (2022)Colorful h-star Core Decomposition2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00239(2588-2601)Online publication date: May-2022
    • (2022)Efficient $k-\text{clique}$ Listing with Set Intersection Speedup2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00192(1955-1968)Online publication date: May-2022
    • Show More Cited By

    View Options

    Get Access

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media