skip to main content
research-article
Public Access

SNAP: A General-Purpose Network Analysis and Graph-Mining Library

Published: 15 July 2016 Publication History
  • Get Citation Alerts
  • Abstract

    Large networks are becoming a widely used abstraction for studying complex systems in a broad set of disciplines, ranging from social-network analysis to molecular biology and neuroscience. Despite an increasing need to analyze and manipulate large networks, only a limited number of tools are available for this task.
    Here, we describe the Stanford Network Analysis Platform (SNAP), a general-purpose, high-performance system that provides easy-to-use, high-level operations for analysis and manipulation of large networks. We present SNAP functionality, describe its implementational details, and give performance benchmarks. SNAP has been developed for single big-memory machines, and it balances the trade-off between maximum performance, compact in-memory graph representation, and the ability to handle dynamic graphs in which nodes and edges are being added or removed over time. SNAP can process massive networks with hundreds of millions of nodes and billions of edges. SNAP offers over 140 different graph algorithms that can efficiently manipulate large graphs, calculate structural properties, generate regular and random graphs, and handle attributes and metadata on nodes and edges. Besides being able to handle large graphs, an additional strength of SNAP is that networks and their attributes are fully dynamic; they can be modified during the computation at low cost. SNAP is provided as an open-source library in C++ as well as a module in Python.
    We also describe the Stanford Large Network Dataset, a set of social and information real-world networks and datasets, which we make publicly available. The collection is a complementary resource to our SNAP software and is widely used for development and benchmarking of graph analytics algorithms.

    References

    [1]
    S. Arifuzzaman, M. Khan, and M. Marathe. 2013. PATRIC: A parallel algorithm for counting triangles in massive networks. In ACM International Conference on Information and Knowledge Management (CIKM). 529--538.
    [2]
    A.-L. Barabási and R. Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509--512.
    [3]
    V. Batagelj and A. Mrvar. 1998. Pajek-program for large network analysis. Connections 21, 2 (1998), 47--57.
    [4]
    V. Batagelj and M. Zaveršnik. 2002. Generalized cores. ArXiv cs.DS/0202039 (Feb 2002).
    [5]
    A. A. Benczur, K. Csalogany, T. Sarlos, and M. Uher. 2005. Spamrank--fully automatic link spam detection. In International Workshop on Adversarial Information Retrieval on the Web.
    [6]
    B. Bollobás. 1980. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics 1, 4 (1980), 311--316.
    [7]
    D. Chakrabarti, Y. Zhan, and C. Faloutsos. 2004. R-MAT: A recursive model for graph mining. In SIAM International Conference on Data Mining (SDM), Vol. 4. SIAM, 442--446.
    [8]
    G. Csardi and T. Nepusz. 2006. The igraph software package for complex network research. InterJournal, Complex Systems 1695, 5 (2006).
    [9]
    D. Easley and J. Kleinberg. 2010. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press.
    [10]
    A. D. Flaxman, A. M. Frieze, and J. Vera. 2006. A geometric preferential attachment model of networks. Internet Mathematics 3, 2 (2006), 187--205.
    [11]
    M. Gomez-Rodriguez, J. Leskovec, D. Balduzzi, and B. Schölkopf. 2014. Uncovering the structure and temporal dynamics of information propagation. Network Science 2, 01 (2014), 26--65.
    [12]
    M. Gomez-Rodriguez, J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 1019--1028.
    [13]
    M. Gomez-Rodriguez, J. Leskovec, and A. Krause. 2012. Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data 5, 4, Article 21 (Feb. 2012), 37 pages.
    [14]
    M. Gomez-Rodriguez, J. Leskovec, and B. Schölkopf. 2013. Structure and dynamics of information pathways in online media. In ACM International Conference on Web Search and Data Mining (WSDM). ACM, 23--32.
    [15]
    J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), Vol. 12. 2.
    [16]
    D. Gregor and A. Lumsdaine. 2005. The parallel BGL: A generic library for distributed graph computations. Parallel Object-Oriented Scientific Computing (POOSC) 2 (2005), 1--18.
    [17]
    A. Hagberg, P. Swart, and D. S. Chult. 2008. Exploring Network Structure, Dynamics, and Function using NetworkX. Technical Report. Los Alamos National Laboratory (LANL).
    [18]
    D. Hallac, J. Leskovec, and S. Boyd. 2015. Network lasso: Clustering and optimization in large graphs. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 387--396.
    [19]
    M. O. Jackson. 2008. Social and Economic Networks. Vol. 3. Princeton university press Princeton.
    [20]
    U. Kang, C. E. Tsourakakis, and C. Faloutsos. 2009. Pegasus: A peta-scale graph mining system implementation and observations. In IEEE International Conference on Data Mining (ICDM). IEEE, 229--238.
    [21]
    J. Kim, W.-S. Han, S. Lee, K. Park, and H. Yu. 2014. OPT: A new framework for overlapped and parallel triangulation in large-scale graphs. In ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, 637--648.
    [22]
    M. Kim and J. Leskovec. 2011a. Modeling social networks with node attributes using the multiplicative attribute graph model. In Conference on Uncertainty in Artificial Intelligence (UAI).
    [23]
    M. Kim and J. Leskovec. 2011b. The network completion problem: Inferring missing nodes and edges in networks. In SIAM International Conference on Data Mining (SDM). 47--58.
    [24]
    M. Kim and J. Leskovec. 2012a. Latent multi-group membership graph model. In International Conference on Machine Learning (ICML).
    [25]
    M. Kim and J. Leskovec. 2012b. Multiplicative attribute graph model of real-world networks. Internet Mathematics 8, 1--2 (2012), 113--160.
    [26]
    M. Kim and J. Leskovec. 2013. Nonparametric multi-group membership model for dynamic networks. In Advances in Neural Information Processing Systems (NIPS). 1385--1393.
    [27]
    R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. 2000. Stochastic models for the web graph. In Annual Symposium on Foundations of Computer Science. IEEE, 57--65.
    [28]
    H. Kwak, C. Lee, H. Park, and S. Moon. 2010. What is twitter, a social network or a news media?. In WWW’10.
    [29]
    A. Kyrola, G. Blelloch, and C. Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In USENIX Symposium on Operating Systems Design and Implementation (OSDI). 31--46.
    [30]
    J. Leskovec, L. Backstrom, and J. Kleinberg. 2009. Meme-tracking and the dynamics of the news cycle. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 497--506.
    [31]
    J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research 11 (2010), 985--1042.
    [32]
    J. Leskovec and E. Horvitz. 2014. Geospatial structure of a planetary-scale social network. IEEE Transactions on Computational Social Systems 1, 3 (2014), 156--163.
    [33]
    J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over time: Densification laws, shrinking diameters and possible explanations. In ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD). ACM, 177--187.
    [34]
    J. Leskovec and A. Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data. (June 2014).
    [35]
    P. Lofgren, S. Banerjee, and A. Goel. 2016. Personalized PageRank estimation and search: A bidirectional approach. In ACM International Conference on Web Search and Data Mining (WSDM). ACM.
    [36]
    P. A. Lofgren, S. Banerjee, A. Goel, and C Seshadhri. 2014. Fast-ppr: Scaling personalized pagerank estimation for large graphs. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 1436--1445.
    [37]
    G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. 2010. Pregel: A system for large-scale graph processing. In ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, 135--146.
    [38]
    J. McAuley and J. Leskovec. 2012. Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems (NIPS).
    [39]
    J. McAuley and J. Leskovec. 2014. Discovering social circles in ego networks. ACM Transactions on Knowledge Discovery from Data 8, 1, Article 4 (Feb. 2014), 28 pages.
    [40]
    R. Milo, N. Kashtan, S. Itzkovitz, M. E. J. Newman, and U. Alon. 2003. On the uniform generation of random graphs with prescribed degree sequences. arXiv preprint cond-mat/0312028 (2003).
    [41]
    M. Newman. 2003. The structure and function of complex networks. SIAM Rev. 45, 2 (2003), 167--256.
    [42]
    M. Newman. 2010. Networks: An Introduction. OUP Oxford.
    [43]
    J. O’Madadhain, D. Fisher, P. Smyth, S. White, and Y. Boey. 2005. Analysis and visualization of network data using JUNG. Journal of Statistical Software 10, 2 (2005), 1--35.
    [44]
    L. Page, S. Brin, R. Motwani, and T. Winograd. November 1999. The Pagerank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
    [45]
    Y. Perez, R. Sosič, A. Banerjee, R. Puttagunta, M. Raison, P. Shah, and J. Leskovec. 2015. Ringo: Interactive graph analytics on big-memory machines. In ACM SIGMOD International Conference on Management of Data (SIGMOD). 1105--1110.
    [46]
    E. Ravasz and A.-L. Barabási. 2003. Hierarchical organization in complex networks. Physical Review E 67, 2 (2003), 026112.
    [47]
    S. Salihoglu and J. Widom. 2013. GPS: A graph processing system. In International Conference on Scientific and Statistical Database Management. ACM, 22.
    [48]
    C. Suen, S. Huang, C. Eksombatchai, R. Sosič, and J. Leskovec. 2013. NIFTY: A system for large scale information flow tracking and clustering. In International Conference on World Wide Web (WWW). International World Wide Web Conferences Steering Committee, 1237--1248.
    [49]
    D. J. Watts and S. H. Strogatz. 1998. Collective dynamics of small-world networks. Nature 393, 6684 (1998), 440--442.
    [50]
    R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica. 2013. GraphX: A resilient distributed graph system on Spark. In ACM International Workshop on Graph Data Management Experiences and Systems. ACM,  2.
    [51]
    J. Yang and J. Leskovec. 2012. Community-affiliation graph model for overlapping network community detection. In IEEE International Conference on Data Mining (ICDM). IEEE, 1170--1175.
    [52]
    J. Yang and J. Leskovec. 2013. Overlapping community detection at scale: A nonnegative matrix factorization approach. In ACM International Conference on Web Search and Data Mining (WSDM). ACM, 587--596.
    [53]
    J. Yang and J. Leskovec. 2014. Overlapping communities explain core-periphery organization of networks. Proc. IEEE 102, 12 (Dec 2014), 1892--1902.
    [54]
    J. Yang, J. McAuley, and J. Leskovec. 2013. Community detection in networks with node attributes. In IEEE International Conference on Data Mining (ICDM).
    [55]
    J. Yang, J. McAuley, and J. Leskovec. 2014. Detecting cohesive and 2-mode communities in directed and undirected networks. In ACM International Conference on Web Search and Data Mining (WSDM).

    Cited By

    View all
    • (2024)Viral Marketing in Social Networks with Competing ProductsProceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems10.5555/3635637.3663069(2047-2056)Online publication date: 6-May-2024
    • (2024)Bioinformatics Prediction for Network-Based Integrative Multi-Omics Expression Data Analysis in Hirschsprung DiseaseBiomolecules10.3390/biom1402016414:2(164)Online publication date: 30-Jan-2024
    • Show More Cited By

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Intelligent Systems and Technology
    ACM Transactions on Intelligent Systems and Technology  Volume 8, Issue 1
    January 2017
    363 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/2973184
    • Editor:
    • Yu Zheng
    Issue’s Table of Contents
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 July 2016
    Accepted: 01 February 2016
    Received: 01 February 2016
    Published in TIST Volume 8, Issue 1

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Networks
    2. data mining
    3. graph analytics
    4. graphs
    5. open-source software

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)680
    • Downloads (Last 6 weeks)60

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Viral Marketing in Social Networks with Competing ProductsProceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems10.5555/3635637.3663069(2047-2056)Online publication date: 6-May-2024
    • (2024)Bioinformatics Prediction for Network-Based Integrative Multi-Omics Expression Data Analysis in Hirschsprung DiseaseBiomolecules10.3390/biom1402016414:2(164)Online publication date: 30-Jan-2024
    • (2024)Genomic analysis based on chromosome-level genome assembly reveals Myrtaceae evolution and terpene biosynthesis of rose myrtleBMC Genomics10.1186/s12864-024-10509-625:1Online publication date: 10-Jun-2024
    • (2024)Understanding High-Performance Subgraph Pattern Matching: A Systems PerspectiveProceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)10.1145/3661304.3661897(1-12)Online publication date: 14-Jun-2024
    • (2024)G2A2: Graph Generator with Attributes and AnomaliesProceedings of the 21st ACM International Conference on Computing Frontiers10.1145/3649153.3649206(3-11)Online publication date: 7-May-2024
    • (2024)HITS-based Propagation Paradigm for Graph Neural NetworksACM Transactions on Knowledge Discovery from Data10.1145/363877918:4(1-23)Online publication date: 13-Feb-2024
    • (2024)Go-Network: a graph sampling library written in GoCompanion of the 15th ACM/SPEC International Conference on Performance Engineering10.1145/3629527.3652903(151-155)Online publication date: 7-May-2024
    • (2024)Efficient Computation for Diagonal of Forest Matrix via Variance-Reduced Forest SamplingProceedings of the ACM on Web Conference 202410.1145/3589334.3645578(792-802)Online publication date: 13-May-2024
    • (2024)The Impact of Centrality Measures in Protein–Protein Interaction Networks: Tools, Databases, Challenges and Future DirectionsJournal of Computational Biophysics and Chemistry10.1142/S273741652440007623:06(815-836)Online publication date: 3-Jul-2024
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media