DOI: 10.1145/1458469.1458477
research-article

Peer-to-peer similarity search over widely distributed document collections

Published: 30 October 2008

Abstract

This paper addresses the challenging problem of similarity search over widely distributed ultra-high dimensional data. One such application is the retrieval of the top-k most similar documents in a widely distributed document collection, as in the case of digital libraries. Peer-to-peer (P2P) systems emerge as a promising solution for content management over highly distributed data collections. We propose a self-organizing P2P approach in which an unstructured P2P network evolves into a super-peer architecture, with each super-peer responsible for peers with similar content. Our approach is based on distributed clustering of peer contents, which creates high-quality clusters that span the entire network. More importantly, we show how to process similarity queries efficiently by capitalizing on the newly constructed, clustered super-peer network. During query processing, the query is propagated only to a few carefully selected super-peers that are able to return results of high quality. We evaluate the performance of our approach and demonstrate its advantages through simulation experiments on two document collections.
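
The routing idea in the abstract (forward a top-k query only to a few super-peers whose clustered content is close to the query, then merge their answers) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the sparse term-weight vectors, the use of cosine similarity, the `SuperPeer` class, the `route_and_search` function, and the `fanout` parameter are all hypothetical, and the paper's actual clustering and selection strategy may differ.

```python
import heapq
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse document/query vector: term -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SuperPeer:
    """Hypothetical super-peer: one cluster centroid plus its peers' documents."""

    def __init__(self, name: str, centroid: Vector, docs: Dict[str, Vector]):
        self.name = name
        self.centroid = centroid
        self.docs = docs  # doc_id -> vector

    def local_top_k(self, query: Vector, k: int) -> List[Tuple[float, str]]:
        # Score every document held under this super-peer and keep the best k.
        scored = ((cosine(query, vec), doc_id) for doc_id, vec in self.docs.items())
        return heapq.nlargest(k, scored)


def route_and_search(query: Vector, super_peers: List[SuperPeer],
                     k: int, fanout: int) -> List[Tuple[float, str]]:
    """Send the query to the `fanout` closest super-peers and merge their top-k."""
    # Assumption: closeness is measured against the cluster centroid.
    selected = heapq.nlargest(
        fanout, super_peers, key=lambda sp: cosine(query, sp.centroid))
    candidates: List[Tuple[float, str]] = []
    for sp in selected:
        candidates.extend(sp.local_top_k(query, k))
    # Global top-k over the partial results returned by the selected super-peers.
    return heapq.nlargest(k, candidates)
```

In this sketch a smaller `fanout` contacts fewer super-peers at the risk of missing relevant documents held elsewhere, which is the cost/quality trade-off the abstract alludes to.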

    Published In

    LSDS-IR '08: Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
    October 2008
    90 pages
    ISBN:9781605582542
    DOI:10.1145/1458469
    Program Chairs: Sebastian Michel, Gleb Skobeltsyn, Wai Gen Yee
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. distributed and peer-to-peer search
    2. semantic overlay networks

    Qualifiers

    • Research-article

    Conference

    CIKM08
    CIKM08: Conference on Information and Knowledge Management
    October 30, 2008
    Napa Valley, California, USA

    Acceptance Rates

    Overall Acceptance Rate 3 of 5 submissions, 60%

    Cited By

    • (2020) Improving the Performance of kNN in the MapReduce Framework Using Locality Sensitive Hashing. International Journal of Distributed Systems and Technologies, 10:4 (1-16). DOI: 10.4018/IJDST.2019100101. Online publication date: 1-Oct-2020
    • (2018) Visual Analysis of Distributed Search Traffic in a Peer-to-peer Network. 2018 10th International Conference on Communication Software and Networks (ICCSN) (189-194). DOI: 10.1109/ICCSN.2018.8488222. Online publication date: Jul-2018
    • (2018) Visual Analysis of Distributed Search Traffic in a Peer-to-peer Network. 2018 13th APCA International Conference on Control and Soft Computing (CONTROLO) (189-194). DOI: 10.1109/CONTROLO.2018.8439791. Online publication date: Jun-2018
    • (2018) Multi-agent Models Solution to Achieve EMC In Wireless Telecommunication Systems. 2018 1st Annual International Conference on Information and Sciences (AiCIS) (311-314). DOI: 10.1109/AiCIS.2018.00061. Online publication date: Nov-2018
    • (2017) Distributed Search Efficiency and Robustness in Service oriented Multi-agent Networks. Proceedings of the 2017 International Conference on Management Engineering, Software Engineering and Service Sciences (9-18). DOI: 10.1145/3034950.3034975. Online publication date: 14-Jan-2017
    • (2016) Scalability analysis of distributed search in large peer-to-peer networks. 2016 IEEE International Conference on Big Data (Big Data) (909-914). DOI: 10.1109/BigData.2016.7840686. Online publication date: Dec-2016
    • (2013) Studying the clustering paradox and scalability of search in highly distributed environments. ACM Transactions on Information Systems, 31:2 (1-36). DOI: 10.1145/2457465.2457468. Online publication date: 17-May-2013
    • (2012) Decentralized Search and the Clustering Paradox in Large Scale Information Networks. Next Generation Search Engines (29-46). DOI: 10.4018/978-1-4666-0330-1.ch002. Online publication date: 2012
    • (2012) An efficient search mechanism for supporting partial filename queries in structured peer-to-peer overlay. Peer-to-Peer Networking and Applications, 5:4 (340-349). DOI: 10.1007/s12083-012-0139-5. Online publication date: 15-May-2012
    • (2011) Performance Evaluation for DSQRM: A Domain-Based Query Routing Mechanism for P2P Networks. Digital Enterprise and Information Systems (317-330). DOI: 10.1007/978-3-642-22603-8_29. Online publication date: 2011
