Abstract
Fast similarity retrieval for high-dimensional unstructured data is becoming significantly important. In high-dimensional space, traditional tree-based index is incompetent comparing with hashing methods. As a state-of-the-art hashing approach, Spectral Hashing (SH) aims at designing compact binary codes for high-dimensional vectors so that the similarity structure of original vector space can be preserved in the code space. We propose a generic high-dimensional index framework named LuSH in this paper, which means Lucenebased SH. It uses SH as high-dimensional index and Lucene, the well-known open source inverted index, as underlying index file. To speedup retrieval efficiency, two improvement strategies are proposed. Experiments on large scale datasets containing up to 10 million data show significant performance of our LuSH framework.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Tamura, H., Mori, S., Yamawaki, T.: Textural features corresponding to visual perception. IEEE Transactions on Systems, Man, and Cybernetics 8(6), 460–472 (1978)
Chatzichristofis, S.A., Boutalis, Y.S.: CEDD: Color and Edge Directivity Descriptor: A Compact Descriptor for Image Indexing and Retrieval. In: Gasteratos, A., Vincze, M., Tsotsos, J.K. (eds.) ICVS 2008. LNCS, vol. 5008, pp. 312–322. Springer, Heidelberg (2008)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV) 60(2), 91–110 (2004)
Salakhutdinov, R., Hinton, G.: Semantic hashing. International Journal of Approximate Reasoning 50(7), 969–978 (2009)
Apache Lucene Project, http://lucene.apache.org/
Guattman, A.: R-Tree: A Dynamic Index Structure for Spatial Searching. In: ACM SIGMOD Int. Conf. on Management of Data, Boston, pp. 47–57 (1984)
Bentley, J.L.: Multidimensional binary search Trees used for associative searching. Communications of the ACM 18(9), 509–517 (1975)
Bellman, R.: Adaptive control processes: a guided tour. Princeton University Press (1961)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 604–613. ACM, New York (1998)
Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: FOCS, pp. 459–468 (2006)
Salakhutdinov, R., Hinton, G.: Learning a nonlinear embedding by preserving class neighbourhood structure. In: AI and Statistics, p. 5 (2007)
Shakhnarovich, G., Viola, P., Darrell, T.: Fast pose estimation with parameter sensitive hashing. In: ICCV, pp. 750–757 (2003)
Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: Advances in Neural Information Processing Systems, pp. 1753–1760 (2009)
Bengio, Y., Delalleau, O., Roux, N., et al.: Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation 16(10), 2197–2219 (2004)
Coifman, R., Lafon, S., Lee, A., et al.: Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. of the National Academy of Sciences of the United States of America 102(21), 7426 (2005)
Belkin, M., Niyogi, P.: Towards a theoretical foundation for Laplacian-based manifold methods. Journal of Computer and System Sciences 74(8), 1289–1308 (2008)
Comer, D.: Ubiquitous B-tree. ACM Computing Surveys (CSUR) 11(2), 121–137 (1979)
SogouP2.0, http://www.sogou.com/labs/dl/p2.html
Ng, A.Y., Jordan, M.I., Weiss, Y.: On Spectral Clustering: Analysis and an algorithm. Journal of Advances in Neural Information Processing Systems 2, 849–856 (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yu, Z., Shao, J., Wu, F. (2012). LuSH: A Generic High-Dimensional Index Framework. In: Bao, Z., et al. Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33050-6_19
Download citation
DOI: https://doi.org/10.1007/978-3-642-33050-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33049-0
Online ISBN: 978-3-642-33050-6
eBook Packages: Computer ScienceComputer Science (R0)