Optimal Data-Dependent Hashing for Approximate Near Neighbors

Andoni, Alexandr; Razenshteyn, Ilya


arXiv:1501.01062 (cs)

[Submitted on 6 Jan 2015 (v1), last revised 16 Jul 2015 (this version, v3)]


Abstract: We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an $n$-point data set in a $d$-dimensional space, our data structure achieves query time $O(d n^{\rho+o(1)})$ and space $O(n^{1+\rho+o(1)} + dn)$, where $\rho=\tfrac{1}{2c^2-1}$ for the Euclidean space and approximation $c>1$. For the Hamming space, we obtain an exponent of $\rho=\tfrac{1}{2c-1}$.
Our result completes the direction set forth in [AINR14], which gave a proof of concept that data-dependent hashing can outperform classical Locality Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures [IM98,AI06] for all approximation factors $c>1$.
From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.
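To make the stated exponents concrete, the following is a minimal sketch that evaluates $\rho$ from the formulas in the abstract and compares it with the classical LSH exponents. The function names are hypothetical and chosen for illustration only. The classical values $1/c^2$ (Euclidean, [AI06]) and $1/c$ (Hamming, [IM98]) are an assumption not spelled out in the abstract, and all lower-order $o(1)$ terms are ignored.

```python
def rho_data_dependent(c: float, metric: str = "euclidean") -> float:
    """Exponent from the abstract: 1/(2c^2 - 1) for Euclidean, 1/(2c - 1) for Hamming."""
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    if metric == "euclidean":
        return 1.0 / (2.0 * c * c - 1.0)
    if metric == "hamming":
        return 1.0 / (2.0 * c - 1.0)
    raise ValueError(f"unknown metric: {metric!r}")


def rho_classical_lsh(c: float, metric: str = "euclidean") -> float:
    """Assumed classical LSH exponents: 1/c^2 (Euclidean), 1/c (Hamming); o(1) terms dropped."""
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    if metric == "euclidean":
        return 1.0 / (c * c)
    if metric == "hamming":
        return 1.0 / c
    raise ValueError(f"unknown metric: {metric!r}")


if __name__ == "__main__":
    # Print both exponents side by side for a few approximation factors.
    for c in (1.5, 2.0, 3.0):
        for metric in ("euclidean", "hamming"):
            new = rho_data_dependent(c, metric)
            old = rho_classical_lsh(c, metric)
            print(f"c={c}, {metric}: data-dependent {new:.4f}, classical {old:.4f}")
```

For $c = 2$, for instance, the formulas give $1/7$ versus $1/4$ in the Euclidean case and $1/3$ versus $1/2$ in the Hamming case. Since the query time scales as $n^{\rho+o(1)}$, a smaller exponent means fewer points examined per query.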

Comments:	36 pages, 5 figures; an extended abstract appeared in the proceedings of the 47th ACM Symposium on Theory of Computing (STOC 2015)
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1501.01062 [cs.DS]
	(or arXiv:1501.01062v3 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1501.01062

Submission history

From: Ilya Razenshteyn
[v1] Tue, 6 Jan 2015 02:21:59 UTC (207 KB)
[v2] Wed, 18 Mar 2015 04:12:39 UTC (210 KB)
[v3] Thu, 16 Jul 2015 03:37:53 UTC (210 KB)
