LSH Algorithm and
Implementation (E2LSH)
Locality-Sensitive Hashing (LSH)
is an algorithm for solving the approximate or exact
Near Neighbor Search in high dimensional spaces. This webpage
links to the newest LSH algorithms in Euclidean and
Hamming spaces, as well as
the E2LSH package, an implementation of an
early practical LSH algorithm.
Check out also the
2015--2016 FALCONN package,
which is a package based on newer ideas (namely the NIPS'15 paper
below).
-
Algorithm description:
-
Newest, data-dependent LSH algorithms (2015):
These algorithms achieve performance better than the classic LSH algorithms by
using data-dependent hashing. They improve over classic LSH
algorithms for both Hamming and Euclidean space. These algorithms are
not dynamic however, in contrast to the classic LSH algorithms, which
use data-independent hashing and hence allow updates to the pointset.
Practical and Optimal
LSH
for Angular Distance (with Piotr Indyk, Thijs Laarhoven,
Ilya
Razenshteyn and Ludwig Schmidt). In NIPS'15.
Optimal
Data-Dependent Hashing for Approximate Near Neighbors
(by Alexandr Andoni and Ilya Razenshteyn). In STOC'15. Full version in arXiv:1501.01062.
Slides: Here are some slides by Alexandr Andoni. Also, you can
see video of a talk.
Beyond
Locality Sensitive Hashing
(by Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn). In SODA'14.
-
Survey of LSH in CACM (2008):
"Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in
High Dimensions" (by Alexandr Andoni and Piotr Indyk).
Communications of the ACM, vol. 51, no. 1, 2008, pp. 117-122.
(CACM disclaimer).
also available directly from
CACM (for free).
-
Most Not-so-recent algorithm for Euclidean space (2006):
"Near-Optimal
Hashing Algorithms for Near Neighbor Problem in High Dimensions"
(by Alexandr Andoni and Piotr Indyk). In FOCS'06.
Slides
on this LSH algorithm from a talk given by Piotr Indyk.
-
Earlier algorithm for Euclidean
space (2006): a good introduction to LSH, and the description of
affairs as of 2006, is in the following book chapter
Locality-Sensitive
Hashing Scheme Based on p-Stable
Distributions (by Alexandr Andoni, Mayur Datar, Nicole Immorlica,
Piotr Indyk, and Vahab Mirrokni), appearing in the book
Nearest
Neighbor Methods in Learning and
Vision: Theory and Practice,
by T. Darrell and P. Indyk and G. Shakhnarovich (eds.), MIT Press, 2006.
See also the book
introduction for a smooth introduction to NN problem and LSH.
-
Original LSH algorithm (1999):
the best algorithm for the Hamming space remains
previous version of the algorithm for the
Hamming distance is described in [GIM'99]
paper.
-
Implementations of LSH:
older version of LSH is available as the E2LSH package (alpha-version). The code is based on the algorithm
described in the
book chapter (2006) from above. You can download the manual for the code. The code has been developed by Alex Andoni in 2004-2005.
There is a newer version (based on the NIPS'15 paper),
called FALCONN, which is
designed and implemented by Ilya Razenshteyn and Ludwig Schmidt (2015-2016).
This research was supported in part by NSF CAREER Grant #0133849 "Approximate
Algorithms for High-dimensional Geometric Problems".