Visualizing Large-scale and High-dimensional Data

Tang, Jian; Liu, Jingzhou; Zhang, Ming; Mei, Qiaozhu

doi:10.1145/2872427.2883041

arXiv:1602.00370 (cs)

[Submitted on 1 Feb 2016 (v1), last revised 5 Apr 2016 (this version, v2)]

Abstract: We study the problem of visualizing large-scale and high-dimensional data in a low-dimensional (typically 2D or 3D) space. Much success has been reported recently by techniques that first compute a similarity structure of the data points and then project them into a low-dimensional space with the structure preserved. Both steps incur considerable computational costs, preventing state-of-the-art methods such as t-SNE from scaling to large-scale and high-dimensional data (e.g., millions of data points and hundreds of dimensions). We propose LargeVis, a technique that first constructs an accurately approximated K-nearest neighbor graph from the data and then lays out the graph in the low-dimensional space. Compared to t-SNE, LargeVis significantly reduces the computational cost of the graph construction step and employs a principled probabilistic model for the visualization step, the objective of which can be effectively optimized through asynchronous stochastic gradient descent with linear time complexity. The whole procedure thus easily scales to millions of high-dimensional data points. Experimental results on real-world data sets demonstrate that LargeVis outperforms the state-of-the-art methods in both efficiency and effectiveness. The hyper-parameters of LargeVis are also much more stable across different data sets.
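The two-step procedure described in the abstract (build a K-nearest neighbor graph, then lay it out with stochastic gradient descent) can be illustrated with a small sketch. This is not the authors' implementation, and several parts are assumptions that the abstract does not specify: the graph is built by exact brute force rather than by an approximate method, edges are unweighted, edge probability is modeled as 1 / (1 + d^2), and the negative-sampling count `n_neg`, repulsion weight `gamma`, gradient clipping, and the small stabilizing constant are illustrative choices. The function names `knn_graph` and `layout` are hypothetical, and the loop is single-threaded rather than asynchronous.

```python
import numpy as np


def knn_graph(X, k):
    """Exact k-nearest-neighbor index lists by brute force.

    O(n^2) in time and memory; a stand-in for the approximate
    graph construction the abstract refers to.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def layout(neighbors, dim=2, epochs=50, lr=0.5, n_neg=5, gamma=7.0, seed=0):
    """Lay out the graph by SGD on an edge-likelihood objective.

    Assumed model: P(edge i-j) = 1 / (1 + ||y_i - y_j||^2).
    Observed edges are pulled together; randomly sampled
    non-neighbors are pushed apart with weight gamma.
    """
    rng = np.random.default_rng(seed)
    n = neighbors.shape[0]
    Y = rng.normal(scale=1e-2, size=(n, dim))
    edges = [(i, int(j)) for i in range(n) for j in neighbors[i]]
    for epoch in range(epochs):
        step = lr * (1.0 - epoch / epochs)  # linearly decaying step size
        for idx in rng.permutation(len(edges)):
            i, j = edges[idx]
            # Attraction: ascent direction of log(1 / (1 + d^2)).
            diff = Y[i] - Y[j]
            g = -2.0 * diff / (1.0 + diff @ diff)
            Y[i] += step * g
            Y[j] -= step * g
            # Repulsion: ascent direction of gamma * log(1 - 1 / (1 + d^2)).
            for neg in rng.integers(0, n, size=n_neg):
                if neg == i:
                    continue
                diff = Y[i] - Y[neg]
                d2 = diff @ diff
                # 0.1 avoids division by zero; clipping bounds the update.
                g = gamma * 2.0 * diff / ((0.1 + d2) * (1.0 + d2))
                Y[i] += step * np.clip(g, -5.0, 5.0)
    return Y
```

For example, `layout(knn_graph(X, 5))` returns one 2D coordinate per row of `X`. Each epoch touches every edge once with a fixed number of negative samples, so the layout step costs time proportional to the number of edges, which is the property that makes a linear-time layout plausible; the brute-force graph step in this sketch does not share that property.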

Comments:	WWW 2016
Subjects:	Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:1602.00370 [cs.LG]
	(or arXiv:1602.00370v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1602.00370
Related DOI:	https://doi.org/10.1145/2872427.2883041

Submission history

From: Jian Tang [view email]
[v1] Mon, 1 Feb 2016 03:01:33 UTC (66,554 KB)
[v2] Tue, 5 Apr 2016 03:59:57 UTC (66,554 KB)
