AutoTM 2.0: Automatic Topic Modeling Framework for Documents Analysis

Khodorchenko, Maria; Butakov, Nikolay; Zuev, Maxim; Nasonov, Denis

arXiv:2410.00655 (cs)

[Submitted on 1 Oct 2024]

Abstract: In this work, we present AutoTM 2.0, a framework for optimizing additively regularized topic models. Compared to the previous version, it includes a novel optimization pipeline, LLM-based quality metrics, and a distributed mode.
AutoTM 2.0 is a convenient tool that lets specialists and non-specialists alike work with text documents, whether to conduct exploratory data analysis or to perform clustering on an interpretable set of features. Quality evaluation is based on specially developed metrics, including coherence and GPT-4-based approaches. Researchers and practitioners can easily integrate new optimization algorithms and adapt novel metrics to enhance modeling quality and extend their experiments.
We show that AutoTM 2.0 outperforms the previous AutoTM, reporting results on 5 datasets with different features and in two different languages.

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2410.00655 [cs.LG]
	(or arXiv:2410.00655v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.00655

Submission history

From: Maria Khodorchenko
[v1] Tue, 1 Oct 2024 13:13:15 UTC (4,953 KB)
