SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context

An, Hongjun; Chen, Yifan; Sun, Zhe; Li, Xuelong

arXiv:2408.00655 (cs)

[Submitted on 1 Aug 2024 (v1), last revised 14 Aug 2024 (this version, v5)]

Abstract: Current large language models (LLMs) primarily use next-token prediction for inference, which significantly limits their processing speed. In this paper, we introduce a novel inference method, next-sentence prediction, which aims to improve the inference efficiency of LLMs. We present the Sentence Variational Autoencoder (SentenceVAE), which comprises a Sentence Encoder that compresses the multiple tokens of a sentence into a single token and a Sentence Decoder that reconstructs them. By integrating SentenceVAE into the input and output layers of LLMs, we develop Sentence-level LLMs (SLLMs) that perform inference sentence by sentence. In addition, the SentenceVAE module of SLLMs preserves the integrity of the original semantic content by segmenting the context into sentences, thereby improving accuracy while boosting inference speed. Moreover, compared to previous LLMs, SLLMs process fewer tokens for an equivalent context length, which significantly reduces the memory required for self-attention computation and makes longer contexts easier to handle. Extensive experiments on the Wanjuan dataset show that, compared to previous token-by-token methods, the proposed method accelerates inference by 204~365%, reduces perplexity (PPL) to 46~75% of its original value, and decreases memory overhead by 86~91% for an equivalent context length.
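
The memory argument in the abstract can be illustrated with a back-of-the-envelope sketch. It is not the authors' implementation and does not reproduce the reported 86~91% figure: it only counts the entries of a single square self-attention score matrix, and it ignores the cost of the Sentence Encoder and Decoder, the key-value cache, and the model weights. The context length and the average sentence length are hypothetical values chosen for illustration.

```python
def attention_score_entries(num_positions: int) -> int:
    """Entries in one head's square self-attention score matrix."""
    return num_positions * num_positions


def sentence_positions(num_tokens: int, tokens_per_sentence: int) -> int:
    """Positions left once each sentence is compressed into one embedding.

    Uses ceiling division so a trailing partial sentence still counts.
    """
    return -(-num_tokens // tokens_per_sentence)


# Hypothetical values, not taken from the paper.
num_tokens = 4096          # context length in tokens
tokens_per_sentence = 16   # assumed average sentence length

token_level = attention_score_entries(num_tokens)
sentence_level = attention_score_entries(
    sentence_positions(num_tokens, tokens_per_sentence)
)

print("token-level entries:   ", token_level)
print("sentence-level entries:", sentence_level)
print("reduction factor:      ", token_level // sentence_level)
```

Under these assumptions the score matrix shrinks by the square of the average sentence length, which is why grouping tokens into sentences can free memory for longer contexts.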

Comments:	update the article
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2408.00655 [cs.AI]
	(or arXiv:2408.00655v5 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2408.00655

Submission history

From: Hongjun An
[v1] Thu, 1 Aug 2024 15:45:19 UTC (1,336 KB)
[v2] Fri, 2 Aug 2024 08:27:08 UTC (1,160 KB)
[v3] Tue, 6 Aug 2024 13:38:50 UTC (1,160 KB)
[v4] Wed, 7 Aug 2024 12:23:14 UTC (1,160 KB)
[v5] Wed, 14 Aug 2024 07:34:44 UTC (1,153 KB)