Attention layers provably solve single-location regression

Marion, Pierre; Berthier, Raphaël; Biau, Gérard; Boyer, Claire

arXiv:2410.01537 (stat)

[Submitted on 2 Oct 2024]

Abstract: Attention-based models, such as Transformers, excel across various tasks but lack a comprehensive theoretical understanding, especially regarding token-wise sparsity and internal linear representations. To address this gap, we introduce the single-location regression task, where only one token in a sequence determines the output, and its position is a latent random variable, retrievable via a linear projection of the input. To solve this task, we propose a dedicated predictor, which turns out to be a simplified version of a non-linear self-attention layer. We study its theoretical properties by showing its asymptotic Bayes optimality and analyzing its training dynamics. In particular, despite the non-convex nature of the problem, the predictor effectively learns the underlying structure. This work highlights the capacity of attention mechanisms to handle sparse token information and internal linear structures.
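To make the task described in the abstract concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's construction: the Gaussian token distribution, the mean shift along a direction `k_star`, the linear readout `v_star`, the constants `shift`, `noise`, and `lam`, and the use of a softmax gate are all hypothetical choices made for this example (the abstract only says the predictor is a simplified non-linear self-attention layer, and the paper's nonlinearity may differ). The sketch generates sequences in which one token at a latent position carries the signal, then compares an attention-style predictor using the true directions against plain mean pooling.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def make_sample(L, d, k_star, v_star, shift, noise, rng):
    """Draw one (tokens, target, position) triple.

    Assumption (illustrative only): tokens are standard Gaussian, and the
    informative token is shifted along k_star so that a linear projection
    reveals its position.
    """
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(L)]
    j = rng.randrange(L)  # latent position of the informative token
    tokens[j] = [x + shift * k for x, k in zip(tokens[j], k_star)]
    y = dot(tokens[j], v_star) + noise * rng.gauss(0.0, 1.0)
    return tokens, y, j


def attention_predict(tokens, k, v, lam=2.0):
    """Softmax-gated readout: weight each token by its score along k,
    then average the per-token linear readouts along v."""
    scores = [lam * dot(t, k) for t in tokens]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w / total * dot(t, v) for w, t in zip(weights, tokens))


def mean_pool_predict(tokens, v):
    """Baseline that ignores token position entirely."""
    return sum(dot(t, v) for t in tokens) / len(tokens)


rng = random.Random(0)
d, L, n = 16, 10, 500
# Orthonormal directions (hypothetical): first two standard basis vectors.
k_star = [1.0 if i == 0 else 0.0 for i in range(d)]
v_star = [1.0 if i == 1 else 0.0 for i in range(d)]

sq_err_attention, sq_err_mean_pool, hits = 0.0, 0.0, 0
for _ in range(n):
    tokens, y, j = make_sample(L, d, k_star, v_star, shift=8.0, noise=0.1, rng=rng)
    sq_err_attention += (attention_predict(tokens, k_star, v_star) - y) ** 2
    sq_err_mean_pool += (mean_pool_predict(tokens, v_star) - y) ** 2
    # Position recovered by the largest projection onto k_star.
    guess = max(range(L), key=lambda l: dot(tokens[l], k_star))
    hits += int(guess == j)

mse_attention = sq_err_attention / n
mse_mean_pool = sq_err_mean_pool / n
position_accuracy = hits / n
print(f"attention MSE: {mse_attention:.4f}")
print(f"mean-pool MSE: {mse_mean_pool:.4f}")
print(f"position accuracy: {position_accuracy:.3f}")
```

In this toy setting the gate concentrates almost all weight on the informative token, so the attention-style predictor should track the target far more closely than mean pooling. The sketch plugs in the true directions and does not attempt to reproduce the training dynamics or the optimality analysis studied in the paper.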

Comments:	41 pages, 7 figures
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2410.01537 [stat.ML]
	(or arXiv:2410.01537v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2410.01537

Submission history

From: Pierre Marion
[v1] Wed, 2 Oct 2024 13:28:02 UTC (1,501 KB)
