softmax is not enough (for sharp out-of-distribution)

Veličković, Petar; Perivolaropoulos, Christos; Barbero, Federico; Pascanu, Razvan

arXiv:2410.01104 (cs)

[Submitted on 1 Oct 2024 (v1), last revised 7 Oct 2024 (this version, v2)]

Abstract: A key property of reasoning systems is the ability to make sharp decisions on their input data. For contemporary AI systems, a key carrier of sharp behaviour is the softmax function, with its capability to perform differentiable query-key lookups. It is a common belief that the predictive power of networks leveraging softmax arises from "circuits" which sharply perform certain kinds of computations consistently across many diverse inputs. However, for these circuits to be robust, they would need to generalise well to arbitrary valid inputs. In this paper, we dispel this myth: even for tasks as simple as finding the maximum key, any learned circuitry must disperse as the number of items grows at test time. We attribute this to a fundamental limitation of the softmax function to robustly approximate sharp functions, prove this phenomenon theoretically, and propose adaptive temperature as an ad-hoc technique for improving the sharpness of softmax at inference time.
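
The dispersion effect described in the abstract can be illustrated with a small sketch. This is not the authors' code or experimental setup: it assumes a toy case with one "maximum" item whose logit exceeds all others by a fixed, bounded gap, and the names `softmax`, `max_weight`, `gap` and `temperature` are hypothetical choices made for illustration. The lower fixed temperature shown here is only a stand-in to show how temperature affects sharpness; it is not the adaptive temperature procedure the paper proposes.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def max_weight(n_items, gap=5.0, temperature=1.0):
    """Softmax weight on the single largest item among n_items.

    Assumption (illustrative only): the largest logit is `gap`,
    and every other logit is 0, so the logits stay bounded as
    n_items grows.
    """
    logits = [gap] + [0.0] * (n_items - 1)
    return softmax(logits, temperature)[0]


if __name__ == "__main__":
    # With a bounded gap, the weight on the maximum item shrinks as
    # the number of items grows; a lower temperature delays this.
    for n in (10, 100, 1000, 10000):
        print(n, max_weight(n), max_weight(n, temperature=0.25))
```

In this toy setting the weight on the maximum item is e^gap / (e^gap + n - 1), which tends to 0 as n grows for any fixed gap. Dividing the logits by a temperature below 1 enlarges the effective gap, which is why a smaller temperature keeps the output sharper for longer.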

Comments:	Comments welcome. 15 pages, 7 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Cite as:	arXiv:2410.01104 [cs.LG]
	(or arXiv:2410.01104v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.01104

Submission history

From: Petar Veličković
[v1] Tue, 1 Oct 2024 22:22:35 UTC (6,983 KB)
[v2] Mon, 7 Oct 2024 13:13:41 UTC (6,792 KB)