Preconditioning for Accelerated Gradient Descent Optimization and Regularization

Ye, Qiang

Computer Science > Machine Learning

arXiv:2410.00232 (cs)

[Submitted on 30 Sep 2024]

Title: Preconditioning for Accelerated Gradient Descent Optimization and Regularization

Authors: Qiang Ye


Abstract: Accelerated training algorithms, such as adaptive learning rates and various normalization methods, are widely used but not fully understood. When regularization is introduced, standard optimizers with adaptive learning rates may not perform effectively. This raises the need for alternative regularization approaches and the question of how to properly combine regularization with preconditioning. In this paper, we address these challenges using the theory of preconditioning as follows: (1) We explain how preconditioning with AdaGrad, RMSProp, and Adam accelerates training; (2) We explore the interaction between regularization and preconditioning, outlining different options for selecting the variables to regularize, and in particular we discuss how to implement this for gradient regularization; and (3) We demonstrate how normalization methods accelerate training by improving Hessian conditioning, and discuss how this perspective can lead to new preconditioned training algorithms. Our findings offer a unified mathematical framework for understanding various acceleration techniques and deriving appropriate regularization schemes.
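The preconditioning view in point (1) can be illustrated with a minimal sketch, assuming a quadratic objective whose Hessian is diagonal and known exactly. It compares plain gradient descent with gradient descent preconditioned by the Jacobi matrix P = diag(1/h), which turns the Hessian into the identity and drops its condition number to 1. This is not the paper's derivation: AdaGrad, RMSProp, and Adam build their diagonal scaling from accumulated squared gradients rather than from the Hessian, and the helper names below (`run_gd`, `condition_number`, `gradient`, `norm`) are hypothetical, chosen only for illustration.

```python
# Minimal sketch (hypothetical helpers): gradient descent on the
# ill-conditioned quadratic f(x) = 0.5 * sum_i h[i] * x[i]**2,
# whose Hessian is diag(h).
# Assumption: the Hessian is diagonal and known, so the Jacobi
# preconditioner P = diag(1 / h) can be formed exactly.

def condition_number(diag):
    """Condition number of a positive diagonal matrix: max / min entry."""
    return max(diag) / min(diag)

def gradient(h, x):
    """Gradient of f at x, i.e. h * x elementwise."""
    return [hi * xi for hi, xi in zip(h, x)]

def run_gd(h, x0, lr, precond, steps):
    """Iterate x <- x - lr * P * grad f(x), with P given as a diagonal list."""
    x = list(x0)
    for _ in range(steps):
        g = gradient(h, x)
        x = [xi - lr * pi * gi for xi, pi, gi in zip(x, precond, g)]
    return x

def norm(x):
    """Euclidean norm; the minimizer of f is the origin."""
    return sum(v * v for v in x) ** 0.5

h = [100.0, 1.0]   # Hessian eigenvalues, condition number 100
x0 = [1.0, 1.0]

# Plain GD: P = I and step size 1/L with L = max(h).
plain = run_gd(h, x0, lr=1.0 / max(h), precond=[1.0, 1.0], steps=50)

# Jacobi-preconditioned GD: P = diag(1/h), so P * H = I.
jacobi = [1.0 / hi for hi in h]
preconditioned = run_gd(h, x0, lr=1.0, precond=jacobi, steps=50)

print("cond(H)   =", condition_number(h))
print("cond(P H) =", condition_number([pi * hi for pi, hi in zip(jacobi, h)]))
print("plain GD distance to minimizer:", norm(plain))
print("preconditioned distance:       ", norm(preconditioned))
```

With condition number 100, plain gradient descent at step size 1/L clears the high-curvature coordinate immediately but shrinks the low-curvature one only by a factor of 0.99 per step, leaving it near 0.99^50 ≈ 0.6 after 50 steps, while the preconditioned iteration reaches the minimizer in a single step. Adaptive methods aim for a similar rebalancing across coordinates without access to the exact Hessian.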

Comments:	7 pages
Subjects:	Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Cite as:	arXiv:2410.00232 [cs.LG]
	(or arXiv:2410.00232v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.00232

Submission history

From: Qiang Ye
[v1] Mon, 30 Sep 2024 20:58:39 UTC (20 KB)
