Preprint / working paper. Year: 2021

Second-order step-size tuning of SGD for non-convex optimization

Abstract

With a view to a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. To do so, curvature is estimated from a local quadratic model, using only noisy gradient approximations. The result is a new stochastic first-order method (Step-Tuned SGD), which can be seen as a stochastic version of the classical Barzilai-Borwein method. Our theoretical results ensure almost sure convergence to the critical set, and we provide convergence rates. Experiments on deep residual network training illustrate the favorable properties of our approach. For such networks we observe, during training, both a sudden drop of the loss and an improvement of test accuracy at medium stages, yielding better results than SGD, RMSprop, or ADAM.
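To make the abstract's key mechanism concrete, here is a minimal Python/NumPy sketch of a stochastic Barzilai-Borwein-type step-size rule, where curvature along the last displacement is estimated from consecutive noisy gradients. The function name sbb_sgd, the clipping safeguards gamma_min/gamma_max, and the warm-up step are illustrative assumptions; this is not the paper's exact Step-Tuned SGD algorithm.

```python
import numpy as np

def sbb_sgd(grad_fn, x0, n_iters=1000, gamma0=0.01,
            gamma_min=1e-6, gamma_max=1.0, seed=0):
    """Stochastic Barzilai-Borwein-style step-size tuning for SGD.

    Illustrative sketch only, not the paper's Step-Tuned SGD.
    grad_fn(x, rng) must return a noisy (mini-batch) gradient at x.
    """
    rng = np.random.default_rng(seed)
    x_prev = x0.copy()
    g_prev = grad_fn(x_prev, rng)
    x = x_prev - gamma0 * g_prev          # plain SGD warm-up step
    for _ in range(n_iters):
        g = grad_fn(x, rng)
        s = x - x_prev                    # parameter displacement
        y = g - g_prev                    # noisy gradient difference
        # BB1-type curvature-based step: <s, y> / <y, y>
        denom = float(y @ y)
        gamma = float(s @ y) / denom if denom > 0.0 else gamma0
        # Safeguard: clip to handle noise and negative curvature
        gamma = min(max(gamma, gamma_min), gamma_max)
        x_prev, g_prev = x, g
        x = x - gamma * g
    return x

# Toy usage on a hypothetical noisy quadratic f(x) = 0.5 * ||A x||^2
A = np.random.default_rng(1).normal(size=(20, 10))
x_sol = sbb_sgd(lambda x, rng: A.T @ (A @ x) + 0.05 * rng.normal(size=x.shape),
                x0=np.ones(10))
```

With mini-batch noise, the raw Barzilai-Borwein ratio can be negative or explode, so some safeguard (here a simple clip) is essential; handling this robustly is precisely the kind of issue the paper's step-size tuning addresses.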
Main file: 2103.03570.pdf (1.54 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03161775, version 1 (08-03-2021)
hal-03161775, version 2 (23-11-2021)

Identifiers

HAL Id: hal-03161775

Cite

Camille Castera, Cédric Févotte, Jérôme Bolte, Edouard Pauwels. Second-order step-size tuning of SGD for non-convex optimization. 2021. ⟨hal-03161775v1⟩