Second-order inertial algorithms for very large-scale optimization
Speaker: Camille Castera (postdoc at the University of Tübingen)
Abstract: Non-convex non-smooth optimization has attracted a lot of interest due to the efficiency of neural networks in many practical applications and the need to "train" them. Training amounts to solving very large-scale optimization problems. In this context, standard algorithms almost exclusively rely on inexact (sub-)gradients obtained through automatic differentiation and mini-batch sub-sampling. As a result, first-order methods (SGD, ADAM, etc.) remain the most widely used for training neural networks.
Driven by a dynamical-system approach, we build INNA, an inertial and Newtonian algorithm that exploits second-order information on the function by means of first-order automatic differentiation and mini-batch sub-sampling only. By jointly analyzing the dynamical system and INNA, we prove the almost-sure convergence of the algorithm to critical points of the objective function. We also show that, despite its second-order nature, INNA is likely to avoid strict saddle points (formally, the limit is a local minimum with overwhelming probability).
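To make the "second-order information from first-order oracles only" idea concrete, here is a minimal, hypothetical sketch of an INNA-style iteration. It is not the authors' reference implementation: the two coupled variables (theta, psi), the hyperparameters alpha and beta, the step size gamma, the toy quadratic objective, and the initialization choice are all illustrative assumptions; the only oracle used is a (possibly stochastic mini-batch) gradient.

```python
# Hypothetical sketch of an INNA-style update (illustrative, not the
# authors' reference implementation). Two coupled variables evolve
# together; only a first-order (mini-batch) gradient is ever queried.
import numpy as np

def inna_step(theta, psi, grad, alpha, beta, gamma):
    """One INNA-style iteration; grad is a (stochastic) gradient at theta."""
    # Shared drift term coupling the two variables (hyperparameters
    # alpha, beta come from the underlying inertial dynamical system).
    drift = (1.0 / beta - alpha) * theta - (1.0 / beta) * psi
    theta_new = theta + gamma * (drift - beta * grad)
    psi_new = psi + gamma * drift
    return theta_new, psi_new

# Toy usage on J(theta) = 0.5 * ||theta||^2, whose gradient is theta
# (illustrative values for alpha, beta, gamma; not tuned recommendations).
theta = np.array([1.0, -1.0])
psi = theta.copy()  # one simple initialization choice for this sketch
for _ in range(5000):
    theta, psi = inna_step(theta, psi, grad=theta,
                           alpha=0.5, beta=0.1, gamma=0.01)
print(np.linalg.norm(theta))  # shrinks toward 0, the unique critical point
```

In a deep learning setting, `grad` would be the mini-batch gradient returned by automatic differentiation, so the per-iteration cost is comparable to SGD while the coupled update injects Newtonian (curvature-driven) damping.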
Practical considerations will be discussed, and some deep learning experiments will be presented.
Finally, we provide insights into recent work that paves the way for the design of faster second-order methods.
This includes joint work with J. Bolte, C. Févotte, E. Pauwels, H. Attouch, J. Fadili, and P. Ochs.
More information: https://camcastera.github.io/
Talk in room M7 101 (Campus Monod, ENS de Lyon)