Saddle-to-Saddle Dynamics in Diagonal Linear Networks
Scott Pesme
Abstract: When training neural networks with gradient methods from a small initialisation of the weights, a striking type of learning curve appears: the training process makes very little progress for some time, followed by a sharp transition where a new “feature” is suddenly learnt. This behaviour is usually referred to as incremental learning. In this talk, I will show that we can fully describe this phenomenon for a toy network architecture. In this simplified setting, we can prove that the gradient flow trajectory jumps from one saddle of the training loss to another. Each visited saddle, as well as the jump times, can be computed through a recursive algorithm reminiscent of the Homotopy algorithm used for computing the Lasso path.
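The incremental-learning phenomenon described above is easy to reproduce numerically. The sketch below is not the speaker's code; it uses one common reparametrisation of a diagonal linear network, beta = u*u - v*v trained on a sparse regression problem, and all problem sizes and hyperparameters are illustrative assumptions. With a small initialisation scale, the loss curve typically shows long plateaus (near saddles) separated by sharp drops as coordinates of beta are switched on one at a time, largest first.

```python
import numpy as np

# Illustrative sketch of incremental learning in a diagonal linear network.
# The linear predictor's weights are reparametrised as beta = u*u - v*v
# (one standard choice; elementwise squares), and u, v are trained by plain
# gradient descent from a tiny initialisation alpha. All sizes below are
# arbitrary illustrative choices.

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[[0, 1, 2]] = [3.0, -2.0, 1.0]   # sparse ground-truth weights
y = X @ beta_star                          # noiseless targets

alpha = 1e-6                # small initialisation scale
u = alpha * np.ones(d)
v = alpha * np.ones(d)
lr = 5e-3

losses = []
for step in range(20_000):
    beta = u * u - v * v
    residual = X @ beta - y
    grad_beta = X.T @ residual / n
    # chain rule through beta = u*u - v*v
    u = u - lr * 2.0 * grad_beta * u
    v = v + lr * 2.0 * grad_beta * v
    losses.append(0.5 * np.mean(residual ** 2))

# Coordinates of beta are learnt one at a time: the loss plateaus near a
# saddle, then drops sharply when the next coordinate activates.
print("final loss:", losses[-1])
print("recovered support:", np.flatnonzero(np.abs(u * u - v * v) > 0.1))
```

Because the updates on u and v are multiplicative, a coordinate initialised at alpha needs roughly log(1/alpha) time (rescaled by its gradient) to escape zero, which is what produces the plateaus; shrinking alpha lengthens every plateau without changing which saddles are visited.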
Website: https://scottpesme.github.io/