Andrea Montanari
A Mean Field View of the Landscape of Two-Layer Neural Networks
Abstract: Multi-layer neural networks are among the most powerful
models in machine learning, yet the fundamental reasons for this
success defy mathematical understanding. Learning a neural network
requires optimizing a non-convex, high-dimensional objective (the
risk function), a problem usually attacked using stochastic gradient
descent (SGD). In this paper we consider a simple case, namely
two-layer neural networks, and prove that, in a suitable scaling
limit, the SGD dynamics are captured by a certain non-linear partial
differential equation (PDE) that we call "distributional dynamics".
Among other consequences, this result implies that the complexity of
SGD is independent of the number of hidden units.
[Joint work with Song Mei and Pan-Minh Nguyen]
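To make the setting concrete, the following is a minimal sketch (not from the paper) of SGD on a two-layer network in the mean-field normalization, where the network output is the *average* of the hidden units' contributions, f(x) = (1/N) Σᵢ aᵢ σ(wᵢ·x), and the step size is scaled by N so that each hidden unit ("particle") moves at an O(1) rate. The teacher, dimensions, and step size are illustrative choices, not values from the work.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200      # number of hidden units (illustrative choice)
d = 5        # input dimension (illustrative choice)
lr = 0.1     # base step size (illustrative choice)
steps = 2000

# Hypothetical teacher: a fixed linear map used only to generate data.
w_star = rng.normal(size=d)

# Student: two-layer net f(x) = (1/N) * sum_i a_i * tanh(w_i . x).
# The 1/N output scaling is the mean-field normalization.
a = rng.normal(size=N)
W = rng.normal(size=(N, d))

losses = []
for t in range(steps):
    x = rng.normal(size=d)
    y = w_star @ x
    h = np.tanh(W @ x)
    err = (a @ h) / N - y
    # Gradients of the squared loss (1/2)(f(x) - y)^2; note the 1/N factor
    # inherited from the output normalization.
    grad_a = err * h / N
    grad_W = err * np.outer(a * (1 - h**2), x) / N
    # Mean-field scaling: multiply the step size by N so each particle
    # (hidden unit) evolves at an O(1) rate as N grows.
    a -= lr * N * grad_a
    W -= lr * N * grad_W
    losses.append(0.5 * err**2)

print(np.mean(losses[:100]), np.mean(losses[-100:]))
```

In this scaling, the empirical distribution of the particles (aᵢ, wᵢ) evolves, as N grows, according to the distributional-dynamics PDE described in the abstract; the per-step cost and the number of steps needed do not blow up with N.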