Andrea Montanari
A Mean Field View of the Landscape of Two-Layer Neural Networks
Abstract: Multi-layer neural networks are among the most powerful
models in machine learning, yet the fundamental reasons for this
success defy mathematical understanding. Learning a neural network
requires optimizing a non-convex, high-dimensional objective (the
risk function), a problem usually attacked using stochastic gradient
descent (SGD). In this paper we consider a simple case, namely
two-layer neural networks, and prove that, in a suitable scaling
limit, the SGD dynamics are captured by a certain non-linear partial
differential equation (PDE) that we call "distributional dynamics".
Among other consequences, this result implies that the complexity of
SGD is independent of the number of hidden units.
[Joint work with Song Mei and Pan-Minh Nguyen]
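To make the setting concrete, the following is a minimal sketch (not from the paper) of SGD on a two-layer network in the mean-field normalization, where the network output is the *average* of the hidden units' contributions, f(x) = (1/N) Σᵢ aᵢ σ(wᵢ·x), and the step size is scaled by N so that each hidden unit ("particle") moves at an O(1) rate. The teacher, dimensions, and step size are illustrative choices, not values from the work.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200      # number of hidden units (illustrative choice)
d = 5        # input dimension (illustrative choice)
lr = 0.1     # base step size (illustrative choice)
steps = 2000

# Hypothetical teacher: a fixed linear map used only to generate data.
w_star = rng.normal(size=d)

# Student: two-layer net f(x) = (1/N) * sum_i a_i * tanh(w_i . x).
# The 1/N output scaling is the mean-field normalization.
a = rng.normal(size=N)
W = rng.normal(size=(N, d))

losses = []
for t in range(steps):
    x = rng.normal(size=d)
    y = w_star @ x
    h = np.tanh(W @ x)
    err = (a @ h) / N - y
    # Gradients of the squared loss (1/2)(f(x) - y)^2; note the 1/N factor
    # inherited from the output normalization.
    grad_a = err * h / N
    grad_W = err * np.outer(a * (1 - h**2), x) / N
    # Mean-field scaling: multiply the step size by N so each particle
    # (hidden unit) evolves at an O(1) rate as N grows.
    a -= lr * N * grad_a
    W -= lr * N * grad_W
    losses.append(0.5 * err**2)

print(np.mean(losses[:100]), np.mean(losses[-100:]))
```

In this scaling, the empirical distribution of the particles (aᵢ, wᵢ) evolves, as N grows, according to the distributional-dynamics PDE described in the abstract; the per-step cost and the number of steps needed do not blow up with N.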