Session II.2 - Continuous Optimization

Friday, June 16, 17:00 ~ 17:30

Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods

Niao He

ETH Zurich, Switzerland

A central optimization challenge in machine learning is parameter tuning. Adaptive gradient methods, such as AdaGrad and Adam, are ubiquitously used for training machine learning models in practice, owing to their ability to adjust the stepsizes without granular knowledge of the objective functions. Despite these empirical successes, their theoretical benefits over vanilla SGD remain elusive. In this talk, we will examine the convergence guarantees of a wide range of parameter-agnostic algorithms, including untuned SGD, normalized SGD, AdaGrad, and others, in the nonconvex setting, assuming only smoothness and bounded variance. Our results will provide some hints on the provable advantages of adaptive methods.
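To make the notion of a parameter-agnostic stepsize concrete, here is a minimal sketch of an AdaGrad-Norm-style update (a scalar-stepsize variant of AdaGrad). This is an illustrative example, not the specific algorithm analyzed in the talk; the function names and default constants (`eta`, `b0`, the quadratic test objective) are assumptions chosen for the demo.

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1.0, steps=500):
    """Illustrative AdaGrad-Norm sketch: the effective stepsize
    eta / b_t shrinks as squared gradient norms accumulate, so no
    smoothness constant needs to be known in advance (contrast with
    untuned SGD, whose fixed stepsize may be far from optimal)."""
    x = np.asarray(x0, dtype=float).copy()
    b_sq = b0 ** 2
    for _ in range(steps):
        g = grad(x)
        b_sq += np.dot(g, g)           # accumulate squared gradient norms
        x -= eta / np.sqrt(b_sq) * g   # parameter-agnostic stepsize
    return x

# Toy objective f(x) = 0.5 * ||x||^2, so grad(x) = x.
x_final = adagrad_norm(lambda x: x, np.array([10.0, -5.0]))
```

On this toy quadratic the iterates approach the minimizer at the origin even though no Lipschitz or variance constant was supplied, which is the kind of tuning-free behavior whose nonconvex guarantees the talk compares against untuned SGD.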