Session II.2 - Continuous Optimization
Poster
Variance-Reduction for SGD with Polyak Stepsizes and Line-Search
Xiaowen Jiang
CISPA Helmholtz Center for Information Security, Germany
The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD have shown remarkable effectiveness when training over-parameterized models. However, in non-interpolation settings, both algorithms only guarantee convergence to a neighborhood of a solution, which may yield an output worse than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al. [2022]), this approach results in slower convergence rates for convex and over-parameterized models.
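For context, the SPS baseline referred to here (in the bounded form of Loizou et al. [2021]) sets the stepsize from the sampled loss $f_{i_t}$ at the current iterate $x_t$; symbols below follow that paper, not this abstract:

```latex
\gamma_t \;=\; \min\!\left\{ \frac{f_{i_t}(x_t) - f_{i_t}^{*}}{c\,\|\nabla f_{i_t}(x_t)\|^{2}},\; \gamma_b \right\},
```

where $f_{i_t}^{*}$ is the infimum of the sampled component, $c>0$ is a scaling constant, and $\gamma_b$ is an upper bound on the stepsize.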
In this work, we make two contributions. First, we propose two new variants of SPS and SLS, called AdaSPS and AdaSLS, which guarantee convergence in non-interpolation settings and maintain sub-linear and linear convergence rates for convex and strongly convex functions, respectively, when training over-parameterized models. These algorithms require no knowledge of problem-dependent parameters and do not assume bounded iterates under strong convexity. Second, we equip AdaSPS and AdaSLS with a novel variance reduction technique and obtain algorithms that require $\widetilde{\mathcal{O}}(n+1/\epsilon)$ gradient evaluations to achieve an $\mathcal{O}(\epsilon)$-suboptimality for convex functions, which improves upon the slower $\mathcal{O}(1/\epsilon^2)$ rates of AdaSPS and AdaSLS without variance reduction in the non-interpolation regime. Additionally, our result matches the fast rates of AdaSVRG and SARAH.
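To illustrate the general idea behind variance reduction (not the paper's specific technique), the sketch below implements the classic SVRG-style gradient estimator on a toy least-squares problem; all names and the problem setup are illustrative assumptions:

```python
import numpy as np

# Toy finite-sum objective: f(w) = (1/n) * sum_i f_i(w),
# with f_i(w) = 0.5 * (a_i^T w - b_i)^2. (Illustrative, not from the paper.)
rng = np.random.default_rng(0)
n, d = 50, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(w, i):
    # Gradient of the single component f_i at w.
    return A[i] * (A[i] @ w - b[i])

def full_grad(w):
    # Gradient of the averaged objective f at w.
    return A.T @ (A @ w - b) / n

def vr_grad(w, w_snap, g_snap, i):
    # SVRG-style variance-reduced estimator:
    #   g = grad_i(w) - grad_i(w_snap) + full_grad(w_snap).
    # It is unbiased for full_grad(w), and its variance shrinks
    # as the iterate w approaches the snapshot w_snap.
    return grad_i(w, i) - grad_i(w_snap, i) + g_snap

w = rng.standard_normal(d)
w_snap = np.zeros(d)
g_snap = full_grad(w_snap)   # full gradient computed once per snapshot
g = vr_grad(w, w_snap, g_snap, i=3)
```

At the snapshot point the estimator equals the exact full gradient, which is what removes the noise floor that limits plain SGD to a neighborhood of the solution in non-interpolation settings.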
Joint work with Sebastian U. Stich (CISPA Helmholtz Center for Information Security).