Session abstracts

Session II.2 - Continuous Optimization


 

Talks


Thursday, June 15, 14:00 ~ 14:30

On the minimization of piecewise functions: pseudo stationarity

Jong-Shi Pang

University of Southern California, USA

This paper discusses the minimization of a class of (possibly discontinuous) piecewise functions that has many applied sources. We define the concept of a pseudo stationary solution and present two approaches for its computation: an epigraphical approach and a mollifier approach; the latter is the nonsmooth generalization of the mollifier approach of Norkin and Wets that lies at the foundation of smooth approximation of discontinuous minimization.

Joint work with Ying Cui (University of Minnesota) and Junyi Liu (Tsinghua University).



Thursday, June 15, 14:30 ~ 15:00

Complexity Guarantees for Chance-Constrained Optimization by Leveraging Minkowski Functionals

Uday Shanbhag

Pennsylvania State University, USA

Chance-constrained optimization represents a challenging class of problems in stochastic programming. First studied by Charnes and Cooper in the 1950s, this class of problems has seen significant interest from researchers in both continuous and discrete optimization. Important recent results include the development of schemes that have combined smoothing, sampling, and variational analysis. Yet much of this work provides convergence guarantees to (appropriately defined) stationary points, while complexity guarantees for computing approximate global minimizers have remained elusive. We consider the resolution of precisely such a gap for a subclass of such problems. We begin by considering the maximization of the probability $\mathbb{P}\left\{ \, \zeta \, \mid \, \zeta \, \in \, K(x) \, \right\}$ over a closed and convex set $\mathcal X$. When $K$ is defined via positively homogeneous functions and $\zeta$ satisfies suitable distributional requirements, by leveraging properties of Minkowski functionals and recent findings in the context of non-Gaussian integrals of positively homogeneous functions, the probability of interest $\mathbb{P}\left\{\,\zeta \, \mid \, \zeta \, \in \, K(x) \, \right\}$ can be expressed as the expectation of a Clarke-regular function $F(\bullet,\xi)$ with respect to an appropriately defined Gaussian density (or its variant). In fact, we may show that a suitably defined compositional representation is convex. A regularized variance-reduced stochastic approximation framework is provided for computing approximate global minimizers of the original problem, and rate and complexity guarantees are provided. In the second part of the talk, we extend the results to the chance-constrained regime and provide avenues for weakening the distributional assumptions. We show that the chance-constrained problem is equivalent to an optimization problem with compositional convex constraints. Complexity guarantees are provided for computing an approximate global minimizer of the original problem via an inexact variance-reduced proximal-point framework.
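
For background on the key construction (a standard identity, stated here under the assumption that $K(x)$ is closed, convex and contains the origin; the distributional requirements of the talk are not reproduced), the Minkowski functional, or gauge, of $K(x)$ is

\[
\pi_{K(x)}(\zeta) \;=\; \inf\{\lambda > 0 \,:\, \zeta \in \lambda K(x)\},
\qquad \text{so that} \qquad
\mathbb{P}\left\{ \zeta \,\mid\, \zeta \in K(x) \right\} \;=\; \mathbb{P}\left\{ \zeta \,\mid\, \pi_{K(x)}(\zeta) \le 1 \right\},
\]

and when $K(x)$ is described by positively homogeneous functions, $\pi_{K(x)}$ inherits positive homogeneity, which is what the integral representations above exploit.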

Joint work with Ibrahim Bardakci (Bartin University, Turkey), Afrooz Jalilzadeh (University of Arizona, USA), Constantino Lagoa (Penn. State, USA) and Peixuan Zhang (Penn. State, USA).



Thursday, June 15, 15:00 ~ 15:30

Adaptive Stochastic Algorithms for Nonlinearly Constrained Optimization

Frank E. Curtis

Lehigh University, USA

I will discuss the interesting features that drive the convergence guarantees for a set of adaptive stochastic algorithms that my collaborators and I have proposed for solving nonlinearly constrained optimization problems. These algorithms are of the sequential quadratic optimization and interior-point varieties, and they operate in the fully stochastic regime in which we prove convergence-in-expectation and almost-sure-convergence guarantees.

Joint work with Albert S. Berahas (University of Michigan, USA), Xin Jiang (Lehigh University, USA), Michael J. O'Neill (University of North Carolina, USA), Daniel P. Robinson (Lehigh University, USA), Qi Wang (Lehigh University, USA) and Baoyu Zhou (University of Chicago, USA).



Thursday, June 15, 15:30 ~ 16:00

A Newton-MR Algorithm With Complexity Guarantees for Nonconvex Optimization

Fred Roosta

University of Queensland, Australia

We consider variants of the Newton-MR algorithm for solving unconstrained, smooth, but non-convex optimization problems. Unlike the overwhelming majority of Newton-type methods, which rely on the conjugate gradient algorithm as the primary workhorse for their respective sub-problems, Newton-MR employs the minimum residual (MINRES) method. Recently, it has been established that MINRES has an inherent ability to detect non-positive curvature directions as soon as they arise, and that certain useful monotonicity properties are satisfied before such detection. We leverage these recent results and show that our algorithms come with desirable properties including competitive first- and second-order worst-case complexities. Numerical examples demonstrate the performance of our proposed algorithms.
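
As a rough illustration of the structure of such a method (a minimal sketch, not the authors' algorithm: it omits the nonpositive-curvature detection and the complexity-driven safeguards discussed above, and the test function and constants are arbitrary):

    import numpy as np
    from scipy.sparse.linalg import minres
    from scipy.optimize import rosen, rosen_der, rosen_hess

    def newton_mr_sketch(f, grad, hess, x, iters=50, tol=1e-8):
        """Basic Newton-MR-style loop: the search direction (approximately) solves
        min_s ||H s + g|| via MINRES, followed by Armijo backtracking."""
        for _ in range(iters):
            g = grad(x)
            if np.linalg.norm(g) <= tol:
                break
            s, _ = minres(hess(x), -g)      # inexact Newton step from the MINRES subproblem
            if g.dot(s) > 0:                # crude fallback; the real method handles
                s = -g                      # nonpositive curvature inside MINRES itself
            t, fx = 1.0, f(x)
            while f(x + t * s) > fx + 1e-4 * t * g.dot(s) and t > 1e-12:
                t *= 0.5                    # Armijo backtracking line search
            x = x + t * s
        return x

    # Example on a small nonconvex test problem (Rosenbrock).
    x_hat = newton_mr_sketch(rosen, rosen_der, rosen_hess, np.zeros(5))
    print(rosen(x_hat))  # near 0 at the minimizer (1, ..., 1)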

Joint work with Yang Liu (Oxford University).



Thursday, June 15, 16:30 ~ 17:00

A Descent Algorithm for the Optimal Control of ReLU Neural Network Informed PDEs Based on Approximate Directional Derivatives

Michael Hintermüller

Weierstrass Institute Berlin, Germany

We propose and analyze a numerical algorithm for solving a class of optimal control problems for learning-informed semilinear partial differential equations. The latter is a class of PDEs with constituents that are in principle unknown and are approximated by nonsmooth ReLU neural networks. We first show that a direct smoothing of the ReLU network with the aim to make use of classical numerical solvers can have certain disadvantages. This motivates us to devise a numerical algorithm that treats directly the nonsmooth optimal control problem, by employing a descent algorithm inspired by a bundle-free method. Several numerical examples are provided and the efficiency of the algorithm is shown.

Joint work with Guozhi Dong (Central South University, China) and Kostas Papafitsoros (Queen Mary University, London, UK).



Thursday, June 15, 17:00 ~ 17:30

Leveraging "partial" smoothness for faster convergence in nonsmooth optimization

Damek Davis

Cornell University, USA

Nonsmoothness and nonconvexity are significant challenges in large-scale optimization problems, such as training neural networks and solving inverse problems in computational imaging. Although first-order methods are commonly used to address these problems, they are often considered slow. In this presentation, we introduce a (locally) accelerated first-order method that violates this perception by solving "generic" nonsmooth equations at a superlinear rate. The central insight is that nonsmooth functions often exhibit "partial" smoothness, which can be leveraged through a novel generalization of Newton's method.

Joint work with Vasileios Charisopoulos (Cornell University, University of Chicago).



Thursday, June 15, 17:30 ~ 18:30

Algorithms for large scale semidefinite programming

Kim-Chuan Toh

National University of Singapore, Singapore

Semidefinite programming (SDP) and its generalizations have found extensive applications in various domains, including combinatorial and polynomial optimization, covariance matrix estimation, and Euclidean metric embedding. This presentation introduces some algorithms developed for solving large-scale SDP problems. Specifically, we consider two types of SDPs: Type-2 SDPs with moderate variable dimension but a large number of affine constraints, and Type-3 SDPs with large variable dimension but a moderate number of affine constraints. The first part of the talk focuses on Type-2 SDPs. We present algorithmic advancements based on the proximal-point or augmented Lagrangian framework. In particular, we describe the development and implementation of an augmented Lagrangian-based method called SDPNAL+. This method demonstrates promising results in solving SDPs of this type efficiently. In the second part, we explore the design and implementation of a smoothing Newton method for solving Type-2 SDPs. This method leverages the KKT residual mapping and provides an effective approach to address the aforementioned class of SDPs under suitable nondegeneracy conditions. Lastly, we delve into recent progress in designing and implementing a rank-adaptive feasible method for Type-3 SDPs. By employing the low-rank factorization model of the underlying SDP, we demonstrate the promising potential of the new method in solving large-scale SDP problems of this category.
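
For context, the problems in question are standard-form SDPs (background notation, not specific to the talk),

\[
\min_{X \in \mathbb{S}^n} \; \langle C, X \rangle
\quad \text{s.t.} \quad \mathcal{A}(X) = b, \;\; X \succeq 0,
\qquad \mathcal{A}(X) = \big( \langle A_1, X \rangle, \dots, \langle A_m, X \rangle \big),
\]

where Type-2 problems have moderate $n$ but large $m$ and Type-3 problems have large $n$ but moderate $m$; the low-rank factorization model used for the latter replaces $X$ by $R R^{\mathsf{T}}$ with $R \in \mathbb{R}^{n \times r}$ and $r \ll n$, trading convexity for a much smaller variable whose rank $r$ is adapted by the method.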

Joint work with Ling Liang (National University of Singapore), Tianyun Tang (National University of Singapore), Defeng Sun (Hong Kong Polytechnic University) and Xinyuan Zhao (Beijing University of Technology).



Friday, June 16, 14:00 ~ 14:30

Subspace methods for nonconvex optimization

Coralia Cartis

University of Oxford, United Kingdom

We discuss random and deterministic subspace methods for nonconvex optimization problems. We are interested in the global optimisation of functions with low effective dimensionality, which vary only along certain important directions or components. We show that the effective subspace of variation can be efficiently learned in advance of the optimization process; we contrast this with random embedding techniques that focus directly on optimization rather than learning. For local optimization, time permitting, we will also discuss efficient choices of subspaces that blend randomisation techniques with expert deterministic choices.
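
As a point of comparison, the random-embedding approach mentioned above optimizes directly in a random low-dimensional subspace (standard formulation, given here for background):

\[
\min_{z \in \mathbb{R}^p} \; f(x_0 + A z), \qquad A \in \mathbb{R}^{D \times p} \text{ random (e.g., Gaussian)}, \quad p \ll D,
\]

so that, when $p$ is at least the effective dimension, the reduced problem retains a global minimizer of $f$ with high probability; the approach discussed in the talk instead learns the effective subspace before optimizing over it.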



Friday, June 16, 14:30 ~ 15:00

Homogenization of SGD in High-dimensions

Courtney Paquette

McGill University/Google Brain, Canada

We develop a stochastic differential equation, called homogenized SGD, for analyzing the dynamics of stochastic gradient descent (SGD) on high-dimensional random generalized linear models. We show that homogenized SGD is the high-dimensional equivalent of SGD: for any $C^3$ statistic (e.g., the population risk), the statistic under the iterates of SGD converges to the statistic under homogenized SGD when the number of samples $n$ and number of features $d$ are polynomially related ($d^c \le n \le d^{1/c}$ for some $c > 0$). Several motivating applications are provided, including phase retrieval, least-squares, and logistic regression.
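
Schematically, diffusion approximations of SGD set up a correspondence of the following generic form (shown only for orientation; homogenized SGD is a specific such SDE whose diffusion coefficient is derived rigorously in the high-dimensional setting and differs from this naive template):

\[
x_{k+1} = x_k - \gamma\, \nabla f_{i_k}(x_k)
\qquad \longleftrightarrow \qquad
\mathrm{d}X_t = -\gamma\, \nabla f(X_t)\, \mathrm{d}t + \gamma\, \Sigma(X_t)^{1/2}\, \mathrm{d}B_t,
\]

where $\Sigma(x)$ denotes the covariance of the stochastic gradient; the result above states that, for any $C^3$ statistic, the statistics of the SGD iterates and of the SDE trajectory agree asymptotically in the stated high-dimensional regime.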

Joint work with Elliot Paquette (McGill University), Inbar Seroussi (Tel-Aviv University) and Elizabeth Collins-Woodfin (McGill University).



Friday, June 16, 15:00 ~ 15:30

Non-convex optimization when the solution is not unique: A kaleidoscope of favorable conditions

Nicolas Boumal

EPFL, Switzerland

Classical optimization algorithms can see their local convergence rates deteriorate when the Hessian at the optimum is singular. The latter is inescapable when the optima are non-isolated. Yet, several algorithms behave perfectly nicely even when optima form a continuum (e.g., due to overparameterization). This has been studied through various lenses, including the Polyak-Lojasiewicz condition, Quadratic Growth, the Error Bound, and (less so) through a Morse-Bott property. I will present old and new links between all four of these, and touch on implications for fast local convergence of classical algorithms.
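
For reference, three of the conditions named above can be stated as follows (standard local definitions around the solution set $S = \operatorname{argmin} f$, with $f^* = \min f$ and a constant $\mu > 0$; the talk relates these to each other and to the Morse-Bott property):

\[
\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\big(f(x) - f^*\big) \quad \text{(Polyak-Lojasiewicz)}, \qquad
f(x) - f^* \;\ge\; \tfrac{\mu}{2}\,\operatorname{dist}(x, S)^2 \quad \text{(quadratic growth)},
\]
\[
\|\nabla f(x)\| \;\ge\; \mu\,\operatorname{dist}(x, S) \quad \text{(error bound)}.
\]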

Joint work with Quentin Rebjock (EPFL).



Friday, June 16, 15:30 ~ 16:00

Optimal Design for Policy Learning across Related Locations

Georgina Hall

INSEAD, France

A firm wishes to learn the response of customers to different actions (e.g., discounts, treatments) so that it can assign, down the line, the action which maximizes the firm’s monetary gain to future customers. To learn this response, the firm can run experiments where it assigns actions to different participants and observes the outcomes. This information is then used to build a model of the customer’s response, which, in turn, is used for action assignment. In a budget-constrained setting where not all participants can be part of the experiment, one can wonder which participants should be selected to maximize the firm’s monetary gain – a question also known as optimal design. In this talk, we consider a twist on the classical question. We assume that the firm operates across many different locations and that customers’ responses are related to varying degrees across these locations. We propose a procedure, based on semidefinite programming, for selecting participants in each location, which leverages the connection between locations. In particular, we show that this procedure outperforms the setting where experiments are run in parallel in each location. This implies that the firm stands to gain from running experiments at a global level rather than locally.

Joint work with Stefanos Poulidis (INSEAD) and Spyros Zoumpoulis (INSEAD).



Friday, June 16, 16:30 ~ 17:00

Small Errors in Random Zeroth-Order Optimization are Imaginary

Daniel Kuhn

EPFL, Switzerland

Most zeroth-order optimization algorithms mimic a first-order algorithm but replace the gradient of the objective function with some noisy gradient estimator that can be computed from a small number of function evaluations. This estimator is constructed randomly, and its expectation matches the gradient of a smooth approximation of the objective function whose quality improves as the underlying smoothing parameter $\delta$ is reduced. Gradient estimators requiring a smaller number of function evaluations are preferable from a computational point of view. While estimators based on a single function evaluation can be obtained by a clever use of the divergence theorem from vector calculus, their variance explodes as $\delta$ tends to 0. Estimators based on multiple function evaluations, on the other hand, suffer from numerical cancellation when $\delta$ tends to 0. To combat both effects simultaneously, we extend the objective function to the complex domain and construct a gradient estimator that evaluates the objective at a complex point whose coordinates have small imaginary parts of the order $\delta$. As this estimator requires only one function evaluation, it is immune to cancellation. In addition, its variance remains bounded as $\delta$ tends to 0. We prove that zeroth-order algorithms that use our estimator offer the same theoretical convergence guarantees as the state-of-the-art methods. Numerical experiments suggest, however, that they often converge faster in practice.
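
The single-evaluation estimator can be illustrated with the classical complex-step idea it builds on (a minimal sketch with an arbitrary test function and parameters, not the authors' exact estimator; it assumes the objective extends analytically to complex arguments):

    import numpy as np

    def complex_step_grad_estimate(f, x, delta=1e-20, rng=np.random.default_rng(0)):
        """One-sample random gradient estimator using a complex perturbation:
        draw a unit direction u and return (d / delta) * Im f(x + i*delta*u) * u.
        No difference of real function values is formed, so there is no numerical
        cancellation even for tiny delta."""
        d = x.size
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        return (d / delta) * np.imag(f(x + 1j * delta * u)) * u

    # Example: averaging many one-sample estimates approaches the true gradient.
    f = lambda z: np.sum(z**4) + np.sum(z[:-1] * z[1:])
    x = np.linspace(-1.0, 1.0, 5)
    est = np.mean([complex_step_grad_estimate(f, x, rng=np.random.default_rng(s))
                   for s in range(2000)], axis=0)
    true_grad = 4 * x**3 + np.concatenate(([0.0], x[:-1])) + np.concatenate((x[1:], [0.0]))
    print(np.linalg.norm(est - true_grad))  # only Monte Carlo error remains; the delta-bias is O(delta^2)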

Joint work with Wouter Jongeneel (EPFL) and Man-Chung Yue (The University of Hong Kong).



Friday, June 16, 17:00 ~ 17:30

Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods

Niao He

ETH Zurich, Switzerland

A central optimization challenge in machine learning is parameter-tuning. Adaptive gradient methods, such as AdaGrad and Adam, are ubiquitously used for training machine learning models in practice, owing to their ability to adjust the stepsizes without granular knowledge of the objective functions. Despite the empirical successes, their theoretical benefits over vanilla SGD remain elusive. In this talk, we will examine the convergence guarantees of a wide range of parameter-agnostic algorithms including untuned SGD, normalized SGD, AdaGrad, and others in the nonconvex setting assuming only smoothness and bounded variance. Our results will provide some hints on the provable advantage of adaptive methods.



Friday, June 16, 17:30 ~ 18:00

Adaptive first-order methods for convex optimization

Yura Malitsky

University of Vienna, Austria

In this talk, I will present a new way to make gradient descent and the proximal gradient method fully adaptive without increasing their iteration cost. We don't need any additional assumptions and even relax the assumption of global Lipschitzness for the differentiable component. The stepsizes approximate the local curvature of the differentiable function and can increase from iteration to iteration. We will discuss some limitations and open problems.
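
To give the flavor of such stepsizes, one rule of this type, in the spirit of the authors' earlier work on adaptive gradient descent (reproduced here schematically; the constants and the proximal extension in the talk may differ), estimates the local curvature from two consecutive gradients:

\[
x_{k+1} = x_k - \gamma_k \nabla f(x_k), \qquad
\gamma_k \;=\; \min\!\left\{ \sqrt{1 + \tfrac{\gamma_{k-1}}{\gamma_{k-2}}}\; \gamma_{k-1}, \;\; \frac{\|x_k - x_{k-1}\|}{2\,\|\nabla f(x_k) - \nabla f(x_{k-1})\|} \right\},
\]

where the second term is half the inverse of a local Lipschitz estimate of $\nabla f$ and the first term lets the stepsize grow from iteration to iteration.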

Joint work with Konstantin Mishchenko.



Friday, June 16, 18:00 ~ 18:30

On the Complexity of a Simple Primal-Dual Coordinate Method

Ahmet Alacaoglu

University of Wisconsin-Madison, USA

We focus on an algorithm called "primal-dual algorithm with random extrapolation and coordinate descent (PURE-CD)", which has been shown to obtain good practical performance for solving convex-concave min-max problems with bilinear coupling. Such problems arise in many machine learning contexts, including linear empirical risk minimization, matrix games, and image processing. We prove complexity bounds for PURE-CD that either match or improve the best-known complexities for dense and sparse (strongly)-convex-(strongly)-concave problems with bilinear coupling in the literature.
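
For concreteness, the bilinearly coupled problems referred to take the standard form (template and notation below are generic, not taken from the paper):

\[
\min_{x \in \mathbb{R}^n} \; \max_{y \in \mathbb{R}^m} \;\; f(x) + \langle K x, y \rangle - g(y),
\]

with $f$ and $g$ convex (possibly strongly convex) and $K$ a dense or sparse coupling matrix; linear empirical risk minimization and matrix games are instances of this template, and the density pattern of $K$ is what distinguishes the dense and sparse complexity regimes mentioned above.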

Joint work with Volkan Cevher (EPFL) and Stephen J. Wright (University of Wisconsin-Madison).



Saturday, June 17, 14:00 ~ 14:30

Bilevel hyperparameter optimization for nonlinear support vector machines

Alain Zemkoho

University of Southampton, United Kingdom

It has now been well-established that the problem of computing optimal hyperparameters for a support vector classification problem is a bilevel optimization problem. However, so far, focus has essentially been on support vector machines with linear kernels. Unfortunately, linear kernels cannot capture more complex relationships between points of most practical data sets. In this talk, we will present a bilevel optimization model for computing hyperparameters in the context of a nonlinear support vector machine training problem. Unlike in the case with a linear kernel, the dual model of the training problem is needed to get a completely explicit problem in a finite dimensional space by means of the radial basis function. Subsequently, the Karush-Kuhn-Tucker (or mathematical program with equilibrium constraints) reformulation of the problem is introduced and the corresponding theoretical properties studied. In particular, we focus our attention on the fulfilment of the corresponding classical qualification conditions such as the Mangasarian–Fromovitz constraint qualification, linear independence constraint qualification, and strong second order sufficient condition. Some numerical illustrations will also be provided.
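
Schematically (a generic form of such a bilevel model with a Gaussian kernel, not necessarily the authors' exact formulation), the hyperparameter problem reads

\[
\min_{C > 0,\; \gamma > 0} \;\; \mathrm{ValidationError}(\alpha^*)
\quad \text{s.t.} \quad
\alpha^* \in \operatorname*{argmax}_{0 \le \alpha \le C,\; y^\top \alpha = 0} \;\; \mathbf{1}^\top \alpha - \tfrac{1}{2}\, \alpha^\top \big( Y K(\gamma) Y \big) \alpha,
\]

where $K(\gamma)_{ij} = \exp(-\gamma \|x_i - x_j\|^2)$ is the radial basis function kernel and $Y = \operatorname{diag}(y)$; replacing the lower-level dual problem by its Karush-Kuhn-Tucker conditions produces the single-level reformulation whose constraint qualifications are analyzed in the talk.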

Joint work with Stefano Coniglio (University of Bergamo, Italy), Anthony Dunn (Decision Analysis Services Ltd, United Kingdom) and Qing-Na Li (Beijing Institute of Technology, China).



Saturday, June 17, 14:30 ~ 15:00

Some applications of implicit function theorems from variational analysis

Tim Hoheisel

McGill University, Canada

We present some applications of implicit function theorems from variational analysis to the sensitivity analysis of prominent optimization problems, including the LASSO, the square-root LASSO, and the proximal operator as a function of the proximal parameter. These are leveraged in numerical computations.

Joint work with Aaron Berk (McGill), Simone Brugiapaglia (Concordia), Michael P. Friedlander (UBC), Ariel Goodwin (Cornell).



Saturday, June 17, 15:00 ~ 15:30

Adaptive regularization without function values

Serge Gratton

University of Toulouse, IRIT, INP, France

An adaptive regularization algorithm for unconstrained nonconvex optimization is presented in which the objective function is never evaluated, but only derivatives are used. This algorithm belongs to the class of adaptive regularization methods, for which optimal worst-case complexity results are known for the standard framework where the objective function is evaluated. It is shown in this paper that these excellent complexity bounds are also valid for the new algorithm, despite the fact that significantly less information is used. In particular, it is shown that, if derivatives of degree one to $p$ are used, the algorithm will find an $\epsilon_1$-approximate first-order minimizer in at most $\mathcal{O}(\epsilon_1^{-(p+1)/p})$ iterations, and an $(\epsilon_1,\epsilon_2)$-approximate second-order minimizer in at most $\mathcal{O}(\max(\epsilon_1^{-(p+1)/p},\epsilon_2^{-(p+1)/(p-1)}))$ iterations. As a special case, the new algorithm using first and second derivatives, when applied to functions with Lipschitz continuous Hessian, will find an iterate $x_k$ at which the gradient's norm is less than $\epsilon_1$ in at most $\mathcal{O}(\epsilon_1^{-3/2})$ iterations.
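
For orientation, an adaptive regularization method of order $p$ computes its step by approximately minimizing the regularized Taylor model (standard form; normalizations of the last term vary across the literature):

\[
m_k(s) \;=\; \sum_{j=1}^{p} \frac{1}{j!}\, \nabla^j f(x_k)[s]^j \;+\; \frac{\sigma_k}{p+1}\, \|s\|^{p+1}.
\]

Note that the minimizer of $m_k$ does not involve the value $f(x_k)$, which is consistent with an algorithm that never evaluates the objective; the difficulty addressed above is how to adapt $\sigma_k$ without function values.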

Joint work with Sadok Jerad (University of Toulouse) and Philippe L. Toint (University of Namur).



Saturday, June 17, 15:30 ~ 16:00

A Newton-type method for strict saddle functions

Clément Royer

Université Paris Dauphine-PSL, France

Nonconvex optimization problems arise in many fields of computational mathematics and data science. Recent advances in this area have focused on instances where nonconvexity has a benign effect, in the sense that traditional optimization algorithms exhibit good theoretical and practical performance on such problems. On the other hand, exploiting a particular nonconvex structure to improve algorithmic design is a more challenging endeavor.

In this talk, we describe a Newton-type framework for solving nonconvex optimization problems that exhibit a particular structure characterized by a strict saddle property. Our approach benefits from the local convergence properties of Newton's method, and is endowed with global convergence rates that improve over those known for generic nonconvex instances. In addition, our algorithm and its analysis can account for the presence of manifold constraints, which we illustrate on a selection of strict saddle functions.

Joint work with Florentin Goyens (Université Paris Dauphine-PSL, France).



Saturday, June 17, 16:30 ~ 17:00

Nonsmooth optimization and augmented Lagrangian methods

Patrick Mehlitz

BTU Cottbus-Senftenberg, Germany

In this talk, we demonstrate how the framework of (safeguarded) augmented Lagrangian methods can be used to solve nonsmooth optimization problems in abstract spaces. Two different settings are considered. First, we investigate a fully nonsmooth situation where the involved subproblems can be solved up to approximate global minimality, which is particularly interesting in convex situations, and present numerical examples from image denoising and sparse control. Second, we focus on problems where the nonsmoothness is encapsulated in an objective function of composite type, provide a convergence analysis for the situation where the subproblems are solved up to approximate stationarity, and illustrate our findings by means of numerical examples in the context of low-rank matrix optimization.

Joint work with Alberto De Marchi (University of the Bundeswehr Munich, Germany), Christian Kanzow (University of Würzburg, Germany), Alexander Y. Kruger (Federation University, Australia), Gerd Wachsmuth (BTU Cottbus-Senftenberg, Germany) and Frank Werner (University of Würzburg, Germany).



Saturday, June 17, 17:00 ~ 17:30

How do exponential size solutions arise in semidefinite programming?

Gabor Pataki

University of North Carolina at Chapel Hill, USA

A striking pathology of semidefinite programs (SDPs) is illustrated by a classical example of Khachiyan: feasible solutions in SDPs may need exponential space even to write down. Such exponential size solutions are the main obstacle to solving a long-standing, fundamental open problem: can we decide feasibility of SDPs in polynomial time?

The consensus seems to be that SDPs with large size solutions are rare. However, here we prove that they are actually quite common: a linear change of variables transforms every strictly feasible SDP into a Khachiyan-type SDP, in which the leading variables are large. As to "how large", that depends on the singularity degree of a dual problem. Further, we present some SDPs in which large solutions appear naturally, without any change of variables. We also partially answer the question: how do we represent such large solutions in polynomial space?
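
For background, one standard rendition of Khachiyan's example (not taken from the talk) forces repeated squaring with $2 \times 2$ positive semidefinite blocks:

\[
x_1 \ge 2, \qquad
\begin{pmatrix} x_{i+1} & x_i \\ x_i & 1 \end{pmatrix} \succeq 0 \;\; (i = 1, \dots, n-1)
\;\;\Longrightarrow\;\; x_{i+1} \ge x_i^2
\;\;\Longrightarrow\;\; x_n \ge 2^{2^{n-1}},
\]

so every feasible solution needs exponentially many bits merely to write down its last coordinate.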

Joint work with Alex Touzov (UNC Chapel Hill).



Saturday, June 17, 17:30 ~ 18:00

Line-Search for Piecewise Smooth Functions

Raphael Hauser

University of Oxford, United Kingdom

Industrial optimization problems often involve piecewise smooth objectives that are composed from individual smooth model functions defined on patches that subdivide the design space. The non-smoothness is not necessarily a fundamental feature of the underlying physical model and may be due only to its numerical representation in a low-dimensional space, but it can cause standard line search methods to stall. In this talk we present a line search method adapted to such problems that makes explicit use of the piecewise smooth nature of the model function. The new method compares favorably with existing approaches, both in theory and practice.

Joint work with Jonathan Grant-Peters (University of Oxford, UK).



 

Posters


Quadratic minimization: from conjugate gradient to an adaptive Heavy-ball method with Polyak step-sizes

Baptiste Goujaud

École Polytechnique, Institut Polytechnique de Paris, France

In this work, we propose an adaptive variation on the classical Heavy-ball method for convex quadratic minimization. The adaptivity crucially relies on so-called "Polyak step-sizes", which consist in using knowledge of the optimal value of the optimization problem at hand instead of problem parameters such as a few eigenvalues of the Hessian of the problem. This method happens to also be equivalent to a variation of the classical conjugate gradient method, and thereby inherits many of its attractive features, including its finite-time convergence, instance optimality, and its worst-case convergence rates. The classical gradient method with Polyak step-sizes is known to behave very well in situations in which it can be used, and whether momentum could be incorporated into this method, and improve it, appeared to be an open question. We provide a definitive answer to this question for minimizing convex quadratic functions, an arguably necessary first step for developing such methods in more general setups.
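
As background, the two ingredients being combined can be sketched as follows (illustrative only: the momentum parameter here is a fixed constant, whereas choosing it adaptively is precisely the contribution described above):

    import numpy as np

    def heavy_ball_polyak(A, b, x0, f_star, beta=0.5, iters=300):
        """Heavy-ball iteration x_{k+1} = x_k - gamma_k * grad f(x_k) + beta * (x_k - x_{k-1})
        on the quadratic f(x) = 0.5 x'Ax - b'x, with the Polyak step-size
        gamma_k = (f(x_k) - f_star) / ||grad f(x_k)||^2, which uses the optimal value
        f_star instead of eigenvalue information."""
        f = lambda x: 0.5 * x @ A @ x - b @ x
        grad = lambda x: A @ x - b
        x_prev, x = x0.copy(), x0.copy()
        for _ in range(iters):
            g = grad(x)
            if np.linalg.norm(g) < 1e-12:
                break
            gamma = (f(x) - f_star) / (g @ g)          # Polyak step-size
            x, x_prev = x - gamma * g + beta * (x - x_prev), x
        return x

    # Example: well-conditioned diagonal quadratic with known optimal value.
    A = np.diag(np.linspace(1.0, 3.0, 20))
    x_opt = np.ones(20)
    b = A @ x_opt
    f_star = 0.5 * x_opt @ A @ x_opt - b @ x_opt
    x_hat = heavy_ball_polyak(A, b, np.zeros(20), f_star)
    print(np.linalg.norm(x_hat - x_opt))               # distance to the minimizer after the run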

Joint work with Adrien Taylor (Inria Paris), Aymeric Dieuleveut (École Polytechnique, Institut Polytechnique de Paris).



Hybrid methods for global optimization of large-scale polynomial problems

Gilles Bareilles

Czech Technical University, Czech Republic

The Moment/SOS hierarchy provides a way to compute the global minimizers of polynomial problems, at the cost of solving a sequence of increasingly large semidefinite programs. We consider large-scale problems, for which interior-point methods can no longer be applied to the resulting SDPs.

We propose an algorithm that combines a first-order method on the SDP relaxation, and a second-order method on the polynomial problem. Interestingly, the switch between the first- and second-order methods is based on a quantitative criterion, which ensures Newton's method converges quadratically from its first iteration. We apply this methodology to obtain global minimizers of optimal power flow problems of nationwide scale.

Joint work with Johannes Aspman (Czech Technical University), Vyacheslav Kungurtsev (Czech Technical University), Jakub Marecek (Czech Technical University) and Martin Takac (Lehigh University, MBZUAI).



Novel algorithms for nonconvex second-order optimization with strong performance guarantees

Chuan He

University of Minnesota, United States

Second-order optimization has recently experienced significant developments, leading to numerous fruitful applications in science and engineering. In particular, recent research has shown that a second-order stationary point (SOSP) of a nonconvex optimization problem is often a globally optimal solution for instances that arise in areas such as machine learning and statistics. Therefore, developing efficient algorithms for computing such points is pivotal for advancing those areas. Our research introduces new algorithms with substantial theoretical improvements for solving two types of nonconvex constrained optimization problems and conducts numerical studies to show the practical advantages of the proposed methods over the state-of-the-art methods. In our first work, we develop a novel augmented Lagrangian (AL) method for finding an SOSP of a nonconvex equality constrained optimization problem. Theoretically, the proposed AL method improves upon the best-known complexity guarantees. Numerically, the computational speed of our AL method is vastly faster than the competing ones for solving a classical statistical problem. In our second work, we design efficient algorithms to find an SOSP of a general conic constrained nonconvex optimization problem. For the first time, we introduce a notion of an SOSP for general conic constrained optimization and propose an efficient algorithm to find it. Numerical results demonstrate the significant potential superiority of our barrier-AL method over the state-of-the-art method in terms of solution quality.
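
For reference, in the unconstrained case an $(\epsilon_1,\epsilon_2)$-approximate second-order stationary point is standardly defined by (the works above extend this notion to equality-constrained and general conic constrained problems):

\[
\|\nabla f(x)\| \le \epsilon_1
\qquad \text{and} \qquad
\lambda_{\min}\big(\nabla^2 f(x)\big) \ge -\epsilon_2,
\]

i.e., a nearly vanishing gradient together with a nearly positive semidefinite Hessian.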

Joint work with Zhaosong Lu (University of Minnesota, United States).



A second order system attached to a monotone inclusion problem

David Alexander Hulett

University of Vienna, Austria

In the setting of a real Hilbert space, we investigate the asymptotic properties of the trajectories generated by a second order dynamical system. As the time variable approaches infinity, a fast rate of convergence of order $\mathcal{O}\left(\frac{1}{t^{\tau}\beta(t)}\right)$ is exhibited by $\|V(z(t))\|$, where $z(t)$ denotes the generated trajectory, $\tau$ is a nonnegative number and $\beta(t)$ is a nondecreasing function which fulfills a growth condition. At least in one case, we are able to show the weak convergence of $z(t)$ to a zero of $V$.

Our approach combines features of two systems already present in the literature. On the one hand, by combining a vanishing damping term with the time derivative of $V$ along the trajectory, it bears resemblance to the fast OGDA system (Bot, Csetnek & Nguyen 2022). At the same time, by introducing two parameters $r$ and $s$ in $[0, 1]$, our system admits, through a particular choice for $V$, similar dynamics to those developed for a linearly constrained convex optimization problem in (He, Hu & Fang 2022).

Joint work with Radu Ioan Bot (University of Vienna) and Dang-Khoa Nguyen (University of Vienna).



Semidefinite games

Constantin Ickstadt

Goethe Universität Frankfurt am Main, Germany

We introduce and study the class of semidefinite games, which generalizes bimatrix games and finite $N$-person games, by replacing the simplex of the mixed strategies for each player by a slice of the positive semidefinite cone in the space of real symmetric matrices.

For semidefinite two-player zero-sum games, we show that the optimal strategies can be computed by semidefinite programming. Furthermore, we show that two-player semidefinite zero-sum games are almost equivalent to semidefinite programming, generalizing Dantzig's result on the almost equivalence of bimatrix games and linear programming.

For general two-player semidefinite games, we prove a spectrahedral characterization of the Nash equilibria. Moreover, we give constructions of semidefinite games with many Nash equilibria. In particular, we give a construction of semidefinite games whose number of connected components of Nash equilibria exceeds that of the long-standing best-known construction for many Nash equilibria in bimatrix games, presented by von Stengel in 1999.

Joint work with Thorsten Theobald (Goethe Universität Frankfurt am Main, Germany) and Elias Tsigaridas (Sorbonne Université, Paris University, CNRS, and Inria Paris, France).



Variance-Reduction for SGD with Polyak Stepsizes and Line-Search

Xiaowen Jiang

CISPA Helmholtz Center for Information Security, Germany

The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD have shown remarkable effectiveness when training over-parameterized models. However, in non-interpolation settings, both algorithms only guarantee convergence to a neighborhood of a solution, which may result in a worse output than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al. [2022]), this approach results in slower convergence rates for convex and over-parameterized models.

In this work, we make two contributions: Firstly, we propose two new variants of SPS and SLS, called AdaSPS and AdaSLS, which guarantee convergence in non-interpolation settings and maintain sub-linear and linear convergence rates for convex and strongly convex functions when training over-parameterized models. These algorithms require no knowledge of problem-dependent parameters and do not assume bounded iterates under the strong convexity assumption. Secondly, we equip AdaSPS and AdaSLS with a novel variance reduction technique and obtain algorithms that require $\widetilde{\mathcal{O}}(n+1/\epsilon)$ gradient evaluations to achieve an $\mathcal{O}(\epsilon)$-suboptimality for convex functions, which improves upon the slower $\mathcal{O}(1/\epsilon^2)$ rates of AdaSPS and AdaSLS without variance reduction in the non-interpolation regimes. Additionally, our result matches the fast rates of AdaSVRG and SARAH.
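
For context, the plain (non-adaptive) stochastic Polyak stepsize that these methods build on can be sketched as follows (standard SPS_max-style rule with illustrative constants; this is not AdaSPS or AdaSLS themselves):

    import numpy as np

    def sgd_sps(grad_i, loss_i, loss_i_star, x0, n, c=0.5, gamma_max=1.0,
                iters=5000, rng=np.random.default_rng(0)):
        """SGD with the stochastic Polyak stepsize:
            gamma_k = min( (f_i(x_k) - f_i^*) / (c * ||grad f_i(x_k)||^2), gamma_max ),
        where loss_i_star[i] lower-bounds min_x f_i(x) (often 0 for over-parameterized
        models).  AdaSPS/AdaSLS modify this rule so that convergence also holds
        without interpolation."""
        x = x0.copy()
        for _ in range(iters):
            i = rng.integers(n)
            g = grad_i(x, i)
            gn2 = g @ g
            if gn2 == 0.0:
                continue
            gamma = min((loss_i(x, i) - loss_i_star[i]) / (c * gn2), gamma_max)
            x = x - gamma * g
        return x

    # Example: interpolated least squares, f_i(x) = 0.5 * (a_i' x - b_i)^2 with b = A x_opt.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 20))
    x_opt = rng.standard_normal(20)
    b = A @ x_opt
    loss_i = lambda x, i: 0.5 * (A[i] @ x - b[i]) ** 2
    grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
    x_hat = sgd_sps(grad_i, loss_i, np.zeros(200), np.zeros(20), n=200)
    print(np.linalg.norm(x_hat - x_opt))  # small: interpolation holds for this problem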

Joint work with Sebastian U. Stich (CISPA Helmholtz Center for Information Security).



Accelerated Adaptive Cubic Regularized Quasi-Newton Methods

Dmitry Kamzolov

Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), United Arab Emirates

In this paper, we propose Cubic Regularized Quasi-Newton Methods for (strongly) star-convex and Accelerated Cubic Regularized Quasi-Newton for convex optimization. The proposed class of algorithms combines the global convergence of the Cubic Newton with the affordability of constructing the Hessian approximation via a Quasi-Newton update. Cubic Regularized Quasi-Newton Method is the first Quasi-Newton Method with the global convergence rate $O\left(\frac{LR^2}{T}\right)$. Accelerated Cubic Regularized Quasi-Newton is the first Quasi-Newton Method with the global convergence rate $O\left(\frac{LR^2}{T^2}\right)$. To construct these methods, we introduce two novel concepts of Hessian inexactness and propose the Adaptive Inexact Cubic Regularized Newton algorithm and its accelerated version, which adjust to the approximation error. We show that different Quasi-Newton updates satisfy these approximation conditions and provide a complexity-efficient way to solve the cubic subproblem. We provide global convergence rates with explicit dependence on Hessian approximation error. By combining cubic regularization, acceleration technique, and adaptivity, our algorithms show good practical performance.
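
For orientation, one step of a cubic regularized quasi-Newton method minimizes a cubic-regularized quadratic model of the objective (schematic form; $B_k$ is the quasi-Newton Hessian approximation and $M_k$ the regularization parameter, which the adaptive variants above adjust to the Hessian approximation error):

\[
x_{k+1} \;\in\; \operatorname*{argmin}_{y} \;\; \langle \nabla f(x_k),\, y - x_k \rangle \;+\; \tfrac{1}{2}\, \langle B_k (y - x_k),\, y - x_k \rangle \;+\; \tfrac{M_k}{6}\, \|y - x_k\|^3.
\]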

Full paper is available: https://arxiv.org/pdf/2302.04987.pdf

Joint work with Klea Ziu (Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE), Artem Agafonov (Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE) and Martin Takáč (Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE).



An SDE perspective on stochastic convex optimization

Rodrigo Maulen

ENSICAEN, France

In this paper, we analyze the global and local behavior of gradient-like flows under stochastic errors with the aim of solving convex optimization problems with noisy gradient input. We first study the unconstrained differentiable convex case, using a stochastic differential equation where the drift term is minus the gradient of the objective function and the diffusion term is either bounded or square-integrable. In this context, under Lipschitz continuity of the gradient, our first main result shows almost sure convergence of the objective and the trajectory process towards a minimizer of the objective function. We also provide a comprehensive complexity analysis by establishing several new pointwise and ergodic convergence rates in expectation for the convex, strongly convex, and (local) Polyak-Lojasiewicz case. The latter, which involves local analysis, is challenging and requires non-trivial arguments from measure theory.
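
Concretely, the dynamics studied are of the form described above,

\[
\mathrm{d}X(t) \;=\; -\nabla f\big(X(t)\big)\,\mathrm{d}t \;+\; \sigma\big(t, X(t)\big)\,\mathrm{d}W(t),
\]

where $W$ is a Brownian motion and the diffusion coefficient $\sigma$ is either uniformly bounded or square-integrable in time, the two noise regimes considered.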

Joint work with Jalal Fadili (ENSICAEN, France) and Hedy Attouch (Université de Montpellier, France).



The rank-relaxed optimization landscape for orthogonal group synchronization on a general graph

Andrew McRae

EPFL, Switzerland

To overcome nonconvexity in quadratically constrained quadratic programs (QCQPs), it is common to relax the problem. Typical SDP relaxations square the dimension of the problem. Burer-Monteiro relaxations give more control over the dimension increase, but general guarantees typically still require a superlinear increase in dimension. Yet, for some problems, when the data is sufficiently structured, it is observed that merely doubling or tripling the dimension is sufficient to dissolve all bad local minima. We show this formally for special settings in synchronization of rotations, with connections to simultaneous localization and mapping (SLAM) in robotics and to synchronizing oscillators in the Kuramoto model.

Specifically, we consider the estimation of $r \times r$ orthogonal matrices $Z_1, \dots, Z_n$ from relative measurements of the form $R_{ij} \approx Z_i Z_j^T$ for $(i,j)$ corresponding to the edges of a connected graph $G$. The least-squares estimator can be expressed as a QCQP $\max_{Y \in \mathbf{R}^{nr \times r}}~\langle C, Y Y^T \rangle$ subject to $Y_i Y_i^T = I_r$ for each $r$-row block of $Y$. For $p \geq r$, the rank-$p$ relaxation of this problem allows $Y \in \mathbf{R}^{nr \times p}$, which makes the above problem an optimization program over a product of $n$ Stiefel manifolds. We show that, in the noiseless case, the rank-$p$ relaxation has no spurious local minima and yields an exact solution to the original rank-$r$ problem whenever $p \geq r+2$. This is optimal for general graphs; the best prior results require $2p \geq 3(r + 1)$. Furthermore, we prove that the absence of spurious local minima is robust to measurement error, where the amount of error tolerated depends on the relaxation rank $p$ and the spectral properties of $G$.

Joint work with Nicolas Boumal (EPFL, Switzerland).



Consensus-Based Optimization: On Algorithms, Analysis, and Applications

Konstantin Riedl

Technical University of Munich, Munich Center for Machine Learning, Germany

Consensus-based optimization (CBO) is a multi-particle metaheuristic derivative-free optimization method capable of globally minimizing high-dimensional nonconvex and nonsmooth functions, a task which is ubiquitous in applied mathematics. The method is inspired by consensus dynamics and opinion formation, and is flexible enough to incorporate different mechanisms widely used in evolutionary computation and machine learning. In this poster we shed light on the internal working principles of CBO, which are responsible for the success of the method. Based on an experimentally supported intuition that, on average, CBO performs a gradient descent of the squared Euclidean distance to the global minimizer, we devise a novel technique for proving the convergence to the global minimizer in mean-field law for a rich class of objective functions. In particular, we prove that CBO performs a convexification of a very large class of optimization problems as the number of optimizing particles tends to infinity. Our results hold under minimal assumptions about the initialization of the method and for objectives that are merely locally Lipschitz continuous. From this result it becomes apparent that the hardness of any global optimization problem is necessarily encoded in the mean-field approximation, or, more precisely, in the way the empirical measure of the finite particle dynamics is used to approximate the mean-field limit. In consideration of the central significance of such approximation with regard to the overall computational complexity of the implemented numerical scheme, we establish a novel probabilistic quantitative result about the convergence of the interacting particle system towards the corresponding mean-field dynamics. By combining both results we provide the first holistic convergence proof of CBO methods on the plane. To demonstrate the versatility and practicability of CBO and its variants we provide numerical experiments for well-understood high-dimensional problems from signal processing and machine learning.
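
For readers unfamiliar with CBO, the isotropic particle dynamics typically take the form (standard formulation from the CBO literature; constants and variants differ across the references below):

\[
\mathrm{d}X^i_t \;=\; -\lambda\,\big(X^i_t - v_\alpha(\widehat\rho^N_t)\big)\,\mathrm{d}t \;+\; \sigma\,\big\|X^i_t - v_\alpha(\widehat\rho^N_t)\big\|\,\mathrm{d}B^i_t,
\qquad
v_\alpha(\widehat\rho^N_t) \;=\; \frac{\sum_{j=1}^{N} X^j_t\, e^{-\alpha f(X^j_t)}}{\sum_{j=1}^{N} e^{-\alpha f(X^j_t)}},
\]

i.e., each particle drifts towards a Gibbs-weighted consensus point $v_\alpha$, with exploration noise that vanishes as consensus forms; as $\alpha \to \infty$ the weights concentrate on the currently best particles.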

[1] M. Fornasier, T. Klock, and K. Riedl. Consensus-Based Optimization Methods Converge Globally. arXiv:2103.15130, submitted. 2022.

[2] M. Fornasier, T. Klock, and K. Riedl. Convergence of Anisotropic Consensus-Based Optimization in Mean-Field Law. International Conference on the Applications of Evolutionary Computation (Part of EvoStar). Springer, Cham, 2022.

[3] K. Riedl. Leveraging Memory Effects and Gradient Information in Consensus-Based Optimization: On Global Convergence in Mean-Field Law. arXiv:2211.12184, submitted. 2022.

Joint work with Massimo Fornasier (Technical University of Munich, Munich Center for Machine Learning, Germany) and Timo Klock (DeepTech Consulting, Norway).



An adaptive superfast inexact proximal augmented Lagrangian method for smooth nonconvex composite optimization problems

Arnesh Sujanani

Georgia Institute of Technology, United States of America

This work presents an adaptive superfast proximal augmented Lagrangian (AS-PAL) method for solving linearly-constrained smooth nonconvex composite optimization problems. Each iteration of AS-PAL inexactly solves a possibly nonconvex proximal augmented Lagrangian (AL) subproblem obtained by an aggressive/adaptive choice of prox stepsize with the aim of substantially improving its computational performance, followed by a full Lagrange multiplier update. A major advantage of AS-PAL compared to other AL methods is that it requires no knowledge of parameters (e.g., size of the constraint matrix, objective function curvatures, etc.) associated with the optimization problem, due to its adaptive nature not only in choosing the prox stepsize but also in using a crucial adaptive accelerated composite gradient variant to solve the proximal AL subproblems. The speed and efficiency of AS-PAL are demonstrated through extensive computational experiments showing that it can solve many instances more than ten times faster than other state-of-the-art penalty and AL methods, particularly when high accuracy is required.
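
For reference, for a linearly-constrained composite problem $\min \{ f(x) + h(x) : Ax = b \}$ with $f$ smooth nonconvex and $h$ convex, the proximal augmented Lagrangian subproblem and multiplier update have the schematic form (standard template with illustrative notation, not the paper's exact parameterization):

\[
x_k \;\approx\; \operatorname*{argmin}_{x} \;\Big\{\, f(x) + h(x) + \langle p_{k-1},\, Ax - b \rangle + \tfrac{c}{2}\,\|Ax - b\|^2 + \tfrac{1}{2\lambda_k}\,\|x - x_{k-1}\|^2 \Big\},
\qquad
p_k \;=\; p_{k-1} + c\,(A x_k - b),
\]

where $\lambda_k$ is the aggressively chosen prox stepsize and each subproblem is solved inexactly by an adaptive accelerated composite gradient method.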

Joint work with Renato D.C. Monteiro (Georgia Institute of Technology, USA).
