Session II.7 - Computational Harmonic Analysis and Data Science
Talks
Thursday, June 15, 14:00 ~ 15:00
Data-driven regularization for inverse problems - the dos and don'ts
Carola-Bibiane Schönlieb
University of Cambridge, United Kingdom
Abstract: Inverse problems are about the reconstruction of an unknown physical quantity from indirect measurements. They appear in a variety of places, from medical imaging, for instance MRI or CT, to remote sensing, for instance radar, to material sciences and molecular biology, for instance electron microscopy. Here, inverse problems are a tool for looking inside specimens, resolving structures beyond the scale visible to the naked eye, and quantifying them. They are a means for diagnosis, prediction and discovery. Most inverse problems of interest are ill-posed and require appropriate mathematical treatment for recovering meaningful solutions. Classically, such approaches are derived almost exclusively in a knowledge-driven manner, constituting handcrafted mathematical models. Examples include variational regularization methods with Tikhonov regularization, the total variation and several sparsity-promoting regularizers such as the L1 norm of the wavelet coefficients of the solution. While such handcrafted approaches deliver mathematically rigorous and computationally robust solutions to inverse problems, they are also limited by our ability to model solution properties accurately and to realise these approaches in a computationally efficient manner. Recently, a new paradigm has been introduced to the regularization of inverse problems, which derives solutions to inverse problems in a data-driven way. Here, the inversion approach is not mathematically modelled in the classical sense, but modelled by highly over-parametrised models, typically deep neural networks, that are adapted to the inverse problems at hand by appropriately selected training data. Current approaches that follow this new paradigm distinguish themselves through solution accuracies paired with computational efficiency that were previously inconceivable. In this lecture I will give an introduction to this new data-driven paradigm for inverse problems. Presented methods include data-driven variational models and plug-and-play approaches, learned iterative schemes (also known as learned unrolling), and learned post-processing. Throughout the presentation of these methodologies, we will discuss their theoretical properties and provide numerical examples for image denoising, deconvolution and computed tomography reconstruction. The lecture will finish with a discussion of open problems and future perspectives.
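To make one of the presented method classes concrete, the following is a minimal NumPy sketch of a plug-and-play forward-backward iteration; the matrix forward operator, the step size and the toy soft-thresholding "denoiser" are illustrative placeholders, not the specific methods of the lecture.

```python
import numpy as np

def pnp_forward_backward(A, y, denoiser, tau=0.1, n_iter=200):
    """Plug-and-play forward-backward splitting (sketch).

    A        : (m, n) forward operator (a plain matrix here for simplicity)
    y        : (m,) measurements
    denoiser : callable playing the role of the proximal map of a regularizer
    """
    x = A.T @ y  # crude initialization
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)      # gradient of the data-fidelity term
        x = denoiser(x - tau * grad)  # "denoise" the gradient step
    return x

# Toy usage with soft thresholding (an L1 prox) standing in for a learned denoiser:
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100); x_true[:5] = 1.0
y = A @ x_true
soft = lambda z, t=0.05: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
x_hat = pnp_forward_backward(A, y, soft)
```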
Thursday, June 15, 15:00 ~ 15:30
Max filtering
Dustin Mixon
The Ohio State University, USA
Machine learning algorithms are designed for data in Euclidean space. When naively representing data in a Euclidean space $V$, there is often a nontrivial group $G$ of isometries such that different members of a common $G$-orbit represent the same data point. To properly model such data, we want to map the set $V/G$ of orbits into Euclidean space in a way that is bilipschitz in the quotient metric. In this talk, we take inspiration from an inverse problem called phase retrieval to find a large and flexible class of bilipschitz invariants that we call max filter banks. We discuss how max filter banks perform in theory and in practice, and we conclude with several open problems.
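For illustration, a max filter against a template can be computed by maximizing the inner product over the group; the sketch below does this for the group of cyclic shifts (the choice of group and of random templates is ours, not the talk's).

```python
import numpy as np

def max_filter_shift(x, z):
    """<<x, z>> := max over cyclic shifts g of the inner product <x, g.z>."""
    return max(np.dot(x, np.roll(z, s)) for s in range(len(z)))

def max_filter_bank(x, templates):
    """Map x to its vector of max-filter values against a bank of templates."""
    return np.array([max_filter_shift(x, z) for z in templates])

# Two signals in the same orbit (cyclic shifts of each other) get identical features:
rng = np.random.default_rng(1)
templates = rng.standard_normal((8, 16))   # 8 random templates in R^16
x = rng.standard_normal(16)
print(np.allclose(max_filter_bank(x, templates),
                  max_filter_bank(np.roll(x, 3), templates)))  # True
```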
Joint work with Jameson Cahill (University of North Carolina Wilmington), Joseph W. Iverson (Iowa State University) and Daniel Packer (The Ohio State University).
Thursday, June 15, 15:30 ~ 16:00
Are neural operators really neural operators?
Rima Alaifari
ETH Zurich, Switzerland
In operator learning, it has been observed that proposed models may not behave as operators when implemented on a computer, questioning the very essence of what operator learning should be. We contend that some form of continuous-discrete equivalence is necessary for an architecture to genuinely learn the underlying operator, rather than just discretizations of it. Employing frames, we introduce the framework of Representation equivalent Neural Operator (ReNO) to ensure operations at the continuous and discrete level are equivalent.
Joint work with Francesca Bartolucci (TU Delft), Emmanuel de Bezenac (ETH Zurich), Bogdan Raonic (ETH Zurich), Roberto Molinaro (ETH Zurich), Siddhartha Mishra (ETH Zurich).
Thursday, June 15, 16:30 ~ 17:00
Robust low-rank matrix completion with adversarial noise
Felix Krahmer
Technical University of Munich & Munich Center for Machine Learning, Germany
The problem of recovering a high-dimensional low-rank matrix from a limited set of random measurements has enjoyed various applications and gained a detailed theoretical foundation over the last 15 years. An instance of particular interest is the matrix completion problem, where the measurements are entry observations. The first rigorous recovery guarantees for this problem were derived for the nuclear norm minimization approach, a convex proxy for the NP-hard problem of constrained rank minimization. For matrices whose entries are "spread out" well enough, this convex problem admits a unique solution which corresponds to the ground truth. In the presence of random measurement noise, the reconstruction performance is also well-studied, but the performance for adversarial noise remains less understood. While some error bounds have been derived for both convex and nonconvex approaches, these bounds exhibit a gap to information-theoretic lower bounds and to provable performance for Gaussian measurements. However, a recent analysis of the problem suggests that under small-scale adversarial noise, the reconstruction error can be significantly amplified. In this talk, we investigate this amplification quantitatively and provide new reconstruction bounds for both small and large noise levels that suggest a quadratic dependence between the reconstruction error and the noise level.
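As a point of reference, nuclear norm minimization with a noise constraint on the observed entries can be written down in a few lines of cvxpy; this is the generic convex formulation, not necessarily the exact estimator analyzed in the talk.

```python
import cvxpy as cp
import numpy as np

def complete_nuclear_norm(M_obs, mask, delta):
    """Nuclear-norm minimization for noisy matrix completion (sketch).

    M_obs : observed matrix (arbitrary values off the mask)
    mask  : 0/1 array marking observed entries
    delta : assumed bound on the noise on the observed entries
    """
    X = cp.Variable(M_obs.shape)
    data_fit = cp.norm(cp.multiply(mask, X - M_obs), "fro")
    problem = cp.Problem(cp.Minimize(cp.norm(X, "nuc")), [data_fit <= delta])
    problem.solve()
    return X.value
```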
Joint work with Julia Kostin (ETH Zurich, Switzerland) and Dominik Stöger (KU Eichstätt-Ingolstadt, Germany).
Thursday, June 15, 17:00 ~ 17:30
Mathematics of private synthetic data
Roman Vershynin
University of California, Irvine, U.S.A.
Protection of private information is of vital importance in data-driven research, business, and government. We propose a new mathematical framework for generation of differentially private synthetic data. It is based on new insights, and inspires new problems, in discrete Fourier analysis, random processes, and numerical linear algebra.
Joint work with March Boedihardjo (ETH Zurich), Yiyun He (UC Irvine), Thomas Strohmer (UC Davis) and Yizhe Zhu (UC Irvine).
Thursday, June 15, 17:30 ~ 18:00
What Makes Data Suitable for Deep Learning?
Nadav Cohen
Tel Aviv University, Israel
Deep learning is delivering unprecedented performance when applied to various data modalities, yet there are data distributions over which it utterly fails. The question of what makes a data distribution suitable for deep learning is a fundamental open problem in the field. In this talk I will present a recent theory aiming to address the problem via tools from quantum physics. The theory establishes that certain neural networks are capable of accurate prediction over a data distribution if and only if the data distribution admits low quantum entanglement under certain partitions of features. This brings forth practical methods for adaptation of data to neural networks, and vice versa. Experiments with widespread models over various datasets will demonstrate the findings. An underlying theme of the talk will be the potential of physics to advance our understanding of the relation between deep learning and real-world data.
Joint work with my graduate students Noam Razin, Yotam Alexander, Nimrod De La Vega and Tom Verbin.
Thursday, June 15, 18:00 ~ 18:30
A simple approach for quantizing neural networks
Johannes Maly
LMU Munich, Germany
In this talk, I propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step makes it possible to quantize network layers via memoryless scalar quantization while preserving the network performance on given training data. On the one hand, the computational complexity of this pre-processing slightly exceeds that of state-of-the-art algorithms in the literature. On the other hand, the new approach does not require any hyper-parameter tuning and, in contrast to previous methods, admits a straightforward analysis. I provide rigorous theoretical guarantees in the case of quantizing single network layers and show that the relative error decays with the number of parameters in the network if the training data behaves well, e.g., if it is sampled from suitable random distributions. The developed method also readily allows the quantization of deep networks by consecutive application to single layers.
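For concreteness, memoryless scalar quantization of a weight matrix amounts to rounding each entry to a fixed grid; the sketch below shows only this quantizer and omits the deterministic pre-processing step that is the subject of the talk.

```python
import numpy as np

def msq(W, delta):
    """Memoryless scalar quantization: round each weight to the grid delta * Z."""
    return delta * np.round(W / delta)

# Toy usage on a single fully connected layer:
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128)) / np.sqrt(128)
W_q = msq(W, delta=0.05)
print(np.max(np.abs(W - W_q)) <= 0.025 + 1e-12)  # entrywise error at most delta/2
```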
Joint work with Rayan Saab (UCSD).
Friday, June 16, 14:00 ~ 15:00
Completion of matrices with low description complexity
Helmut Bölcskei
ETH Zurich, Switzerland
We propose a theory for matrix completion that goes beyond the low-rank structure commonly considered in the literature and applies to general matrices of low description complexity, including sparse matrices, matrices with sparse factorizations such as, e.g., sparse R-factors in their QR-decomposition, and algebraic combinations of matrices of low description complexity. The mathematical concept underlying this theory is that of rectifiability, a basic notion in geometric measure theory. Complexity of the sets of matrices encompassed by the theory is measured in terms of Hausdorff and Minkowski dimensions. Our goal is the characterization of the number of linear measurements, with an emphasis on rank-$1$ measurements, needed for the existence of an algorithm that yields reconstruction, either perfect, with probability $1$, or with arbitrarily small probability of error, depending on the setup. Specifically, we show that matrices taken from a set $U$ such that $U - U$ has Hausdorff dimension $s$ can be recovered from $k > s$ measurements, and random matrices supported on a set $U$ of Hausdorff dimension $s$ can be recovered with probability $1$ from $k > s$ measurements. What is more, we establish the existence of $\beta$-Hölder continuous decoders recovering matrices taken from a set of upper Minkowski dimension $s$ from $k > 2s/(1 - \beta)$ measurements and, with arbitrarily small probability of error, random matrices supported on a set of upper Minkowski dimension $s$ from $k > s/(1 - \beta)$ measurements.
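To illustrate the measurement model, a rank-one linear measurement of a matrix $X$ has the form $y_i = u_i^\top X v_i = \langle X, u_i v_i^\top \rangle$; the NumPy sketch below collects $k$ such measurements (the Gaussian choice of $u_i, v_i$ is purely illustrative, the existence results above concern general measurement maps).

```python
import numpy as np

def rank_one_measurements(X, k, rng=np.random.default_rng(0)):
    """Collect k rank-one measurements y_i = u_i^T X v_i of a matrix X."""
    m, n = X.shape
    U = rng.standard_normal((k, m))
    V = rng.standard_normal((k, n))
    y = np.einsum("km,mn,kn->k", U, X, V)   # y_i = <X, u_i v_i^T>
    return y, U, V
```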
Joint work with Erwin Riegler, ETH Zurich, Switzerland, Günther Koliander, Austrian Academy of Sciences, Austria and David Stotz, Kantonsschule Schaffhausen, Switzerland.
Friday, June 16, 15:00 ~ 15:30
Convergence of Entropy-Regularized Neural Natural Actor-Critic
Semih Cayci
RWTH Aachen University, Germany
Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces. In this paper, we present a finite-time analysis of NAC with neural network approximation, and identify the roles of neural networks, regularization and optimization techniques (e.g., gradient clipping and averaging) to achieve provably good performance in terms of sample complexity, iteration complexity and overparametrization bounds for the actor and the critic. In particular, we prove that (i) entropy regularization and averaging ensure stability by providing sufficient exploration to avoid near-deterministic and strictly suboptimal policies and (ii) regularization leads to sharp sample complexity and network width bounds in the regularized MDPs, yielding a favorable bias-variance tradeoff in policy optimization. In the process, we identify the importance of uniform approximation power of the actor neural network to achieve global optimality in policy optimization due to distributional shift.
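For orientation, the entropy-regularized objective underlying such analyses is typically of the following form, with temperature $\tau > 0$ and discount factor $\gamma$ (the notation is generic and not taken from the paper): $$ \max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t\ge 0}\gamma^{t}\big(r(s_t,a_t)+\tau\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big)\big)\Big], \qquad \mathcal{H}(p)=-\sum_{a}p(a)\log p(a). $$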
Joint work with Niao He (ETH Zurich) and R. Srikant (University of Illinois at Urbana-Champaign).
Friday, June 16, 15:30 ~ 16:00
Convergence and error analysis of PINNs
Claire Boyer
Sorbonne Université
Physics-informed neural networks (PINNs) are a promising approach that combines the power of neural networks with the interpretability of physical modeling. PINNs have shown good practical performance in solving partial differential equations (PDEs) and in hybrid modeling scenarios, where physical models enhance data-driven approaches. However, it is essential to establish their theoretical properties in order to fully understand their capabilities and limitations. In this study, we highlight that classical training of PINNs can suffer from systematic overfitting. This problem can be addressed by adding a ridge regularization to the empirical risk, which ensures that the resulting estimator is risk-consistent for both linear and nonlinear PDE systems. However, the strong convergence of PINNs to a solution satisfying the physical constraints requires a more involved analysis using tools from functional analysis and the calculus of variations. In particular, for linear PDE systems, an implementable Sobolev-type regularization makes it possible to reconstruct a solution that not only achieves statistical accuracy but also maintains consistency with the underlying physics.
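A generic form of such a ridge-regularized PINN risk is $$ \widehat{R}_{\lambda}(\theta)\;=\;\frac{1}{n}\sum_{i=1}^{n}\big(u_\theta(x_i)-y_i\big)^2\;+\;\frac{\mu}{m}\sum_{j=1}^{m}\big|\mathcal{N}(u_\theta)(z_j)\big|^2\;+\;\lambda\,\|\theta\|_2^2, $$ where $\mathcal{N}$ denotes the PDE operator, the first term is the data fit, the second the PDE residual at collocation points $z_j$, and the last the ridge penalty; the notation and the weights $\mu, \lambda$ are illustrative and not taken from the study.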
Joint work with Nathan Doumèche (Sorbonne Université and EDF) and Gérard Biau (Sorbonne Université).
Friday, June 16, 16:30 ~ 17:00
Private synthetic data
March Boedihardjo
ETH Zurich, Switzerland
We study the extent to which private synthetic data can accurately and uniformly preserve a given class of queries. For the class of 1-Lipschitz queries on hypercubes, we find the optimal accuracy up to a universal constant. We also study this question for other classes of queries.
Joint work with Thomas Strohmer (UC Davis), Roman Vershynin (UC Irvine), Konstantin Donhauser (ETH Zurich), Johan Lokna (ETH Zurich), Amartya Sanyal (ETH Zurich), Robert Honig (ETH Zurich) and Fanny Yang (ETH Zurich).
Friday, June 16, 17:00 ~ 17:30
Convergence of MOD and ODL for dictionary learning
Karin Schnass
Universität Innsbruck, Austria
In this talk we will present sufficient conditions for the convergence of two of the most popular dictionary learning algorithms - the Method of Optimal Directions (MOD) and Approximate K Singular Value Decomposition (aKSVD). Assuming that the signals follow an S-sparse model based on a well-behaved generating dictionary, where each dictionary element may be used with a different probability, we show the following: given enough training signals and starting with a well-behaved initialisation - one that is either within distance at most $1/\log(K)$ of the generating dictionary or has a special structure ensuring that each element of the initialisation dictionary corresponds to exactly one element of the generating dictionary - both algorithms converge with a geometric convergence rate to the generating dictionary.
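As a reminder of the structure of MOD, one iteration alternates a sparse coding step with a least-squares dictionary update; the sketch below uses simple hard thresholding for the coding step (the algorithms analysed in the talk may use a different sparse coding routine).

```python
import numpy as np

def hard_threshold_codes(D, Y, S):
    """Crude sparse coding: keep the S largest correlations per training signal."""
    C = D.T @ Y                                  # correlations, shape (K, N)
    X = np.zeros_like(C)
    idx = np.argsort(-np.abs(C), axis=0)[:S]     # top-S atoms per column
    cols = np.arange(Y.shape[1])
    X[idx, cols] = C[idx, cols]
    return X

def mod_step(D, Y, S):
    """One Method-of-Optimal-Directions update: code, then least-squares dictionary."""
    X = hard_threshold_codes(D, Y, S)
    D_new = Y @ X.T @ np.linalg.pinv(X @ X.T)    # minimizes ||Y - D X||_F over D
    return D_new / (np.linalg.norm(D_new, axis=0, keepdims=True) + 1e-12)
```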
Joint work with Simon Ruetz (Universität Innsbruck).
Friday, June 16, 17:30 ~ 18:30
Randomly pivoted Cholesky
Joel Tropp
Caltech, USA
Kernel methods are used for prediction and clustering in many data science and scientific computing applications, but applying kernel methods to a large number of data points $N$ is expensive due to the high cost of manipulating the $N \times N$ kernel matrix. A basic approach for speeding up kernel computations is low-rank approximation, in which we replace the kernel matrix $\mathbf{A}$ with a factorized approximation that can be stored and manipulated more cheaply. When the kernel matrix $\mathbf{A}$ has rapidly decaying eigenvalues, mathematical existence proofs guarantee that $\mathbf{A}$ can be accurately approximated using a constant number of columns (without ever looking at the full matrix). Nevertheless, for a long time, designing a practical and provably justified algorithm to select the appropriate columns proved challenging.
This talk introduces Randomly Pivoted Cholesky (RPC), a natural algorithm for approximating an $N \times N$ positive-semidefinite matrix using $k$ adaptively sampled columns. RPC can be implemented with just a few lines of code; it requires only $(k+1)N$ entry evaluations and $\mathcal{O}(k^2 N)$ additional arithmetic operations. In experiments, RPC matches or improves on the performance of alternative algorithms for low-rank psd approximation. Moreover, RPC provably achieves near-optimal approximation guarantees. The simplicity, effectiveness, and robustness of this algorithm strongly support its use for large-scale kernel computations. This work offers an accessible example of the power of using randomized algorithms for linear algebra computations.
Available at arXiv:2207.06503.
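The following NumPy sketch follows the description of the iteration given above; see the cited preprint for the authors' precise formulation and guarantees.

```python
import numpy as np

def rp_cholesky(A, k, rng=np.random.default_rng(0)):
    """Randomly pivoted Cholesky (sketch): returns F with A approx. F @ F.T.

    Pivots are sampled with probability proportional to the diagonal of the
    current residual; only k columns of A (plus its diagonal) are ever read.
    """
    N = A.shape[0]
    F = np.zeros((N, k))
    d = np.diag(A).astype(float)                 # diagonal of the residual
    for i in range(k):
        s = rng.choice(N, p=d / d.sum())         # sample a pivot column
        g = A[:, s] - F[:, :i] @ F[s, :i]        # residual of that column
        F[:, i] = g / np.sqrt(g[s])              # Cholesky-style update
        d -= F[:, i] ** 2                        # update residual diagonal
        d = np.clip(d, 0.0, None)                # guard against round-off
    return F
```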
Joint work with Yifan Chen (Caltech), Ethan Epperly (Caltech) and Rob Webber (Caltech).
Saturday, June 17, 14:00 ~ 15:00
Physics-inspired learning on graphs
Michael Bronstein
Oxford, UK
The message-passing paradigm has been the “battle horse” of deep learning on graphs for several years, making graph neural networks a big success in a wide range of applications, from particle physics to protein design. From a theoretical viewpoint, it established the link to the Weisfeiler-Lehman hierarchy, making it possible to analyse the expressive power of GNNs. We argue that the very “node-and-edge”-centric mindset of current graph deep learning schemes may hinder future progress in the field. As an alternative, we propose physics-inspired “continuous” learning models that open up a new trove of tools from the fields of differential geometry, algebraic topology, and differential equations, so far largely unexplored in graph ML.
Saturday, June 17, 15:00 ~ 15:30
On the Training of Infinitely Deep and Wide ResNets
Gabriel Peyré
CNRS and ENS, France
In the absence of convexity, overparametrization is a key factor in explaining the global convergence of gradient descent (GD) for neural networks. Besides the well-studied lazy regime, an infinite-width (mean-field) analysis has been developed for shallow networks, using convex optimization techniques. To bridge the gap between the lazy and mean-field regimes, we study Residual Networks (ResNets) in which the residual block has a linear parametrization while still being nonlinear. Such ResNets admit both infinite depth and width limits, encoding residual blocks in a Reproducing Kernel Hilbert Space (RKHS). In this limit, we prove a local Polyak-Lojasiewicz inequality. Thus, every critical point is a global minimizer and a local convergence result for GD holds, recovering the lazy regime. In contrast with other mean-field studies, our result applies to both parametric and non-parametric cases under an expressivity condition on the residuals. Our analysis leads to a practical and quantified recipe: starting from a universal RKHS, Random Fourier Features are applied to obtain a finite-dimensional parameterization satisfying our expressivity condition with high probability.
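For reference, the Random Fourier Features construction mentioned in the recipe can be sketched as follows; this illustrates only the generic feature map for a Gaussian kernel, not the specific RKHS or parameterization of the paper.

```python
import numpy as np

def random_fourier_features(X, m, sigma=1.0, rng=np.random.default_rng(0)):
    """Random Fourier features approximating a Gaussian kernel of bandwidth sigma.

    X : (n, d) data; returns (n, m) features with phi(x).phi(y) ~ k(x, y).
    """
    d = X.shape[1]
    W = rng.standard_normal((d, m)) / sigma        # Gaussian frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)      # uniform phase shifts
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)
```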
Joint work with Raphaël Barboni (ENS, France) and François-Xavier Vialard (Paris-Est, France).
Saturday, June 17, 15:30 ~ 16:00
Hierarchical systems of exponential bases
Götz Pfander
Catholic University Eichstätt-Ingolstadt, Germany
Fourier series form a cornerstone of analysis; they allow the expansion of a complex-valued 1-periodic function in the basis of integer frequency exponentials. A simple rescaling argument shows that by splitting the integers into evens and odds, we obtain orthogonal bases for functions defined on the first, respectively the second, half of the unit interval. We develop generalizations of this curiosity and show that, for example, for any finite partition of the unit interval into subintervals there exists a partition of the integers into subsets, each of which forms a basis for functions supported on the respective subinterval.
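The rescaling observation can be made explicit by a standard computation: $$ \int_{0}^{1/2} e^{2\pi i\,2k x}\,\overline{e^{2\pi i\,2\ell x}}\,dx \;=\; \frac{1}{2}\int_{0}^{1} e^{2\pi i (k-\ell) t}\,dt \;=\; \frac{1}{2}\,\delta_{k\ell}, $$ so $\{\sqrt{2}\,e^{2\pi i\,2k x}\}_{k\in\mathbb{Z}}$ is an orthonormal basis of $L^2[0,1/2]$, and modulating by the unimodular factor $e^{2\pi i x}$ shows that the odd frequencies play the same role on the second half of the interval.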
Joint work with David Walnut (George Mason University, USA).
Saturday, June 17, 16:30 ~ 17:00
Three Vignettes in Computational Optimal Recovery
Simon Foucart
Texas A&M University, United States
The question addressed in this talk pertains to the utilization of observational data in order to optimally recover functions (or other objects) in a worst-case setting relative to a model set based on approximation capabilities. The emphasis is put on the computational realization of the optimal recovery maps. In a first vignette, dealing with the space of continuous functions, I will showcase an algorithm to produce an optimal map---a linear one, to boot---for full recovery when the underlying approximation space is a Chebyshev space. In a second vignette, set in Hilbert spaces, I will indicate how to treat deterministically inaccurate data, especially given an $\ell_1$-bound, and again reveal the optimality of linear recovery maps. In a third vignette, focusing on the estimation of linear functionals but in arbitrary normed spaces, I will show that linear recovery maps are near optimal in the presence of stochastically inaccurate data when the noise distribution is log-concave.
Saturday, June 17, 17:00 ~ 18:00
The separation capacity of random neural networks
Sjoerd Dirksen
Utrecht University, The Netherlands
Neural networks with random weights appear in a variety of machine learning applications, most prominently as the initialization of many deep learning algorithms and as a computationally cheap alternative to fully learned neural networks. The first goal of this talk is to enhance the theoretical understanding of random neural networks by addressing the following data separation problem: under which conditions can a random neural network make two classes (with positive distance) linearly separable? I will show that a sufficiently large two-layer ReLU-network with Gaussian weights and uniformly distributed biases can solve this problem with high probability. Building on the insights behind this result, I will next present a simple randomized algorithm to produce a small interpolating neural net for a given dataset with two classes. In both results, the size of the network is explicitly linked to geometric properties of the two classes and their mutual arrangement. This instance-specific viewpoint makes it possible to overcome the curse of dimensionality. I will connect the presented results with related work on approximation, memorization, and generalization.
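To illustrate the setting, the sketch below builds a random two-layer ReLU feature map with Gaussian weights and uniformly distributed biases and checks linear separability on a toy two-class example; the dataset, the width and the bias range are our choices, not those of the talk.

```python
import numpy as np
from sklearn.svm import LinearSVC

def random_relu_layer(X, width, bias_range=1.0, rng=np.random.default_rng(0)):
    """Random two-layer ReLU feature map: Gaussian weights, uniform biases."""
    d = X.shape[1]
    W = rng.standard_normal((d, width)) / np.sqrt(d)
    b = rng.uniform(-bias_range, bias_range, size=width)
    return np.maximum(X @ W + b, 0.0)

# Two classes on concentric circles: not linearly separable in R^2,
# but typically separable after a wide enough random ReLU layer.
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 200)
X = np.r_[np.c_[np.cos(t[:100]), np.sin(t[:100])],
          3 * np.c_[np.cos(t[100:]), np.sin(t[100:])]]
y = np.r_[np.zeros(100), np.ones(100)]
Z = random_relu_layer(X, width=500, rng=rng)
print(LinearSVC(C=100.0).fit(Z, y).score(Z, y))  # typically close to 1.0
```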
Joint work with Patrick Finke (Utrecht University, The Netherlands), Martin Genzel (Utrecht University, The Netherlands and Merantix Momentum), Laurent Jacques (UC Louvain, Belgium) and Alexander Stollenwerk (UC Louvain, Belgium and KPMG Germany).
Posters
Ellipsoid Methods for Metric Entropy Rates Computations
Thomas Allard
ETH Zürich, Switzerland
Historically, metric entropy has played a significant role in various domains such as non-linear approximation [1–3], statistical learning theory [4–6], and empirical processes theory [7, 8]. Recent advances in machine learning theory, and more specifically in deep learning, have led to renewed interest in the concept of metric entropy. Indeed, metric entropy is at the heart of the study of the approximation-theoretic properties and the statistical learning capabilities of deep neural networks [6, 9]. However, computing the precise value of the metric entropy of a given function class turns out to be notoriously difficult in general; exact expressions are available only in very few simple cases. For this reason, it has become common practice to resort to characterizations of the asymptotic behavior of metric entropy only. Even this more modest endeavor has turned out daunting in most cases. Consequently, a sizeable body of research exists in the literature; the survey [10] illustrates the multiplicity of methods employed. Through this survey, one is led to the insight that many of the standard methods implicitly rely on the computation of metric entropy for infinite-dimensional ellipsoids, highlighting the importance of developing a general method for this specific case. A first step toward such a general method was made in [11, 12]. Building on their result, we present a new method for the derivation of the asymptotic behavior of the metric entropy for infinite-dimensional ellipsoids. We further argue that our results provide a unified framework for the derivation of the metric entropy rates of a wide variety of function classes, such as Besov spaces, Modulation spaces, Sobolev spaces, and various classes of analytic functions, thereby retrieving and improving standard results.
[1] Lorentz, G. G. (1966). Metric entropy and approximation.
[2] Lorentz, G. G. (1966). Approximation of Functions, Athena Series. Selected Topics in Mathematics.
[3] Carl, B., & Stephani, I. (1990). Entropy, Compactness and the Approximation of Operators (Cambridge Tracts in Mathematics). Cambridge: Cambridge University Press.
[4] Haussler, D. (1992). Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1), 78-150.
[5] Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint (Vol. 48). Cambridge university press.
[6] Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge: Cambridge University Press.
[7] Dudley, R.M. (1984). A course on empirical processes. In: Hennequin, P.L. (eds) École d'Été de Probabilités de Saint-Flour XII - 1982. Lecture Notes in Mathematics, vol 1097. Springer, Berlin, Heidelberg.
[8] Pollard, D. (1990). Empirical processes: theory and applications. NSF-CBMS Regional Conf. Ser. Probab. Statist., 2: 86pp
[9] Elbrächter, D., Perekrestenko, D., Grohs, P., & Bölcskei, H. (2021). Deep neural network approximation theory. IEEE Transactions on Information Theory, 67(5), 2581-2623.
[10] Tikhomirov, V.M. (1993). ε-Entropy and ε-Capacity of Sets In Functional Spaces. In: Shiryayev, A.N. (eds) Selected Works of A. N. Kolmogorov. Mathematics and Its Applications, vol 27. Springer, Dordrecht.
[11] Luschgy, H., & Pages, G. (2004). Sharp asymptotics of the Kolmogorov entropy for Gaussian measures. Journal of Functional Analysis, 212(1), 89-120.
[12] Graf, S., & Luschgy, H. (2004). Sharp asymptotics of the metric entropy for ellipsoids. Journal of Complexity, 20(6), 876-882.
Joint work with Prof. Dr. Helmut Bölcskei (ETH Zürich).
Sampling in spaces of variable bandwidth generated via Wilson basis
Beatrice Andreolli
University of Vienna, Austria
We introduce a new concept of variable bandwidth that is based on the truncation of Wilson expansions. For this model we derive (nonuniform) sampling theorems, yielding the complete reconstruction of a function from its samples, as well as necessary density conditions for sampling.
Joint work with Karlheinz Gröchenig (University of Vienna).
Revisiting RIP guarantees for sketching operators on mixture models
Ayoub Belhadji
ENS de Lyon, France
In the context of sketching for compressive mixture modeling, we revisit existing proofs of the Restricted Isometry Property of sketching operators with respect to certain mixture models. After examining the shortcomings of existing guarantees, we propose an alternative analysis that circumvents the need to assume importance sampling when drawing random Fourier features to build random sketching operators. Our analysis is based on new deterministic bounds on the restricted isometry constant that depend solely on the set of frequencies used to define the sketching operator; we then leverage these bounds to establish concentration inequalities for random sketching operators that lead to the desired RIP guarantees. Our analysis also opens the door to theoretical guarantees for structured sketching with frequencies associated with fast random linear operators.
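For concreteness, a random Fourier sketching operator maps a dataset to the empirical average of complex exponentials at a fixed set of frequencies; the sketch below shows this construction (the frequency distribution and normalization are illustrative only).

```python
import numpy as np

def sketch(X, Omega):
    """Empirical sketch of a dataset: mean of exp(i <omega_j, x>) over data points.

    X     : (n, d) data points
    Omega : (m, d) frequencies defining the sketching operator
    """
    return np.exp(1j * X @ Omega.T).mean(axis=0)    # shape (m,)

# Toy usage: sketch of a Gaussian sample with m random frequencies.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))
Omega = rng.standard_normal((64, 2))
z = sketch(X, Omega)
```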
Joint work with Rémi Gribonval (ENS de Lyon, France).
Theoretical guarantees for generative compressed sensing with subsampled isometries
Aaron Berk
McGill University, Canada
In work by Bora et al. (2017), a mathematical framework was developed for compressed sensing guarantees when the measurement matrix is Gaussian and the signal structure is the range of a Lipschitz function (with applications to generative neural networks (GNNs)). We consider measurement matrices derived by sampling uniformly at random rows of a unitary matrix (including subsampled Fourier measurements as a special case). We prove the first known restricted isometry guarantee for compressed sensing with GNNs and subsampled isometries, and provide recovery bounds. Recovery efficacy is characterized by the coherence, a new parameter, which measures the interplay between the range of the network and the measurement matrix. Furthermore, we propose a regularization strategy for training GNNs to have favourable coherence with the measurement operator. We provide compelling numerical simulations that support this regularized training strategy: our strategy yields low coherence networks that require fewer measurements for signal recovery. This, together with our theoretical results, supports coherence as a natural quantity for characterizing generative compressed sensing with subsampled isometries. This poster is based on a recent co-authored publication in IEEE JSAIT.
Joint work with Simone Brugiapaglia (Concordia University, Montreal, Canada), Babhru Joshi (University of British Columbia, Vancouver, Canada), Matthew Scott (University of British Columbia, Vancouver, Canada), Yaniv Plan (University of British Columbia, Vancouver, Canada) and Özgür Yilmaz (University of British Columbia, Vancouver, Canada).
Analysing Implicit Bias/Regularisation via Invariants, Bregman Divergence, and Normalisation
Hung-Hsu Chou
Ludwig-Maximilians-Universität München, Germany
Out of infinitely many possible explanations of the observed data, machines are often capable of approaching the ones with "good" properties, such as generalisability or compressibility, even when those properties are not explicitly specified in the training algorithms. This phenomenon, known as implicit bias/regularisation, has been studied intensively over the last decade and has stimulated significant developments in the fundamental understanding of machine learning.
My presentation focuses on analysing the implicit bias with different tools and perspectives: invariants, Bregman divergence, and normalisation. I use invariants, quantities that do not change in time, to identify the region in which the training process lies. I further analyse the process via the Bregman divergence, a generalisation of distance that is embedded in the algorithm, and use it to characterise the end result of the training. Moreover, I discover that additional normalisation in the algorithm strengthens its robustness with respect to changes in the initialization and consequently increases the numerical efficiency.
I am currently focusing on extending tools and concepts to a wider range of models, such as linear networks and multilayer perceptrons. I am also interested in neural tangent kernel, neural collapse, spectral bias, and related fields.
Joint work with Johannes Maly (Ludwig-Maximilians-Universität München, Germany), Holger Rauhut (Rheinisch-Westfälische Technische Hochschule Aachen, Germany), Dominik Stöger (Katholische Universität Eichstätt-Ingolstadt, Germany), Cristian Vega (University of Genoa, Italy) and Rachel Ward (University of Texas at Austin, United States of America).
Line Search Methods for Deep Learning. It Could Work
Leonardo Galli
RWTH Aachen University, Germany
Stochastic Gradient Descent (SGD) is the workhorse of virtually all deep learning activity today. Even though its simplicity and low memory requirements seem crucial for dealing with these huge models, the success of SGD is deeply connected to the choice of the learning rate. In this paper we show that line search methods are a valid alternative to SGD for training convolutional deep learning models and transformers. Following the classical optimization doctrine, we combine a fast initial step size (Polyak) with a nonmonotone line search. We show that to achieve the best performance, the initial step size sometimes needs a line search to control its growth, especially while still in the global phase. This behavior agrees with common optimization knowledge and with the theoretical expectations regarding line search methods. Moreover, to deal with the increased number of backtracking steps, we develop a new technique that is able to reduce them to 0 (on average) while not altering the behavior of the original step size. To conclude, we prove the first rates of convergence for nonmonotone line search methods in the stochastic setting under interpolation.
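As a rough illustration of the ingredients, the sketch below combines a Polyak initial step size with a nonmonotone Armijo-type backtracking condition; it is a generic textbook-style scheme, not the authors' exact method, and in particular does not reproduce their technique for removing backtracking steps.

```python
import numpy as np

def nonmonotone_polyak_step(f, grad, x, f_hist, f_low=0.0,
                            c=1e-4, shrink=0.5, window=10):
    """One step: Polyak initial step size, then nonmonotone backtracking.

    f_hist : list of recent loss values (the nonmonotone reference is their max)
    f_low  : estimate of the optimal value (0 under interpolation)
    """
    g = grad(x)
    fx = f(x)
    t = (fx - f_low) / (np.dot(g, g) + 1e-12)         # Polyak step size
    ref = max(f_hist[-window:] + [fx])                # nonmonotone reference value
    while f(x - t * g) > ref - c * t * np.dot(g, g):  # Armijo-type condition
        t *= shrink                                    # backtrack
    f_hist.append(fx)
    return x - t * g
```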
Joint work with Holger Rauhut (RWTH Aachen University) and Mark Schmidt (University of British Columbia).
Advances in phaseless sampling of the short-time Fourier transform
Lukas Liehr
University of Vienna, Austria
Short-time Fourier transform (STFT) phase retrieval refers to the reconstruction of a function $f \in L^2(\mathbb{R})$ from phaseless samples of the form $$ |V_gf(\Lambda)| := \{ |V_gf(\lambda)| : \lambda \in \Lambda \}, $$ where $V_gf$ denotes the STFT of $f$ with respect to a window function $g \in L^2(\mathbb{R})$, and $\Lambda \subseteq \mathbb{R}^2$ is a set of sampling locations. We present recent advances in STFT phase retrieval and focus on the question for which window functions $g$ and which sets $\Lambda$ every $f$ is uniquely determined (up to a global phase) by the samples $|V_gf(\Lambda)|$. It turns out that the phaseless sampling problem differs from ordinary sampling in a rather fundamental way: if $\Lambda = A\mathbb{Z}^2$ is a lattice then uniqueness is unachievable, independent of the choice of the window function and the density of the lattice. On the basis of this discretization barrier, we present possible ways to overcome it. We show that a restriction of the function class from $L^2(\mathbb{R})$ to certain subspaces of $L^2(\mathbb{R})$ yields uniqueness of the problem if one samples on sufficiently dense lattices. Finally, we highlight that without any restriction of the function class, a discretization is still possible: either one changes the sampling scheme from sampling on ordinary lattices to so-called square-root lattices, or one considers a multi-window system with four window functions. These results constitute the first general uniqueness theorems for the STFT phase retrieval problem.
Joint work with Philipp Grohs (University of Vienna), Martin Rathmair (Université Bordeaux) and Irina Shafkulovska (University of Vienna).
Accelerated Griffin-Lim algorithm: A fast and provably convergent numerical method for phase retrieval
Rossen Nenov
Austrian Academy of Sciences, Austria
The recovery of a signal from the magnitudes of its transformation, such as the short-time Fourier transform, is known as the phase retrieval problem, and it is of considerable relevance in various fields of engineering and applied physics. The Griffin-Lim algorithm (GLA), which is based on alternating projections, is a staple method commonly used for the phase retrieval problem. In 2013, Perraudin, Balazs and Søndergaard [1] introduced a momentum step based on Nesterov's Accelerated Gradient Method to enhance the convergence speed of GLA in numerical experiments, creating a new fast algorithm called the Fast Griffin-Lim algorithm (FGLA), which has since been widely used in the field. In our work, we design a further momentum- and over-relaxation-based modification of the Griffin-Lim algorithm, which further enhances the performance, and we prove a convergence guarantee for the new algorithm and for FGLA, which up to this point had remained an open question.
[1] Perraudin, Nathanaël & Balazs, Peter & Søndergaard, Peter. (2013). A Fast Griffin–Lim Algorithm. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
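A minimal sketch of the FGLA-type iteration discussed above is given below; the analysis and synthesis operators stft and istft are assumed to be given, and the exact over-relaxed update proposed in the poster is not reproduced.

```python
import numpy as np

def fast_griffin_lim(mag, stft, istft, n_iter=100, alpha=0.99,
                     rng=np.random.default_rng(0)):
    """Fast Griffin-Lim iteration (sketch): alternating projections plus momentum.

    mag   : target STFT magnitudes
    stft  : analysis operator, signal -> complex coefficients (assumed given)
    istft : synthesis operator, complex coefficients -> signal (assumed given)
    """
    c = mag * np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    t_prev = stft(istft(c))                      # project onto consistent coefficients
    for _ in range(n_iter):
        # impose the target magnitudes, then project back onto consistent STFTs
        t = stft(istft(mag * np.exp(1j * np.angle(c))))
        c = t + alpha * (t - t_prev)             # momentum step (the "fast" part)
        t_prev = t
    return istft(mag * np.exp(1j * np.angle(c)))
```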
Joint work with Dang-Khoa Nguyen (University of Vienna), Radu Ioan Bot (University of Vienna) and Peter Balazs (Austrian Academy of Sciences).
Gradient Descent and Stochastic Gradient Descent convergence for Learning Linear Neural Networks
Gabin Maxime Nguegnang
RWTH Aachen University, Germany
We study the convergence properties of gradient descent and stochastic gradient descent for learning deep linear neural networks. First, we extend a previous analysis of the related gradient flow. We show that under suitable conditions on the step sizes, gradient descent converges to a critical point of the square loss function. Moreover, we prove that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori. Furthermore, we use an analytical approach that combines stochastic gradient descent iterates and gradient flow trajectories, based on stochastic approximation theory, to analyze the stochastic gradient descent dynamics. We then establish the almost sure boundedness of the stochastic gradient descent iterates and their convergence for learning deep linear neural networks. Most studies of stochastic gradient descent for nonconvex problems have focused entirely on convergence properties which only indicate that the second moment of the loss function gradient tends to zero. Our work demonstrates the almost sure convergence of stochastic gradient descent to a critical point of the square loss for learning deep linear neural networks.
Joint work with Bubacarr Bah (Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine, Gambia), Holger Rauhut (RWTH Aachen University, Germany) and Ulrich Terstiege (RWTH Aachen University, Germany).
Essential duals using alternate frame operators
Mitra Shamsabadi
Acoustics Research Institute, Austrian Academy of Sciences, Austria
"Fusion frames as an important generalization of discrete frames are completely different in duality. One reason for the technicalities of fusion frame duals is the mismatch of the coefficient spaces of different systems. In order to achieve this, a bounded operator M, a mapping between the Hilbert spaces of the subspaces, is needed. Hence, it seems that the fusion frame operator must be modified by this map to be compatible with the duality in fusion frames (specially, Gavruta duals). For fusion frames W and V, we define the alternate cross-frame operator by $L_{W,V}^M= T_WMT_V^*$, where $M: \sum \oplus V_i \to \sum \oplus W_i$. If $ W=V$ and $M=\{\pi_{W_i}S_W^{-1}\}$, we call it the alternate frame operator. We show that the alternate frame operator of a fusion Riesz basis is actually the frame operator of its 1-uniform version. We will establish a class of pioneer duals, which is called essential duals, with the alternate frame operator and obtain novel reconstruction formulas by such duals. Finally we provide some examples which compare the reconstruction formula applying the frame operator(in the Gavruta sense) and the alternate frame operator (in the essential sense)."
Joint work with Peter Balazs (Acoustics Research Institute, Austrian Academy of Sciences, Austria) and Ali Akbar Arefijamaal (Hakim Sabzevari University, Iran).
Upper and lower bounds for strong basins of attractions of non-convex sparse spike estimation
Yann Traonmilin
CNRS, Institut de mathématiques de Bordeaux, France
Finding its utility in various domains of signal processing and data science, off-the-grid sparse spike estimation aims at recovering positions and amplitudes of Dirac measures observed through an under-determined linear operator. Under a restricted isometry assumption on the observation operator, we study the size of strong basins of attractions for the non-convex off-the-grid sparse spike estimation problem. We first extend previous results [1] to obtain a lower bound on the size of sets where gradient descent converges with a linear rate to the minimum of the non-convex objective functional. We then give an upper bound that shows that the dependency of the lower bound with respect to the number of measurements reflects well the true size of basins of attraction for random Gaussian Fourier measurements. These theoretical results are confirmed by experiments.
This poster is based on the preprint: On strong basins of attractions for non-convex sparse spike estimation: upper and lower bounds, Y. Traonmilin, J.-F. Aujol, A. Leclaire and P.-J. Bénard, available at https://hal.science/hal-04047677
[1] Y. Traonmilin, J.-F. Aujol, and A. Leclaire. The basins of attraction of the global minimizers of non-convex inverse problems with low-dimensional models in infinite dimension. Information and Inference: A Journal of the IMA, 04 2022. iaac011.
Joint work with Jean-François Aujol (Institut de mathématiques de Bordeaux, France), Arthur Leclaire (Institut de mathématiques de Bordeaux, France) and Pierre-Jean Bénard (Institut de mathématiques de Bordeaux, France).