# Talks

## Optimization & Numerical Analysis

## Optimization & Numerical Analysis

Image denoising is the most fundamental problem in image enhancement, and it is largely solved: It has reached impressive heights in performance and quality — almost as good as it can ever get. But interestingly, it turns out that we can solve many other problems using the image denoising “engine”. I will describe the Regularization by Denoising (RED) framework: using the denoising engine in defining the regularization of any inverse problem. The idea is to define an explicit image-adaptive regularization functional directly using a high performance denoiser. Surprisingly, the resulting regularizer is guaranteed to be convex, and the overall objective functional is explicit, clear and well-defined. With complete flexibility to choose the iterative optimization procedure for minimizing this functional, RED is capable of incorporating any image denoising algorithm as a regularizer, treat general inverse problems very effectively, and is guaranteed to converge to the globally optimal result.

In this talk we present a generic recursive algorithm for improving image denoising methods. Given the initial denoised image, we suggest repeating the following procedure: (i) Strengthen the signal by adding the previous denoised image to the degraded input image, (ii) Operate the denoising method on the strengthened image, and (iii) Subtract the previous denoised image from the restored signals strengthened outcome. The convergence of this process is studied for the K-SVD image denoising and related algorithms. Furthermore, still in the context of K-SVD image denoising, we introduce an interesting interpretation of the SOS algorithm as a technique for closing the gap between the local patch-modeling and the global restoration task, thereby leading to improved performance. We demonstrate the SOS boosting algorithm for several leading denoising methods (KSVD, NLM, BM3D, and EPLL), showing tendency to further improve denoising performance.

What if we take all the overlapping patches from a given image and organize them to create the shortest path by using their mutual distances? This suggests a reordering of the image pixels in a way that creates a maximal 1D regularity Could we repeat this process in several “scales” ? What could we do with such a construction? In this talk we consider a wider perspective of the above line of questions: We introduce a wavelet transform that is meant for data organized as a connected-graph or as a cloud of high dimensional points. The proposed transform constructs a tree that applies a 1D wavelet decomposition filters, coupled with a pre-reordering of the input, so as to best sparsify the given data. We adopt this transform to image processing tasks by considering the image as a graph, where every patch is a node, and vertices are obtained by Euclidean distances between corresponding patches. State of- the-art image denoising results are obtained with the proposed scheme.

Modeling of signals or images by a sparse and redundant representation is shown in recent years to be very effective, often leading to stat-of-the-art results in many applications. Applications leaning on this model can be cast as energy minimization problems, where the unknown is a high-dimensional and very sparse vector. Surprisingly, traditional tools in optimization, including very recently developed interior-point algorithms, tend to perform very poorly on these problems. A recently emerging alternative is a family of techniques, known as “iterated-shrinkage” methods. There are various and different such algorithms, but common to them all is the fact that each of their iterations require a simple forward and inverse transform (e.g. wavelet), and a scalar shrinkage look-up-table (LUT) step. In this talk we shall explain the need for such algorithms, present some of them, and show how they perform on a classic image deblurring problem.

Compressed-Sensing (CS) offers a joint compression- and sensing-processes, based on the existence of a sparse representation of the treated signal and a set of projected measurements. Work on CS thus far typically assumes that the projections are drawn at random. In this talk we describe ways to optimize these projections. Two methods are considered: (i) A direct optimization with respect to the CS performance, targeting CS applied with the Orthogonal Matching Pursuit (OMP) or Basis Pursuit (BP), and (ii) A simpler method that targets an average measure of the mutual-coherence of the effective dictionary. As the first approach leads to a complex bi-level optimization task that is hard to handle, we demonstrate the second one and show that it leads to better CS reconstruction performance.

Recently, three independent works suggested iterated shrinkage algorithms that generalizes the classic work by Donoho and Johnston. Daubechies, Defrise and De-Mol developed such an algorithm for deblurring, using the concept of surrogate functions. Figueirido and Nowak used the EM algorithm to construct such algorithm, and later extended their work by using the bound-optimization technique. Elad developed such an algorithm based on a parallel coordinate descent (PCD) point of view. In this talk we describe these methods with an emphasis on the later, and demonstrate how it can be of importance for the minimization of a general basis pursuit penalty function. As such, the proposed algorithms form a new pursuit technique, falling in between the basis pursuit and the matching pursuit.

Shrinkage is a well known and appealing denoising technique. The use of shrinkage is known to be optimal for Gaussian white noise, provided that the sparsity on the signal’s representation is enforced using a unitary transform. Still, shrinkage is also practiced successfully with non-unitary, and even redundant representations. In this lecture we shed some light on this behavior. We show that simple shrinkage could be interpreted as the first iteration of an algorithm that solves the basis pursuit denoising (BPDN) problem. Thus, this work leads to a sequential shrinkage algorithm that can be considered as a novel and effective pursuit method. We demonstrate this algorithm, both synthetically, and for the image denoising problem, showing in both cases its superiority over several popular alternatives..

Retinex theory deals with the removal of unfavorable illumination effects from images. This ill-posed inverse problem is typically regularized by forcing spatial smoothness on the recoverable illumination. Recent work in this field suggested exploiting the knowledge that the illumination image bounds the image from above, and the fact that the reflectance is also expected to be smooth.

In this lecture we show how the above model can be improved to provide a non-iterative retinex algorithm that handles better edges in the illumination, and suppresses noise in dark areas. This algorithm uses two specially tailored bilateral filters — the first evaluates the illumination and the other is used for the computation of the reflectance. This result stands as a theoretic justification and refinement for the recently proposed heuristic use of the bilateral filter for retinex by Durand and Dorsey. In line with their appealing way of speeding up the bilateral filter, we show that similar speedup methods apply to our algorithm.

One of the most fundamental problems in the treatment of high-dimensional data is classification of a cloud of points in R^D into several sub-classes based on training data. An important such task is the pattern detection problem in images, which requires a separation between ‘Target’ and ‘Clutter’ classes, where every instance of a pattern in each of these classes appears as a sequence of D pixels. In most cases, the following properties hold true for the target detection task: (i) the probability of the ‘Target’ class is substantially smaller compared to that of the ‘Clutter’ ; (ii) the volume occupied by the target class in R^D is far smaller than that held by the clutter set ; and (iii) The target set is either convex or can be divided to several sub-sets each being convex.

In this talk we describe a new classifier that exploits these properties, yielding a low complexity yet effective target detection algorithm. This algorithm, called the Maximal Rejection Classifier (MRC), is based on successive rejection operations. Each such rejection stage is performed using a linear projection followed by thresholding. The projection direction is designed to maximize the number of rejected ‘Clutter’ points from further consideration. An application of detecting frontal and vertical faces in images is demonstrated using the MRC with encouraging results.

When over-complete dictionary is used, the search for the sparsest of all representations amounts to the need to solve a highly complicated combinatorial problem, with complexity growing exponenentially with the number of atoms in the dictionary. The Basis Pursuit algorithm (BP) (by Chen, Donoho, and Saundrers, 1995) was shown to be a highly effective tool for approximating the solution for the above problem.

In this talk we describe a fascinating analysis that explains the surprising empirical success of the PB. We start by presenting Donoho and Huo’s results, and then show how these results could be further improved. We first discuss the case of pair of ortho-bases and show tighter bounds on the required sparsity of the signal representation that guarantees BP-optimality. We then turn to extend previous results for arbitrary dictionaries, showing that all previous work falls as a special case of the new theory. Finally, we show how the obtained analysis results can be used to solve the problem of separating a volume of voxels into a sparse set of points, lines, and planes.

Transforming a signal is typically done in order to simplify its representation. Among the many ways to represent signals, the use of a linear combination taken from an over-complete dictionary is appealing due to its ability to cover a diversity of signal behaviors in one transform. However, with this benefit comes the problem of non-unique representation. Choosing the sparsest of all representations aligns well with our desire for simple signal description, but searching such representation becomes a hard to solve non-convex and highly non-smooth optimization problem. The “Basis-Pursuit” (BP) algorithm (Chen, Donoho, and Saunders – 1995) is a stable approximate solver for the above task, replacing the non-convex L_0 minimization with a L_1 norm. A later work by Donoho and Huo (1998) theoretically proved that for specific dictionaries built from pair of ortho-bases and for sparse enough representations, the BP algorithm is indeed optimal.

In this talk we start by describing Donoho and Huo’s work on the BP algorithm, and then show how these results could be further improved. We first discuss the case of pair of ortho-bases and show tighter bounds on the required sparsity of the signal representation that guarantees BP-optimality. We then turn to extend previous results for arbitrary dictionaries, showing that all previous work falls as a special case of the new theory. Finally, we show work in progress that relates the Basis Pursuit algorithm used as a non-linear filtering to the Bayesian approach. We show how TV, Wavelet denoising, general Bayesian priors, and specifically the bilateral filter are all special cases of the Basis Pursuit for difference choices of dictionaries.

In a wide range of applied problems of 2D and 3D imaging – including Tomography, image rotation, image registration and more – a common continuum formulation of the problem places great emphasis on obtaining and manipulating the Fourier transform in polar coordinates. However, the translation of continuum ideas into practical work with modern digital data sampled on a Cartesian grid is problematic. It is widely believed that “there is no Polar Fast Fourier Transform” and that for practical work, continuum ideas are at best a source of inspiration rather than a practical tool.

In this talk we describe a fast and highly accurate Polar-FFT. A special feature of the proposed approach is that it involves only 1D equi-spaced FFT’s. In particular, there is no role for notions like “re-gridding” and “scattered-data-interpolation”, as might have been supposed from the existing literature in related applications fields such as Tomography. A central tool in our approach is the Pseudo-Polar FFT, an FFT where the evaluation frequencies lie in an oversampled set of non-angularly equispaced points. In the talk we describe the concept of Pseudo-Polar domain (proposed originally by Averbuch, Coifman, Donoho, Israeli and Walden at 1998), including fast forward and inverse transform, a quasi-Parseval relation, and provide empirical and theoretical analysis of the Gram operator of the Pseudo-Polar FFT. For those interested primarily in Polar FFT’s, Pseudo-Polar FFT’s play the role of a halfway point – a nearly polar system from which conversion to Polar Coordinates uses processes relying purely on 1D operations. We illustrate a number of applications of our Polar and Pseudo-Polar FFT’s.

This talk discusses the problem of recovering a planar polygon from its measured moments. The moments correspond to an indicator function defined over the polygon’s support. Previous work on this problem gave necessary and sufficient conditions for such successful recovery process and focused mainly on the case of exact measurements being given. In this talk we describe an extension of these results treating the same problem in the case where a longer than necessary series of noise corrupted moments is given.

Leaning on similar problems in array processing, system identification, and signal processing, we discuss a set of possible estimation procedures which are based on the Prony and the Pencil methods, relate them one to the other, and compare them through simulations. We then present an improvement over these methods based on the direct use of the Maximum-Likelihood estimator, exploiting the above methods as initialization. Finally, we show how regularization, and thus Maximum A-posteriori Probability estimator could be applied to reflect prior knowledge about the recovered polygon.

Additive noise removal from a given signal (also known as de-noising) is an important stage in many applications in signal processing. Various approaches have been proposed throughout the years. This talk focuses on Bayesian smoothing and edge-preserving methods. Classical algorithms in this family are typically based on Weighted Least Squares (WLS), Robust Estimation (RE), and Anisotropic Diffusion (AD). These methods share common features such as adaptivity to the data, formation as optimization problems, and the need for iterative-based restoration. In 1998 Tomasi and Manduchi (CS, Stanford) proposed an alternative heuristic non-iterative filter for noise removal called the bilateral filter. It was shown to give similar and possibly better results compared to the abovementioned iterative approaches.

However, the bilateral filter was proposed as an intuitive tool without theoretical connection to the classical approaches. In this talk the various noise-removal techniques discussed here (WLS, RE, AD, and the bilateral filter) are presented and related theoretically to each other. In particular, it is shown that RE (and AD) could be interpreted as WLS with weights replaced after each iteration. Also, it is shown that the bilateral filter emerges from the Bayesian approach as a single iteration of the Jacobi iterative algorithm for a properly posed smoothness penalty. Based on this observation, it is shown how this new filter can be improved and extended to treat more general reconstruction problems.

The target detection problem is defined as the need to separate targets from clutter instances. Among the many example-based techniques for the solution of this problem, the family of rejection-based classifiers are consistently exhibiting state-of-the-art accuracy while being the fastest. This rejection-based approach advocates the use of large set of weak-classifiers chained sequentially. After application of each such atom-block, a rejection of some of the clutter is performed while guaranteeing no loss of targets.

While intuitively appealing, theoretic background for this method was gathered only recently. Some roots of it can be traced to the boosting algorithm and the decision tree methods – two wide fields of research in machine learning that concentrate on using multiple weak-classifiers for the construction of a complicated overall machine. Rejection as a concept was proposed and analyzed by Nayar and Baker, with emphasis on the multi-class problems. More recently Elad, Hel-Or, and Keshet proposed the Maximal-rejection-Classifier (MRC), and employed it to the face detection problem. To conclude this list of works on the rejection-based idea, we should mention the work of Viola and Johns on the face detection problem using sub-linear weak-classifiers joined via boosting. In this talk I’ll survey the various contributions to the rejection idea and its efficient implementation on face detection problem.

Pattern detection problems require a separation between two classes, ‘Target’ and ‘Clutter’, where the probability of the former is substantially smaller than that of the latter. We describe a new classifier that exploits this property, yielding a low complexity yet effective target detection algorithm. This Maximal Rejection Classifier (MRC) algorithm is based on successive rejection operations. Each rejection stage is performed using a linear projection followed by thresholding. The projection direction is designed to maximize the number of ‘Clutter’ points rejected from further consideration. An application of detecting frontal and vertical faces in images is demonstrated using the MRC, with encouraging results.