# Talks

## Inverse Problems
How do we choose a network architecture in deep-learning solutions? By copying existing networks or guessing new ones, and sometimes by applying various small modifications to them via trial and error. This inelegant, brute-force strategy has proven itself useful for a wide variety of imaging tasks. However, it comes with a painful cost – our networks tend to be quite heavy and cumbersome. Could we do better? In this talk we propose a different point of view on this important question, by advocating the following two rules: (i) rather than “guessing” architectures, we should rely on classic signal and image processing concepts and algorithms, and turn these into networks to be learned in a supervised manner; and (ii) sparse representation modeling is key in many (if not all) of the successful architectures that we are using. I will demonstrate these claims by presenting three recent image denoising networks that are lightweight and yet quite effective, as they follow the above guidelines.

Image denoising is the most fundamental problem in image enhancement, and it is largely solved: It has reached impressive heights in performance and quality — almost as good as it can ever get. But interestingly, it turns out that we can solve many other problems using the image denoising “engine”. I will describe the Regularization by Denoising (RED) framework: using the denoising engine in defining the regularization of any inverse problem. The idea is to define an explicit image-adaptive regularization functional directly using a high-performance denoiser. Surprisingly, the resulting regularizer is guaranteed to be convex, and the overall objective functional is explicit, clear and well-defined. With complete flexibility to choose the iterative optimization procedure for minimizing this functional, RED is capable of incorporating any image denoising algorithm as a regularizer, treats general inverse problems very effectively, and is guaranteed to converge to the globally optimal result.
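
As a rough sketch of the RED idea (an illustration under simplifying assumptions, not the authors' implementation), the following toy code plugs a simple moving-average filter in as the denoising engine f(x), and runs steepest descent on the RED objective for the pure-denoising case, where the regularizer gradient reduces to x - f(x):

```python
import numpy as np

def box_denoiser(x, k=5):
    """Stand-in denoising engine f(x): a 1D moving average. Any stronger
    denoiser (e.g. BM3D) could be plugged in here instead."""
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")

def red_denoise(y, lam=1.0, mu=0.2, n_iter=100):
    """RED steepest descent for the denoising case (degradation H = I):
    minimize 0.5*||x - y||^2 + (lam/2) * x^T (x - f(x)).
    Under RED's conditions, the gradient of the regularizer is x - f(x)."""
    x = y.copy()
    for _ in range(n_iter):
        grad = (x - y) + lam * (x - box_denoiser(x))
        x = x - mu * grad
    return x

# Toy 1D demo: a noisy sinusoid.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 4 * np.pi, 256))
noisy = clean + 0.3 * rng.standard_normal(256)
restored = red_denoise(noisy)
```

Note that the iterative scheme, the moving-average denoiser, and the parameter values here are all illustrative choices; the point is only the structure of the update, where the denoiser appears inside the gradient.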

What if we take all the overlapping patches from a given image and organize them to create the shortest path by using their mutual Euclidean distances? This suggests a reordering of the image pixels in a way that creates a maximal 1D regularity. What could we do with such a construction? In this talk we consider a wider perspective of the above, and introduce a wavelet transform for graph-structured data. The proposed transform is based on a 1D wavelet decomposition coupled with a pre-reordering of the input so as to best sparsify the given data. We apply this transform to image processing tasks by considering the image as a graph, where every patch is a node, and edges are weighted by the Euclidean distances between corresponding patches. We show several ways to use the above ideas in practice, leading to state-of-the-art image denoising, deblurring, inpainting, and face-image compression results.
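
The patch-reordering step can be sketched in a few lines. Since the exact shortest path (a traveling-salesman problem) is intractable, a greedy nearest-neighbor approximation is a natural stand-in; the helper name and the toy image below are illustrative, not taken from the talk:

```python
import numpy as np

def greedy_patch_ordering(patches):
    """Greedy shortest-path approximation: start from patch 0 and repeatedly
    hop to the nearest unvisited patch (Euclidean distance)."""
    n = len(patches)
    visited = np.zeros(n, dtype=bool)
    order = [0]
    visited[0] = True
    for _ in range(n - 1):
        last = patches[order[-1]]
        dists = np.linalg.norm(patches - last, axis=1)
        dists[visited] = np.inf            # never revisit a patch
        nxt = int(np.argmin(dists))
        order.append(nxt)
        visited[nxt] = True
    return np.array(order)

# Toy demo: an 8x8 "image" with all 3x3 overlapping patches (6x6 of them).
rng = np.random.default_rng(1)
img = rng.random((8, 8))
patches = np.array([img[i:i + 3, j:j + 3].ravel()
                    for i in range(6) for j in range(6)])
order = greedy_patch_ordering(patches)
```

The resulting permutation `order` is what induces the pixel reordering; the 1D signal obtained by traversing patches in this order is far more regular than a raster scan of the same data.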

Images are 2D signals, and should be processed as such — this is the common belief in the image processing community. Is it truly the case? Around thirty years ago, some researchers suggested converting images into 1D signals, so as to harness well-developed 1D tools such as adaptive filtering and Kalman estimation techniques. These attempts resulted in poorly performing algorithms, strengthening the above belief. Why should we force unnatural causality between spatially ordered pixels? Indeed, why? In this talk I will present a conversion of images into 1D signals that leads to state-of-the-art results in a series of applications – denoising, inpainting, compression, and more. The core idea in our work is that there exists a permutation of the image pixels that carries in it most of the “spatial content”, and this ordering is within reach, even if the image is corrupted. We expose this permutation and use it in order to process the image as if it were a one-dimensional signal, successfully treating a series of image processing problems.

What if we take all the overlapping patches from a given image and organize them to create the shortest path by using their mutual distances? This suggests a reordering of the image pixels in a way that creates a maximal 1D regularity. What could we do with such a construction? In this talk we consider a wider perspective of the above, and introduce a wavelet transform for graph-structured data. The proposed transform is based on a 1D wavelet decomposition coupled with a pre-reordering of the input so as to best sparsify the given data. We apply this transform to image processing tasks by considering the image as a graph, where every patch is a node, and edges are weighted by the Euclidean distances between corresponding patches. We show several ways to use the above ideas in practice, leading to state-of-the-art image denoising, deblurring, and inpainting results.

What if we take all the overlapping patches from a given image and organize them to create the shortest path by using their mutual distances? This suggests a reordering of the image pixels in a way that creates a maximal 1D regularity. Could we repeat this process in several scales? What could we do with such a construction? In this talk we consider a wider perspective of the above line of questions: We introduce a wavelet transform that is meant for data organized as a connected graph or as a cloud of high-dimensional points. The proposed transform constructs a tree that applies 1D wavelet decomposition filters, coupled with a pre-reordering of the input, so as to best sparsify the given data. We apply this transform to image processing tasks by considering the image as a graph, where every patch is a node, and edges are weighted by the Euclidean distances between corresponding patches. We show three ways to use the above ideas in practice – adopting only the patch-reordering, using the obtained wavelet transform as a sparsifying process, and a third approach where this transform is used as a regularizer. State-of-the-art image denoising, deblurring, and inpainting results are obtained with the proposed schemes.
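
The sparsifying effect of the pre-reordering can be illustrated on scalar data, where sorting happens to be the exact shortest path. The sketch below (a toy illustration, not the talk's transform) applies one level of an orthonormal Haar decomposition before and after reordering and compares the size of the detail coefficients:

```python
import numpy as np

def haar_1d(x):
    """One level of an orthonormal 1D Haar decomposition."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)
    detail = (even - odd) / np.sqrt(2)
    return approx, detail

rng = np.random.default_rng(2)
x = rng.random(256)           # a "cloud" of scalar samples, in arbitrary order
perm = np.argsort(x)          # for scalars, sorting is the shortest-path ordering

_, d_plain = haar_1d(x)
_, d_sorted = haar_1d(x[perm])

# The reordered signal is far smoother, so its detail band is far sparser.
sparsity_gain = np.sum(np.abs(d_plain)) / np.sum(np.abs(d_sorted))
```

In the actual transform the same idea is applied recursively (reorder, filter, and repeat on the approximation band), and the ordering is computed on patches rather than single values; this fragment only demonstrates why reordering before filtering shrinks the detail coefficients.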

What if we take all the overlapping patches from a given image and organize them to create the shortest path by using their mutual distances? This suggests a reordering of the image pixels in a way that creates a maximal 1D regularity. Could we repeat this process in several “scales”? What could we do with such a construction? In this talk we consider a wider perspective of the above line of questions: We introduce a wavelet transform that is meant for data organized as a connected graph or as a cloud of high-dimensional points. The proposed transform constructs a tree that applies 1D wavelet decomposition filters, coupled with a pre-reordering of the input, so as to best sparsify the given data. We apply this transform to image processing tasks by considering the image as a graph, where every patch is a node, and edges are weighted by the Euclidean distances between corresponding patches. State-of-the-art image denoising results are obtained with the proposed scheme.

Multi-channel TV broadcast, Internet video and YouTube, home DVD movies, video conference calls, cellular video calls and more – there is no doubt that videos are abundant and in everyday use. In many cases, the quality of the available video is poor, something commonly referred to as “low resolution”. As an example, high-definition (HD) TVs are commonly sold these days to customers who hope to enjoy a better viewing experience. Nevertheless, most TV broadcast today is still done in standard definition (SD), leading to poor image quality on these screens. The field of Super-Resolution deals with ways to improve video content so as to increase optical resolution. The core idea: the visual content of several images can be fused, and this fusion can lead to a better-resolution outcome. For years it has been assumed that such fusion requires knowing the exact motion the objects undergo within the scene. Since this motion may be quite complex in general, this stood as a major obstacle for industrial applications. Three years ago a breakthrough was made in this field, allowing one to bypass the need for exact motion estimation. In this lecture we shall survey the work in this field from its early days (25 years ago) until very recently, and show the evolution of ideas and results obtained. No prior knowledge in image processing is required.

This course (4 lectures and one tutorial) presents the core ideas and achievements made in the field of sparse and redundant representation modeling, with emphasis on the impact of this field on image processing applications. The four lectures (given as PPTX and PDF) are organized as follows:

Lecture 1: The core sparse approximation problem and pursuit algorithms that aim to approximate its solution.

Lecture 2: The theory on the uniqueness of the sparsest solution of a linear system, the notion of stability for the noisy case, guarantees for the performance of pursuit algorithms using the mutual coherence and the RIP.

Lecture 3: Signal (and image) models and their importance, the Sparseland model and its use, analysis versus synthesis modeling, a Bayesian estimation point of view, dictionary learning with the MOD and the K-SVD, global and local image denoising, local image inpainting.

Lecture 4: Sparse representations in image processing – image deblurring, global image separation, and image inpainting; using dictionary learning for image and video denoising and inpainting; image scale-up using a pair of learned dictionaries; facial image compression with the K-SVD.

This course (5 lectures) presents the core ideas and achievements made in the field of sparse and redundant representation modeling, with emphasis on the impact of this field on image processing applications. The five lectures (given as PPTX and PDF) are organized as follows:

Lecture 1: The core sparse approximation problem and pursuit algorithms that aim to approximate its solution.

Lecture 2: The theory on the uniqueness of the sparsest solution of a linear system, the notion of stability for the noisy case, guarantees for the performance of pursuit algorithms using the mutual coherence and the RIP.

Lecture 3: Signal (and image) models and their importance, the Sparseland model and its use, analysis versus synthesis modeling, a Bayesian estimation point of view.

Lecture 4: First steps in image processing with the Sparseland model – image deblurring, image denoising, image separation, and image inpainting. Global versus local processing of images. Dictionary learning with the MOD and the K-SVD.

Lecture 5: Advanced image processing: using dictionary learning for image and video denoising and inpainting, image scale-up using a pair of learned dictionaries, and facial image compression with the K-SVD.

This survey talk focuses on the use of sparse and redundant representations and learned dictionaries for image denoising and other related problems. We discuss the K-SVD algorithm for learning a dictionary that describes the image content efficiently. We then show how to harness this algorithm for image denoising, by working on small patches and forcing sparsity over the trained dictionary. The above is extended to color image denoising and inpainting, video denoising, and facial image compression, leading in all these cases to state-of-the-art results. We conclude with more recent results on the use of several sparse representations for obtaining better denoising performance. An algorithm for generating such a set of representations is developed, and our analysis shows that by this we approximate the minimum mean-squared-error (MMSE) estimator, thus getting better results.
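
The "forcing sparsity over the trained dictionary" step is typically carried out by a greedy pursuit. The following compact Orthogonal Matching Pursuit sketch (an illustration, not the K-SVD toolbox itself; the toy dictionary and sparsity level are assumptions) shows the basic mechanism of coding a patch over a dictionary:

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily pick up to k atoms of the
    column-normalized dictionary D, re-fitting all chosen coefficients
    by least squares after each selection."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        idx = int(np.argmax(np.abs(D.T @ residual)))  # best-matching atom
        if idx not in support:
            support.append(idx)
        sub = D[:, support]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)  # orthogonal projection
        residual = y - sub @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Toy demo: a random dictionary and a 3-sparse "patch".
rng = np.random.default_rng(3)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(50)
x_true[[5, 17, 33]] = [1.0, -2.0, 1.5]
y = D @ x_true
x_hat = omp(D, y, k=3)
```

In a full denoising pipeline this coding step runs on every overlapping patch, with the dictionary D itself updated by K-SVD between coding sweeps.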

Modeling of signals or images by a sparse and redundant representation has been shown in recent years to be very effective, often leading to state-of-the-art results in many applications. Applications leaning on this model can be cast as energy minimization problems, where the unknown is a high-dimensional and very sparse vector. Surprisingly, traditional tools in optimization, including very recently developed interior-point algorithms, tend to perform very poorly on these problems. A recently emerging alternative is a family of techniques known as “iterated-shrinkage” methods. There are various such algorithms, but common to them all is the fact that each of their iterations requires a simple forward and inverse transform (e.g. wavelet), and a scalar shrinkage look-up-table (LUT) step. In this talk we shall explain the need for such algorithms, present some of them, and show how they perform on a classic image deblurring problem.
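
To make the iteration structure concrete, here is a minimal ISTA-style iterated-shrinkage sketch (one of several variants in this family; the step-size rule, toy operator, and parameter values are illustrative assumptions). Each iteration is exactly one application of A and its adjoint, followed by scalar shrinkage:

```python
import numpy as np

def soft_threshold(v, t):
    """Scalar shrinkage step: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.1, n_iter=200):
    """Iterated shrinkage for min 0.5*||A x - y||^2 + lam*||x||_1.
    Each iteration: forward transform A, adjoint A^T, then shrinkage."""
    c = np.linalg.norm(A, 2) ** 2      # Lipschitz bound: largest eig of A^T A
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (y - A @ x) / c, lam / c)
    return x

# Toy demo: recover a sparse vector from compressed measurements.
rng = np.random.default_rng(4)
A = rng.standard_normal((30, 60))
x_true = np.zeros(60)
x_true[[3, 20, 44]] = [2.0, -1.5, 1.0]
y = A @ x_true
x_hat = ista(A, y)
```

In the deblurring setting of the talk, A would be a blur operator composed with a wavelet synthesis transform, so the per-iteration cost stays at one forward and one inverse transform plus the shrinkage LUT.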

The super-resolution reconstruction problem addresses the fusion of several low-quality images into one higher-resolution outcome. A typical scenario for such a process could be the fusion of several video fields into a higher-resolution output that can lead to a high-quality printout. The super-resolution result provides TRUE resolution, as opposed to the typically used interpolation techniques. The core idea behind this ability is the fact that higher frequencies exist in the measurements, although in an aliased form, and those can be recovered due to the motion between the frames. Ever since the pioneering work by Tsai and Huang (1984), who demonstrated the core ability to get super-resolution, much work has been devoted by various research groups to this problem and ways to solve it. In this talk I intend to present the core ideas behind the super-resolution (SR) problem, and our very recent results in this field. Starting from the problem modeling, and posing the super-resolution task as a general inverse problem, we shall see how the SR problem can be addressed effectively using ML and later MAP estimation methods. This talk also shows various ingredients that are added to the reconstruction process to make it robust and efficient. Many results will accompany these descriptions, so as to show the strengths of the methods.

Recently, three independent works suggested iterated shrinkage algorithms that generalize the classic work by Donoho and Johnstone. Daubechies, Defrise and De Mol developed such an algorithm for deblurring, using the concept of surrogate functions. Figueiredo and Nowak used the EM algorithm to construct such an algorithm, and later extended their work by using the bound-optimization technique. Elad developed such an algorithm based on a parallel coordinate descent (PCD) point of view. In this talk we describe these methods with an emphasis on the latter, and demonstrate how it can be of importance for the minimization of a general basis pursuit penalty function. As such, the proposed algorithms form a new pursuit technique, falling in between the basis pursuit and the matching pursuit.

Shrinkage is a well-known and appealing denoising technique. The use of shrinkage is known to be optimal for Gaussian white noise, provided that the sparsity of the signal’s representation is enforced using a unitary transform. Still, shrinkage is also practiced successfully with non-unitary, and even redundant, representations. In this lecture we shed some light on this behavior. We show that simple shrinkage could be interpreted as the first iteration of an algorithm that solves the basis pursuit denoising (BPDN) problem. Thus, this work leads to a sequential shrinkage algorithm that can be considered a novel and effective pursuit method. We demonstrate this algorithm, both synthetically and for the image denoising problem, showing in both cases its superiority over several popular alternatives.
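
For intuition on the unitary case: with an orthonormal transform W, shrinkage in the transform domain is the exact minimizer of the BPDN objective 0.5||x - y||^2 + lam*||W x||_1, with no iterations needed. The sketch below (the random orthonormal transform, noise level, and threshold are illustrative assumptions) denoises a signal that is sparse under W:

```python
import numpy as np

def soft(v, t):
    """Soft-threshold (shrinkage) operator."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(5)
# A random orthonormal "transform" standing in for a unitary wavelet basis.
W, _ = np.linalg.qr(rng.standard_normal((128, 128)))

coeffs = np.zeros(128)
coeffs[:5] = rng.standard_normal(5) * 5.0   # a 5-sparse representation
clean = W.T @ coeffs                        # signal sparse under W
noisy = clean + 0.2 * rng.standard_normal(128)

# Transform, shrink, transform back: exact BPDN solution for unitary W.
lam = 0.5
denoised = W.T @ soft(W @ noisy, lam)
```

When W is replaced by a redundant dictionary this one-shot formula is no longer exact, and it becomes precisely the first iteration of the sequential shrinkage scheme discussed in the lecture.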

In our field, when composing a super-resolved image, two ingredients contribute to the ability to get a leap in resolution: (i) the existence of many and diverse measurements, and (ii) the availability of a model that reliably describes the image to be produced. This second part, also known as the regularization or the prior, is of generic importance, and could be deployed in any inverse problem and used by many other applications (compression, image synthesis, and more). The well-known recent work by Baker and Kanade (’02) and the work that followed (Lin and Shum ’04, Robinson and Milanfar ’05) all suggest that while the measurements are limited in gaining a resolution increase, the prior could be used to break this barrier. Clearly, the better the prior used, the higher the quality we can expect from the overall reconstruction procedure. Indeed, recent work on super-resolution (and other inverse problems) departs from the regular Tikhonov method, and tends to the robust counterparts, such as TV or the bilateral prior (see Farsiu et al. ’04).

A recent trend with growing popularity is the use of examples in defining the prior. Indeed, Baker and Kanade were the first to introduce this notion to the super-resolution task. There are several ways to use examples in shaping the prior to become better. The work by Mumford and Zhu (’99) and the follow-up contribution by Haber and Tenorio (’02) suggest a parametric approach. Baker and Kanade (’02), Freeman et al. (several contributions, ’01), and Nakagaki and Katsaggelos (’03) all use the examples to directly learn the reconstruction function, by observing low-resolution versus high-resolution pairs.

In this talk we survey this line of work and show how it can be extended in several important ways. We show a general framework that builds an example-based prior that is independent of the inverse problem at hand, and we demonstrate it on several such problems, with promising results.

Retinex theory deals with the removal of unfavorable illumination effects from images. This ill-posed inverse problem is typically regularized by forcing spatial smoothness on the recoverable illumination. Recent work in this field suggested exploiting the knowledge that the illumination image bounds the image from above, and the fact that the reflectance is also expected to be smooth.

In this lecture we show how the above model can be improved to provide a non-iterative retinex algorithm that better handles edges in the illumination, and suppresses noise in dark areas. This algorithm uses two specially tailored bilateral filters — the first evaluates the illumination, and the other is used for the computation of the reflectance. This result stands as a theoretical justification and refinement of the recently proposed heuristic use of the bilateral filter for retinex by Durand and Dorsey. In line with their appealing way of speeding up the bilateral filter, we show that similar speedup methods apply to our algorithm.
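
A rough log-domain sketch of the structure involved, with a separable box filter standing in for the specially tailored bilateral filters (a deliberate simplification for illustration only): the illumination is constrained to bound the image from above, and the reflectance is recovered as the log-domain residual.

```python
import numpy as np

def smooth(img, k=7):
    """Separable box smoothing -- a crude stand-in for a bilateral filter."""
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, out)
    return out

def retinex(s, eps=1e-6):
    """Log-domain retinex sketch: estimate a smooth illumination that bounds
    the image from above, then recover reflectance as the log-difference."""
    log_s = np.log(s + eps)
    log_l = np.maximum(smooth(log_s), log_s)   # enforce illumination >= image
    log_r = log_s - log_l                      # reflectance (<= 0 in log domain)
    return np.exp(log_l), np.exp(log_r)

# Toy demo on a random "image" with positive intensities.
rng = np.random.default_rng(6)
img = np.clip(rng.random((32, 32)), 0.05, 1.0)
L, R = retinex(img)
```

Replacing `smooth` with an edge-preserving bilateral filter is exactly what prevents halos at illumination edges, which is the refinement the lecture develops.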

In this talk we present a novel method for separating images into texture and piecewise smooth parts, and show how this formulation can also lead to image inpainting. Our separation and inpainting process exploits both the variational and the sparsity mechanisms, by combining the Basis Pursuit Denoising (BPDN) algorithm and the Total-Variation (TV) regularization scheme.

The basic idea in this work is the use of two appropriate dictionaries, one for the representation of textures, and the other for the natural scene parts, assumed to be piecewise smooth. Both dictionaries are chosen such that they lead to sparse representations over one type of image content (either texture or piecewise smooth). The use of the BPDN with the two augmented dictionaries leads to the desired separation, along with noise removal as a by-product. As choosing a proper dictionary for the natural scene part is very hard, a TV regularization is employed to better direct the separation process.

This concept of separation via sparse and over-complete representation of the image is shown to have a direct and natural extension to image inpainting. When some of the pixels in known locations in the image are missing, the same separation formulation can be changed to fit the problem of decomposing the image while filling in the holes. Thus, as a by-product of the separation we achieve inpainting. This approach should be compared to a recently published inpainting system by Bertalmio, Vese, Sapiro, and Osher. We will present several experimental results that validate the algorithm’s performance.

The separation of image content into semantic parts plays a vital role in applications such as compression, enhancement, restoration, and more. In recent years several pioneering works suggested such separation based on a variational formulation, and others using independent component analysis and sparsity. In this talk we present a novel method for separating images into texture and piecewise smooth parts, exploiting both the variational and the sparsity mechanisms, by combining the Basis Pursuit Denoising (BPDN) algorithm and the Total-Variation (TV) regularization scheme.

The basic idea in our work is the use of two appropriate dictionaries, one for the representation of textures, and the other for the natural scene parts, assumed to be piecewise smooth. Both dictionaries are chosen such that they lead to sparse representations over one type of image content (either texture or piecewise smooth). The use of the BPDN with the two augmented dictionaries leads to the desired separation, along with noise removal as a by-product. As choosing a proper dictionary for the natural scene part is very hard, a TV regularization is employed to better direct the separation process. We will present several experimental results that validate the algorithm’s performance.

This talk discusses the problem of recovering a planar polygon from its measured moments. The moments correspond to an indicator function defined over the polygon’s support. Previous work on this problem gave necessary and sufficient conditions for such a successful recovery process, and focused mainly on the case where exact measurements are given. In this talk we describe an extension of these results, treating the same problem in the case where a longer-than-necessary series of noise-corrupted moments is given.

Drawing on similar problems in array processing, system identification, and signal processing, we discuss a set of possible estimation procedures based on the Prony and Pencil methods, relate them to one another, and compare them through simulations. We then present an improvement over these methods based on the direct use of the Maximum-Likelihood estimator, exploiting the above methods as initialization. Finally, we show how regularization, and thus a Maximum A-posteriori Probability estimator, could be applied to reflect prior knowledge about the recovered polygon.

The super-resolution reconstruction process deals with the fusion of several low-quality and low-resolution images into one higher-resolution and possibly better final image. We start by showing that, from a theoretical point of view, this fusion process is based on generalized sampling theorems due to Yen (1956) and Papoulis (1977). When a more realistic scenario is considered, with blur, arbitrary motion, and additive noise, an estimation approach is adopted instead.

We describe methods based on the Maximum-Likelihood (ML), Maximum-A-posteriori Probability (MAP), and Projection Onto Convex Sets (POCS) estimators as candidate tools. Underlying all these methods is the development of a model describing the relation between the measurements (low-quality images) and the desired output (high-resolution image). Along this path we present the basic rationale behind super-resolution, and then present the dichotomy between the static and the dynamic super-resolution processes. We propose treatments for both, and deal with several interesting special cases.
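
A toy 1D illustration of the MAP route described above (the forward model with integer shifts and decimation, its adjoint, and the simple Tikhonov prior are all simplifying assumptions for the sketch): several shifted, decimated frames are fused by gradient descent on the penalized least-squares objective.

```python
import numpy as np

def degrade(x, shift, factor=2):
    """Forward model for one low-res frame: integer shift, then decimation."""
    return np.roll(x, shift)[::factor]

def sr_map(frames, shifts, n, factor=2, lam=0.01, mu=0.1, n_iter=500):
    """MAP super-resolution by gradient descent on
    sum_k ||D F_k x - y_k||^2 + lam * ||x||^2  (Tikhonov prior)."""
    x = np.zeros(n)
    for _ in range(n_iter):
        grad = lam * x
        for y_k, s in zip(frames, shifts):
            r = degrade(x, s, factor) - y_k
            up = np.zeros(n)
            up[::factor] = r               # adjoint of the decimation
            grad += np.roll(up, -s)        # adjoint of the shift
        x -= mu * grad
    return x

# Toy demo: four shifted low-res views of a high-res sinusoid.
rng = np.random.default_rng(7)
hi = np.sin(np.linspace(0, 2 * np.pi, 32))
shifts = [0, 1, 2, 3]
frames = [degrade(hi, s) for s in shifts]
rec = sr_map(frames, shifts, n=32)
```

With noisy frames and a robust prior (TV or the bilateral prior) in place of the quadratic term, the same gradient structure carries over; only the regularizer's gradient changes.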