# Talks

## Image Processing Applications

## Image Processing Applications

How do we choose a network architecture in deep-learning solutions? By copying existing networks or guessing new ones, and sometimes by applying various small modifications to them via trial and error. This non-elegant and brute-force strategy has proven itself useful for a wide variety of imaging tasks. However, it comes with a painful cost – our networks tend to be quite heavy and cumbersome. Could we do better? In this talk we would like to propose a different point of view towards this important question, by advocating the following two rules: (i) Rather than “guessing” architectures, we should rely on classic signal and image processing concepts and algorithms, and turn these to networks to be learned in a supervised manner. More specifically, (ii) Sparse representation modeling is key in many (if not all) of the successful architectures that we are using. I will demonstrate these claims by presenting three recent image denoising networks that are light-weight and yet quite effective, as they follow the above guidelines.

פירוק אטומי, הטבלה המחזורית, הרכבה של מולקולה … כל זה נשמע כמו תחילתה של הרצאה בכימיה. אבל לא! נושאים אלו יעלו בהרצאה שתדון בעיבוד תמונות. עיבוד תמונות הינו תחום מרכזי בחיינו – מהטלוויזיה בבתינו, המצלמה הדיגיטאלית שבכיסנו (ולאחרונה כחלק מהטלפון הסלולרי), דרך צפייה בסרטי די-וי-די, בקרת איכות בפסי ייצור, מערכות עקיבה ואבטחה, ועד צילומי אולטראסאונד, טומוגרפיה ותהודה מגנטית בבתי-חולים. בכל אלה ובעוד מוצרים רבים, עיבוד תמונות מהווה טכנולוגיה שאי-אפשר בלעדיה. תחום זה תוסס ופורח הן בתעשיה והן באקדמיה, עם עשרות אלפי מהנדסים ומדענים בכל רחבי העולם העוסקים בו יום יום ושעה שעה. אז מה זה עיבוד תמונות? זהו הנושא שבו נדון בהרצאה זו. עיבוד תמונות מתייחס לטיפול בתמונות ע”י מחשב. אנו נעסוק בשאלות כגון כיצד תמונה עושה את דרכה אל המחשב, כיצד היא אגורה שם, מה ניתן לעשות בה משכבר היא שם, ועוד. אחת המטרות המרכזיות בהרצאה זו היא הצגת חזית הידע בתחום, ובפרט העשייה המחקרית בטכניון בזירה זו. הדיון הכללי הנ”ל על עיבוד תמונות ישמש אותנו כמצע עליו נבנה כדי להציג את עבודתנו מהעת האחרונה, בה אנו עוסקים במודלים לתמונות המשתמשים ברעיונות “כימיקליים”, בעזרתם אנו מטפלים במידע בכלל ובתמונות בפרט. אנו נתאר כיצד ניתן לפרק תמונה לאטומים, לבנות טבלה מחזורית של יסודות לתיאור תמונות, וכיצד אנו רותמים את כל אלה כדי לפתור בעיות מעשיות בתחום, כגון שיפור תמונות וסרטים ותיקונם מקלקולים שונים, השלמת חלקים חסרים בתמונות, דחיסה, ועוד

Style-transfer is a process of migrating a style from a given image to the content of another, synthesizing a new image which is an artistic mixture of the two. Recent work on this problem adopting Convolutional Neural-networks (CNN) ignited a renewed interest in this field, due to the very impressive results obtained. There exists an alternative path towards handling the style-transfer task, via generalization of texture-synthesis algorithms. I will present a novel such style-transfer algorithm that extends the texture-synthesis work of Kwatra et. al. (2005), while aiming to get stylized images that get closer in quality to the CNN ones.

Image Processing is a fascinating scientific field, offering ways to handle visual data by computers. How can an image be brought to be stored and processed by a computer? What kind of such processing could be done which are worthwhile? In the first part of this talk we shall describe the core ideas behind the field of image processing by answering these two questions. In the second part of the talk we shall turn to describe the recent research activity in Elad’s group in the Computer-Science department at the Technion, emphasizing the vast work done on harnessing sparse and redundant representation modeling to image processing needs.

Compression of frontal facial images is an appealing and important application. Recent work has shown that specially tailored algorithms for this task can lead to performance far exceeding JPEG2000. This paper proposes a novel such compression algorithm, exploiting our recently developed redundant tree-based wavelet transform. Originally meant for functions defined on graphs and cloud of points, this new transform has been shown to be highly effective as an image adaptive redundant and multi-scale decomposition. The key concept behind this method is reordering of the image pixels so as to form a highly smooth 1D signal that can be sparsified by a regular wavelet. In this work we bring this image adaptive transform to the realm of compression of aligned frontal facial images. Given a training set of such images, the transform is designed to best sparsify the whole set using a common feature-ordering. Our compression scheme consists of sparse coding using the transform, followed by entropy coding of the obtained coefficients. The inverse transform and a post-processing stage are used to decode the compressed image. We demonstrate the performance of the proposed scheme and compare it to other competing algorithms.

Multi-channel TV broadcast, Internet video and You-Tube, home DVD movies, video conference calls, cellular video calls and more – there is no doubt that videos are abundant and in everyday use. In many cases, the quality of the available video is poor, something commonly referred to as “low-resolution”. As an example, High-definition (HD) TV’s are commonly sold these days to customers that hope to enjoy a better viewing experience. Nevertheless, most TV broadcast today is still done in standard-definition (SD), leading to poor image quality on these screens. The field of Super-Resolution deals with ways to improve video content to increase optical resolution. The core idea: fusion of the visual content in several images can be performed and this can lead to a better resolution outcome. For years it has been assumed that such fusion requires knowing the exact motion the objects undergo within the scene. Since this motion may be quite complex in general, this stood as a major obstacle for industrial applications. Three years ago a break-through has been made in this field, allowing to bypass the need for exact motion estimation. In this lecture we shall survey the work in this field from its early days (25 years ago) and till very recently, and show the evolution of ideas and results obtained. No prior knowledge in image processing is required.

This course (4 lectures and one tutorial) brings the core ideas and achievements made in the field of sparse and redundant representation modeling, with emphasis on the impact of this field to image processing applications. The five lectures (given as PPTX and PDF) are organized as follows:

Lecture 1: The core sparse approximation problem and pursuit algorithms that aim to approximate its solution.

Lecture 2: The theory on the uniqueness of the sparsest solution of a linear system, the notion of stability for the noisy case, guarantees for the performance of pursuit algorithms using the mutual coherence and the RIP.

Lecture 3: Signal (and image) models and their importance, the Sparseland model and its use, analysis versus synthesis modeling, a Bayesian estimation point of view, dictionary learning with the MOD and the K-SVD, global and local image denoising, local image inpainting.

Lecture 4: Sparse representations in image processing – image deblurring, global image separation and image inpainting. using dictionary learning for image and video denoising and inpainting, image scale-up using a pair of learned dictionaries, facial image compression with the K-SVD.

This course (5 lectures) brings the core ideas and achievements made in the field of sparse and redundant representation modeling, with emphasis on the impact of this field to image processing applications. The five lectures (given as PPTX and PDF) are organized as follows:

Lecture 1: The core sparse approximation problem and pursuit algorithms that aim to approximate its solution.

Lecture 2: The theory on the uniqueness of the sparsest solution of a linear system, the notion of stability for the noisy case, guarantees for the performance of pursuit algorithms using the mutual coherence and the RIP.

Lecture 3: Signal (and image) models and their importance, the Sparseland model and its use, analysis versus synthesis modeling, a Bayesian estimation point of view.

Lecture 4: First steps in image processing with the Sparseland model – image deblurring, image denoising, image separation, and image inpainting. Global versus local processing of images. Dictionary learning with the MOD and the K-SVD.

Lecture 5: Advanced image processing: Using dictionary learning for image and video denoising and inpainting, image scale-up using a pair of learned dictionaries, Facial image compression with the K-SVD.

This survey talk focuses on the use of sparse and redundant representations and learned dictionaries for image denoising and other related problems. We discuss the the K-SVD algorithm for learning a dictionary that describes the image content efficiently. We then show how to harness this algorithm for image denoising, by working on small patches and forcing sparsity over the trained dictionary. The above is extended to color image denoising and inpainting, video denoising, and facial image compression, leading in all these cases to state of the art results. We conclude with more recent results on the use of several sparse representations for getting better denoising performance. An algorithm to generate such set of representations is developed, and our analysis shows that by this we approximate the minimum-mean-squared-error (MMSE) estimator, thus getting better results.

Scaling up a single image while preserving is sharpness and visual-quality is a difficult and highly ill-posed inverse problem. A series of algorithms have been proposed over the years for its solution, with varying degrees of success. In CVPR 2008, Yang, Wright, Huang and Ma proposed a solution to this problem based on sparse representation modeling and dictionary learning. In this talk I present a variant of their method with several important differences. In particular, the proposed algorithm does not need a separate training phase, as the dictionaries are learned directly from the image to be scaled-up. Furthermore, the high-resolution dictionary is learned differently, by forcing its alignment with the low-resolution one. We show the benefit these modifications bring in terms of simplicity of the overall algorithm, and its output quality.

פירוק אטומי, הטבלה המחזורית, הרכבה של מולקולה … כל זה נשמע כמו תחילתה של הרצאה בכימיה. אבל לא! נושאים אלו יעלו בהרצאה שתדון בעיבוד תמונות. עיבוד תמונות הינו תחום מרכזי בחיינו – מהטלוויזיה בבתינו, המצלמה הדיגיטאלית שבכיסנו (ולאחרונה כחלק מהטלפון הסלולרי), דרך צפייה בסרטי די-וי-די, בקרת איכות בפסי ייצור, מערכות עקיבה ואבטחה, ועד צילומי אולטראסאונד, טומוגרפיה ותהודה מגנטית בבתי-חולים. בכל אלה ובעוד מוצרים רבים, עיבוד תמונות מהווה טכנולוגיה שאי-אפשר בלעדיה. תחום זה תוסס ופורח הן בתעשיה והן באקדמיה, עם עשרות אלפי מהנדסים ומדענים בכל רחבי העולם העוסקים בו יום יום ושעה שעה. אז מה זה עיבוד תמונות? זהו הנושא שבו נדון בהרצאה זו. עיבוד תמונות מתייחס לטיפול בתמונות ע”י מחשב. אנו נעסוק בשאלות כגון כיצד תמונה עושה את דרכה אל המחשב, כיצד היא אגורה שם, מה ניתן לעשות בה משכבר היא שם, ועוד. אחת המטרות המרכזיות בהרצאה זו היא הצגת חזית הידע בתחום, ובפרט העשייה המחקרית בטכניון בזירה זו. הדיון הכללי הנ”ל על עיבוד תמונות ישמש אותנו כמצע עליו נבנה כדי להציג את עבודתנו מהעת האחרונה, בה אנו עוסקים במודלים לתמונות המשתמשים ברעיונות “כימיקליים”, בעזרתם אנו מטפלים במידע בכלל ובתמונות בפרט. אנו נתאר כיצד ניתן לפרק תמונה לאטומים, לבנות טבלה מחזורית של יסודות לתיאור תמונות, וכיצד אנו רותמים את כל אלה כדי לפתור בעיות מעשיות בתחום, כגון שיפור תמונות וסרטים ותיקונם מקלקולים שונים, השלמת חלקים חסרים בתמונות, דחיסה, ועוד

In this talk we describe applications such as image denoising and beyond using sparse and redundant representations. Our focus is on ways to perform these tasks with trained dictionaries using the K-SVD algorithm. As trained dictionaries are limited in handling small image patches, we deploy these within a Bayesian reconstruction procedure by forming an image prior that forces every patch in the resulting image to have a sparse representation.

Super-resolution reconstruction proposes a fusion of several low quality images into one higher quality result with better optical resolution. Classic super resolution techniques strongly rely on the availability of accurate motion estimation for this fusion task. When the motion is estimated inaccurately, as often happens for non-global motion fields, annoying artifacts appear in the super-resolved outcome. Encouraged by recent developments on the video denoising problem, where state-of-the-art algorithms are formed with no explicit motion estimation, we seek a super-resolution algorithm of similar nature that will allow processing sequences with general motion patterns. In this talk we base our solution on the Non-Local-Means (NLM) algorithm. We show how this denoising method is generalized to become a relatively simple super-resolution algorithm with no explicit motion estimation. Results on several test movies show that the proposed method is very successful in providing super-resolution on general sequences.

In this survey talk we focus on the use of sparse and redundant representations and learned dictionaries for image denoising and other related problems. We discuss the the K-SVD algorithm for learning a dictionary that describes the image content effectively. We then show how to harness this algorithm for image denoising, by working on small patches and forcing sparsity over the trained dictionary. The above is extended to color image denoising and inpainitng, video denoising, and facial image compression, leading in all these cases to state of the art results. We conclude with very recent results on the use of several sparse representations for getting better denoising performance. An algorithm to generate such set of representations is developed, and our analysis shows that by this method we approximate the minimum-mean-squared-error (MMSE) estimator, thus getting better results.

Modeling of signals or images by a sparse and redundant representation is shown in recent years to be very effective, often leading to stat-of-the-art results in many applications. Applications leaning on this model can be cast as energy minimization problems, where the unknown is a high-dimensional and very sparse vector. Surprisingly, traditional tools in optimization, including very recently developed interior-point algorithms, tend to perform very poorly on these problems. A recently emerging alternative is a family of techniques, known as “iterated-shrinkage” methods. There are various and different such algorithms, but common to them all is the fact that each of their iterations require a simple forward and inverse transform (e.g. wavelet), and a scalar shrinkage look-up-table (LUT) step. In this talk we shall explain the need for such algorithms, present some of them, and show how they perform on a classic image deblurring problem.

In this very brief talk I describe the need to model images in general, and then briefly present the Sparse-Land model. The talk includes a demonstration of a sequence of applications in image processing where this model has been deployed successfully, including denoising of still, color and video images, inpainting, and compression. The moral to take home is: “The Sparse-Land model is a new and promising model that can adapt to many types of data sources. Its potential for medical imaging is an important opportunity that should be explored”.

In this very brief talk I describe the need to model images in general, and then briefly present the Sparse-Land model. The talk includes a demonstration of a sequence of applications in image processing where this model has been deployed successfully, including denoising of still, color and video images, inpainting, and compression. The moral to take home is: “The Sparse-Land model is a new and promising model that can adapt to many types of data sources. Its potential for medical imaging is an important opportunity that should be explored”.

The super-resolution reconstruction problem addresses the fusion of several low quality images into one higher-resolution outcome. A typical scenario for such a process could be the fusion of several video fields into a higher resolution output that can lead to high quality printout. The super-resolution result provides TRUE resolution, as opposed to the typically used interpolation techniques. The core idea behind this ability is the fact that higher-frequencies exist in the measurements, although in an aliased form, and those can be recovered due to the motion between the frames. Ever since the pioneering work by Tsai and Huang (1984), who demonstrated the core ability to get super-resolution, much work has been devoted by various research groups to this problem and ways to solve it. In this talk I intend to present the core ideas behind the super-resolution (SR) problem, and our very recent results in this field. Starting form the problem modeling, and posing the super-resolution task as a general inverse problem interpretation, we shall see how the SR problem can be addressed effectively using ML and later MAP estimation methods. This talk also show various ingredients that are added to the reconstruction process to make it robust and efficient. Many results will accompany these descriptions, so as to show the strengths of the methods.

In signal and image processing, we often use transforms in order to simplify operations or to enable better treatment to the given data. A recent trend in these fields is the use of over complete linear transforms that lead to a sparse description of signals. This new breed of methods is more difficult to use, often requiring more computations. Still, they are much more effective in applications such as signal compression and inverse problems. In fact, much of the success attributed to the wavelet transform in recent years, is directly related to the above-mentioned trend. In this talk we will present a survey of this recent path of research, and its main results. We will discuss both the theoretic and the application sides to this field. No previous knowledge is assumed (… just common sense, and little bit of linear algebra).

We address the image denoising problem, where zero mean white and homogeneous Gaussian additive noise should be removed from a given image. The approach taken is based on sparse and redundant representations over a trained dictionary. The proposed algorithm denoises the image, while simultaneously training a dictionary on its (corrupted) content using the K-SVD algorithm. As the dictionary training algorithm is limited in handling small image patches, we extend its deployment to arbitrary image sizes by defining a global image prior that forces sparsity over patches in every location in the image. We show how such Bayesian treatment leads to a simple and effective denoising algorithm, with state-of-the-art performance, equivalent and sometimes surpassing recently published leading alternative denoising methods.

In this talk we present a novel method for separating images into texture and piece-wise smooth parts, and show how this formulation can also lead to image inpainting. Our separation and inpainting processes are based on sparse and redundant representations of the two contents – cartoon and texture – over different dictionaries. Using the Basis Pursuit Denoising (BPDN) to formulate the overall penalty function, we achieve a separation of the image, denoising, and inpainting. In fact, with a small modification, the damn thing can make coffee.

Retinex theory deals with the removal of unfavorable illumination effects from images. This ill-posed inverse problem is typically regularized by forcing spatial smoothness on the recoverable illumination. Recent work in this field suggested exploiting the knowledge that the illumination image bounds the image from above, and the fact that the reflectance is also expected to be smooth.

In this lecture we show how the above model can be improved to provide a non-iterative retinex algorithm that handles better edges in the illumination, and suppresses noise in dark areas. This algorithm uses two specially tailored bilateral filters — the first evaluates the illumination and the other is used for the computation of the reflectance. This result stands as a theoretic justification and refinement for the recently proposed heuristic use of the bilateral filter for retinex by Durand and Dorsey. In line with their appealing way of speeding up the bilateral filter, we show that similar speedup methods apply to our algorithm.

Transforming signals is typically done in order to simplify their representations. Among the many ways to do so, the use of linear combinations taken over a redundant dictionary is appealing due to both its simplicity and its diversity. Choosing the sparsest of all solutions aligns well with our desire for a simple signal description, and this also leads to uniqueness. Since the search for the sparsest representation is NP-hard, methods such as the Basis-Pursuit (BP) and the Matching Pursuit (MP) have been proposed in the mid 90’s to approximate the desired sparse solution.

The pioneering work by Donoho and Huo (’99) started a sequence of research efforts, all aiming to theoretically understand the quality of approximations obtained by the pursuit algorithms, and the limits to their success. A careful study established that both BP and MP algorithms are expected to lead to the sparsest of all representations if indeed such solution is sparse enough. Later work generalized these results to the case where error is allowed in the representation. Very recent results addressed the same analysis from a probabilistic point of view, finding bounds on the average performance, and showing a close resemblance to empirical evidence.

All these results lead to the ability to use the pursuit algorithms with clear understanding of their expected behavior, in what Stanley Osher would have called “emotionally uninvolved” manner. This paves the way for future transforms that will be based on (i) overcomplete (redundant) representations, (ii) linear in constructing signals, and non-linear in their decomposition, and (iii) sparsity as their core force. Furthermore, as signal transforms, signal compression, and inverse problems, are all tangled together, we are now armed with new and effective tools when addressing many problems in signal and image processing.

In this talk we present a survey of this recent path of research, its main results, and the involved players and their contributions. We will discuss both the theoretic and the application sides to this field. No previous knowledge is assumed.

In this talk we present a novel method for separating images into texture and piecewise smooth parts, and show how this formulation can also lead to image inpainting. Our separation and inpainting process exploits both the variational and the sparsity mechanisms, by combining the Basis Pursuit Denoising (BPDN) algorithm and the Total-Variation (TV) regularization scheme.

The basic idea in this work is the use of two appropriate dictionaries, one for the representation of textures, and the other for the natural scene parts, assumed to be piece-wise-smooth. Both dictionaries are chosen such that they lead to sparse representations over one type of image-content (either texture or piecewise smooth). The use of the BPDN with the two augmented dictionaries leads to the desired separation, along with noise removal as a by-product. As the need to choose a proper dictionary for natural scene is very hard, a TV regularization is employed to better direct the separation process.

This concept of separation via sparse and over-complete representation of the image is shown to have a direct and natural extension to image inpainting. When some of the pixels in known locations in the image are missing, the same separation formulation can be changed to fit the problem of decomposing the image while filling in the holes. Thus, as a by-product of the separation we achieve inpainting. This approach should be compared to a recently published inpainting system by Bertalmio, Vese, Sapiro, and Osher. We will present several experimental results that validate the algorithm’s performance.

The separation of image content into semantic parts plays a vital role in applications such as compression, enhancement, restoration, and more. In recent years several pioneering works suggested such separation based on variational formulation, and others using independent component analysis and sparsity. In this talk we present a novel method for separating images into texture and piecewise smooth parts, exploiting both the variational and the sparsity mechanisms, by combining the Basis Pursuit Denoising (BPDN) algorithm and the Total-Variation (TV) regularization scheme.

The basic idea in our work is the use of two appropriate dictionaries, one for the representation of textures, and the other for the natural scene parts, assumed to be piece-wise-smooth. Both dictionaries are chosen such that they lead to sparse representations over one type of image-content (either texture or piecewise smooth). The use of the BPDN with the two augmented dictionaries leads to the desired separation, along with noise removal as a by-product. As the need to choose a proper dictionary for natural scene is very hard, a TV regularization is employed to better direct the separation process. We will present several experimental results that validate the algorithm’s performance.

One of the most fundamental problems in the treatment of high-dimensional data is classification of a cloud of points in R^D into several sub-classes based on training data. An important such task is the pattern detection problem in images, which requires a separation between ‘Target’ and ‘Clutter’ classes, where every instance of a pattern in each of these classes appears as a sequence of D pixels. In most cases, the following properties hold true for the target detection task: (i) the probability of the ‘Target’ class is substantially smaller compared to that of the ‘Clutter’ ; (ii) the volume occupied by the target class in R^D is far smaller than that held by the clutter set ; and (iii) The target set is either convex or can be divided to several sub-sets each being convex.

In this talk we describe a new classifier that exploits these properties, yielding a low complexity yet effective target detection algorithm. This algorithm, called the Maximal Rejection Classifier (MRC), is based on successive rejection operations. Each such rejection stage is performed using a linear projection followed by thresholding. The projection direction is designed to maximize the number of rejected ‘Clutter’ points from further consideration. An application of detecting frontal and vertical faces in images is demonstrated using the MRC with encouraging results.

The super-resolution reconstruction process deals with the fusion of several low quality and low-resolution images into one higher-resolution and possibly better final image. We start by showing that from theoretic point of view, this fusion process is based on generalized sampling theorems due to Yen (1956) and Papulis (1977). When more realistic scenario is considered with blur, arbitrary motion, and additive noise, an estimation approach is considered instead.

We describe methods based on the Maximum-Likelihood (ML), Maximum-A-posteriori Probability (MAP), and the Projection onto Convex Sets POCS) as candidate tools to use. Underlying all these methods is the development of a model describing the relation between the measurements (low-quality images) and the desired output (high-resolution image). Through this path we presents the basic rational behind super-resolution, and then present the dichotomy between the static and the dynamic super-resolution process. We proposed treatment of both, and deal with several interesting special cases.

The target detection problem is defined as the need to separate targets from clutter instances. Among the many example-based techniques for the solution of this problem, the family of rejection-based classifiers are consistently exhibiting state-of-the-art accuracy while being the fastest. This rejection-based approach advocates the use of large set of weak-classifiers chained sequentially. After application of each such atom-block, a rejection of some of the clutter is performed while guaranteeing no loss of targets.

While intuitively appealing, theoretic background for this method was gathered only recently. Some roots of it can be traced to the boosting algorithm and the decision tree methods – two wide fields of research in machine learning that concentrate on using multiple weak-classifiers for the construction of a complicated overall machine. Rejection as a concept was proposed and analyzed by Nayar and Baker, with emphasis on the multi-class problems. More recently Elad, Hel-Or, and Keshet proposed the Maximal-rejection-Classifier (MRC), and employed it to the face detection problem. To conclude this list of works on the rejection-based idea, we should mention the work of Viola and Johns on the face detection problem using sub-linear weak-classifiers joined via boosting. In this talk I’ll survey the various contributions to the rejection idea and its efficient implementation on face detection problem.

Pattern detection problems require a separation between two classes, ‘Target’ and ‘Clutter’, where the probability of the former is substantially smaller than that of the latter. We describe a new classifier that exploits this property, yielding a low complexity yet effective target detection algorithm. This Maximal Rejection Classifier (MRC) algorithm is based on successive rejection operations. Each rejection stage is performed using a linear projection followed by thresholding. The projection direction is designed to maximize the number of ‘Clutter’ points rejected from further consideration. An application of detecting frontal and vertical faces in images is demonstrated using the MRC, with encouraging results.