Talks
2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2023
למידה עמוקה (deep learning) הוא תחום שישנה את חיינו – תיקון – הוא כבר משנה את חיינו. בהרצאה זו שניתנה לסטודנטים בטכניון ולמתעניינים אחרים אני מספר את סיפורו המרתק של התחום הזה ואת התהפוכות שהוא עבר בששים השנים האחרונות. הרצאה זו נועדה לקהל הרחב ולא דורשת כל ידע מוקדם.
קישור זה מפנה לסרט וידאו של הרצאה זו שהוקלט באולפן.
קישור זה מפנה לסרט וידאו של הקלטת מפגש הזום (איכות האודיו נומוכה יותר) בו העברתי את אותה הרצאה.
Image denoising is one of the oldest and most studied problems in image processing. An extensive work over several decades has led to thousands of papers on this subject, and to many well-performing algorithms for this task. As expected, the era of deep learning has brought yet another revolution to this subfield, and took the lead in today’s ability for noise suppression in images. This talk focuses on recently discovered abilities and opportunities of image denoisers. We expose the possibility of using image denoisers for serving other problems, such as regularizing general inverse problems and serving as the engine for image synthesis. We also unveil the (strange?) idea that denoising and other inverse problems might not have a unique solution, as common algorithms would have you believe. Instead, we describe constructive ways to produce randomized and diverse high perceptual quality results for inverse problems.
This is an invited talk, given in the “A Multiscale tour of Harmonic Analysis and Machine Learning” event, on April 19-21, Celebrating Stéphane Mallat’s 60th birthday.
Recordings of all the talks in this event can be found here.
2022
Part A: Image denoising – removal of white additive Gaussian noise from an image – is one of the oldest and most studied problems in image processing. An extensive work over several decades has led to thousands of papers on this subject, and to many well-performing algorithms for this task. As expected, the era of deep learning has brought yet another revolution to this subfield, and took the lead in today’s ability for noise suppression in images. All this progress has led some researchers to believe that “Denoising Is Dead”, in the sense that all that can be achieved is already done. Part A of this talk we will introduce the above evolution of this field, and highlight the tension that exists between classical approaches and modern AI alternatives.
Part B: Part B of this talk will focus on recently discovered abilities and vulnerabilities of image denoisers. In a nut-shell, we expose the possibility of using image denoisers for serving other problems, such as regularizing general inverse problems and serving as the engine for image synthesis. We also unveil the (strange?) idea that denoising (and other inverse problems) might not have a unique solution, as common algorithms would have you believe. Instead, we will describe constructive ways to produce randomized and diverse high perceptual quality results for inverse problems.
This was given as a plenary talk in the third international workshop on matrix computations, commemorating the 90th birthday of Gene Golub.
2021
Image denoising – removal of white additive Gaussian noise from an image – is one of the oldest and most studied problems in image processing. An extensive work over several decades has led to thousands of papers on this subject, and to many well-performing algorithms for this task. As expected, the era of deep learning has brought yet another revolution to this subfield, and took the lead in today’s ability for noise suppression in images. All this progress has led some researchers to believe that “denoising is dead”, in the sense that all that can be achieved is already done.
Exciting as all this story might be, this talk IS NOT ABOUT IT!
Our story focuses on recently discovered abilities and vulnerabilities of image denoisers. In a nut-shell, we expose the possibility of using image denoisers for serving other problems, such as regularizing general inverse problems and serving as the engine for image synthesis. We also unveil the (strange?) idea that denoising (and other inverse problems) might not have a unique solution, as common algorithms would have you believe. Instead, we will describe constructive ways to produce randomized and diverse high perceptual quality results for inverse problems.
A recording of this talk can be found HERE.
This talk was also given in the TCE-MLIS event on February 24th. Here is a recording of this talk (in Hebrew!)
Image denoising – removal of white additive Gaussian noise from an image – is one of the oldest and most studied problems in image processing. An extensive work over several decades has led to thousands of papers on this subject, and to many well-performing algorithms for this task. As expected, the era of deep learning has brought yet another revolution to this subfield, and took the lead in today’s ability for noise suppression in images. All this progress has led some researchers to believe that “denoising is dead”, in the sense that all that can be achieved is already done.
Exciting as all this story might be, this talk IS NOT ABOUT it!
Our story focuses on recently discovered abilities and vulnerabilities of image denoisers. In a nut-shell, we expose the possibility of using image denoisers for serving other problems, such as regularizing general inverse problems and serving as the engine for image synthesis. We also unveil the (strange?) idea that denoising might not have a unique solution, as common algorithms would have you believe. Instead, we’ll describe constructive ways to produce randomized and diverse high perceptual quality denoising results.
למידה עמוקה (deep learning) הוא תחום שישנה את חיינו – תיקון – הוא כבר משנה את חיינו. בהרצאה זו שניתנה לגורמי צה”ל, אני מספר את סיפורו המרתק של התחום הזה ואת התהפוכות שהוא עבר בששים השנים האחרונות. הרצאה זו נועדה לקהל הרחב ולא דורשת כל ידע מוקדם. זוהי גירסה ארוכה יותר של הרצאה דומה שניתנה ב- 2019 במסיבת היובל לפקולטה למדעי המחשב.
2020
How do we choose a network architecture in deep-learning solutions? By copying existing networks or guessing new ones, and sometimes by applying various small modifications to them via trial and error. This non-elegant and brute-force strategy has proven itself useful for a wide variety of imaging tasks. However, it comes with a painful cost – our networks tend to be quite heavy and cumbersome. Could we do better? In this talk we would like to propose a different point of view towards this important question, by advocating the following two rules: (i) Rather than “guessing” architectures, we should rely on classic signal and image processing concepts and algorithms, and turn these to networks to be learned in a supervised manner. More specifically, (ii) Sparse representation modeling is key in many (if not all) of the successful architectures that we are using. I will demonstrate these claims by presenting three recent image denoising networks that are light-weight and yet quite effective, as they follow the above guidelines.
2019
Sparse approximation is a well-established theory, with a profound impact on the fields of signal and image processing. In this talk we start by presenting this model and its features, and then turn to describe two special cases of it – the convolutional sparse coding (CSC) and its multi-layered version (ML-CSC). Amazingly, as we will carefully show, ML-CSC provides a solid theoretical foundation to … deep-learning. Alongside this main message of bringing a theoretical backbone to deep-learning, another central message that will accompany us throughout the talk: Generative models for describing data sources enable a systematic way to design algorithms, while also providing a complete mechanism for a theoretical analysis of these algorithms’ performance. This talk is meant for newcomers to this field – no prior knowledge on sparse approximation is assumed.
למידה עמוקה (deep learning) הוא תחום שישנה את חיינו – תיקון – הוא כבר משנה את חיינו. בהרצאה קצרה זו שניתנה ביובל ה-50 לפקולטה למדעי המחשב, אני מספר את סיפורו המרתק של התחום הזה ואת התהפוכות שהוא עבר בששים השנים האחרונות. הרצאה זו נועדה לקהל הרחב ולא דורשת כל ידע מוקדם.
Sparse approximation is a well-established theory, with a profound impact on the fields of signal and image processing. In this talk we start by presenting this model and its features, and then turn to describe two special cases of it – the convolutional sparse coding (CSC) and its multi-layered version (ML-CSC). Amazingly, as we will carefully show, ML-CSC provides a solid theoretical foundation to … deep-learning. Alongside this main message of bringing a theoretical backbone to deep-learning, another central message that will accompany us throughout the talk: Generative models for describing data sources enable a systematic way to design algorithms, while also providing a complete mechanism for a theoretical analysis of these algorithms’ performance. This talk is meant for newcomers to this field – no prior knowledge on sparse approximation is assumed.
Sparse approximation is a well-established theory, with a profound impact on the fields of signal and image processing. In this talk we start by presenting this model and its features, and then turn to describe two special cases of it – the convolutional sparse coding (CSC) and its multi-layered version (ML-CSC). Amazingly, as we will carefully show, ML-CSC provides a solid theoretical foundation to … deep-learning. Alongside this main message of bringing a theoretical backbone to deep-learning, another central message that will accompany us throughout the talk: Generative models for describing data sources enable a systematic way to design algorithms, while also providing a complete mechanism for a theoretical analysis of these algorithms’ performance. This talk is meant for newcomers to this field – no prior knowledge on sparse approximation is assumed.
2018
Sparse approximation is a well-established theory, with a profound impact on the fields of signal and image processing. In this talk we start by presenting this model, and then turn to describe two special cases of it – the convolutional sparse coding (CSC) and its multi-layered version (ML-CSC). Amazingly, as we will carefully show, ML-CSC provides a solid theoretical foundation to … deep-learning. This talk is meant for newcomers to these fields – no prior knowledge on sparse approximation is assumed.
Sparse approximation is a well-established theory, with a profound impact on the fields of signal and image processing. In this talk, we describe a special case of this model— the multi-layered convolutional sparse coding (ML-CSC) construction. As we will carefully show, ML-CSC provides a solid theoretical foundation to the field of deep learning, explaining the used architectures, their performance limits, and prospects for future alternatives.
Sparse approximation is a well-established theory, with a profound impact on the fields of signal and image processing. In this talk we start by presenting this model and its features, and then turn to describe two special cases of it – the convolutional sparse coding (CSC) and its multi-layered version (ML-CSC). Amazingly, as we will carefully show, ML-CSC provides a solid theoretical foundation to … deep-learning. Alongside this main message of bringing a theoretical backbone to deep-learning, another central message that will accompany us throughout the talk: Generative models for describing data sources enable a systematic way to design algorithms, while also providing a complete mechanism for a theoretical analysis of these algorithms’ performance. This talk is meant for newcomers to this field – no prior knowledge on sparse approximation is assumed.
פירוק אטומי, הטבלה המחזורית, הרכבה של מולקולה … כל זה נשמע כמו תחילתה של הרצאה בכימיה. אבל לא! נושאים אלו יעלו בהרצאה שתדון בעיבוד תמונות. עיבוד תמונות הינו תחום מרכזי בחיינו – מהטלוויזיה בבתינו, המצלמה הדיגיטאלית שבכיסנו (ולאחרונה כחלק מהטלפון הסלולרי), דרך צפייה בסרטי די-וי-די, בקרת איכות בפסי ייצור, מערכות עקיבה ואבטחה, ועד צילומי אולטראסאונד, טומוגרפיה ותהודה מגנטית בבתי-חולים. בכל אלה ובעוד מוצרים רבים, עיבוד תמונות מהווה טכנולוגיה שאי-אפשר בלעדיה. תחום זה תוסס ופורח הן בתעשיה והן באקדמיה, עם עשרות אלפי מהנדסים ומדענים בכל רחבי העולם העוסקים בו יום יום ושעה שעה. אז מה זה עיבוד תמונות? זהו הנושא שבו נדון בהרצאה זו. עיבוד תמונות מתייחס לטיפול בתמונות ע”י מחשב. אנו נעסוק בשאלות כגון כיצד תמונה עושה את דרכה אל המחשב, כיצד היא אגורה שם, מה ניתן לעשות בה משכבר היא שם, ועוד. אחת המטרות המרכזיות בהרצאה זו היא הצגת חזית הידע בתחום, ובפרט העשייה המחקרית בטכניון בזירה זו. הדיון הכללי הנ”ל על עיבוד תמונות ישמש אותנו כמצע עליו נבנה כדי להציג את עבודתנו מהעת האחרונה, בה אנו עוסקים במודלים לתמונות המשתמשים ברעיונות “כימיקליים”, בעזרתם אנו מטפלים במידע בכלל ובתמונות בפרט. אנו נתאר כיצד ניתן לפרק תמונה לאטומים, לבנות טבלה מחזורית של יסודות לתיאור תמונות, וכיצד אנו רותמים את כל אלה כדי לפתור בעיות מעשיות בתחום, כגון שיפור תמונות וסרטים ותיקונם מקלקולים שונים, השלמת חלקים חסרים בתמונות, דחיסה, ועוד
Sparse approximation is a well-established theory, with a profound impact on the fields of signal and image processing. In this talk we start by presenting this model, and then turn to describe two special cases of it — the convolutional sparse coding (CSC) and its multi-layered version (ML-CSC). Amazingly, as we will carefully show, ML-CSC provides a solid theoretical foundation to … deep-learning. This talk is meant for newcomers to these fields – no prior knowledge on sparse approximation is assumed.
Sparse approximation is a well-established theory, with a profound impact on the fields of signal and image processing. In this talk we describe two special cases of this model — the convolutional sparse coding (CSC) and its multi-layered version (ML-CSC). We show that the projection of signals (a.k.a. pursuit) to the ML-CSC model leads to various deep convolutional neural network architectures. This connection brings a fresh view to CNN, as we are able to accompany the above by theoretical claims such as uniqueness of the representations throughout the network, and their stable estimation, all guaranteed under simple local sparsity conditions. The ‘take-home-message’ from this talk is this: The ML-CSC model can serve as the theoretical foundation to deep-learning.
2017
Sparse approximation is a well-established theory, with a profound impact on the fields of signal and image processing. In this talk we start by presenting this model and its features, and then turn to describe two special cases of it: the convolutional sparse coding (CSC) and its multi-layered version (ML-CSC). Amazingly, as we will carefully show, ML-CSC provides a solid theoretical foundation to the field of deep-learning. Alongside this main message of bringing a theoretical backbone to deep-learning, another central message that will accompany us throughout the talk: Generative models for describing data sources enable a systematic way to design algorithms, while also providing a complete mechanism for a theoretical analysis of these algorithms’ performance. This talk is meant for newcomers to this field – no prior knowledge on sparse approximation is assumed.
Within the wide field of sparse approximation, convolutional sparse coding (CSC) has gained increasing attention in recent years. This model assumes a structured-dictionary built as a union of banded Circulant matrices. Most of the attention has been devoted to the practical side of CSC, proposing efficient algorithms for the pursuit problem, and identifying applications that benefit from this model. Interestingly, a systematic theoretical understanding of CSC seems to have been left aside, with the assumption that the existing classical results are sufficient. In this talk we start by presenting a novel analysis of the CSC model and its as- sociated pursuit. Our study is based on the observation that while being global, this model can be characterized and analyzed locally.
We show that uniqueness of the representation, its stability with respect to noise, and successful greedy or convex recovery are all guaranteed assuming that the underlying representation is locally sparse. These new results are much stronger and informative, compared to those obtained by deploying the classical sparse theory. Armed with these new insights, we proceed by proposing a multi-layer extension of this model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers. This, in turn, is shown to be tightly connected to Convolutional Neural Networks (CNN), so much so that the forward-pass of the CNN is in fact the Thresholding pursuit serving the ML-CSC model. This connection brings a fresh view to CNN, as we are able to attribute to this architecture theoretical claims such as uniqueness of the representations throughout the network, and their stable estimation, all guaranteed under simple local sparsity conditions. Lastly, identifying the weaknesses in the above scheme, we propose an alternative to the forward-pass algorithm, which is both tightly connected to deconvolutional and recurrent neural networks, and has better theoretical guarantees.
Within the wide field of sparse approximation, convolutional sparse coding (CSC) has gained increasing attention in recent years. This model assumes a structured-dictionary built as a union of banded Circulant matrices. Most of the attention has been devoted to the practical side of CSC, proposing efficient algorithms for the pursuit problem, and identifying applications that benefit from this model. Interestingly, a systematic theoretical understanding of CSC seems to have been left aside, with the assumption that the existing classical results are sufficient. In this talk we start by presenting a novel analysis of the CSC model and its as- sociated pursuit. Our study is based on the observation that while being global, this model can be characterized and analyzed locally. We show that uniqueness of the representation, its stability with respect to noise, and successful greedy or convex recovery are all guaranteed assuming that the underlying representation is locally sparse.
These new results are much stronger and informative, compared to those obtained by deploying the classical sparse theory. Armed with these new insights, we proceed by proposing a multi-layer extension of this model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers. This, in turn, is shown to be tightly connected to Convolutional Neural Networks (CNN), so much so that the forward-pass of the CNN is in fact the Thresholding pursuit serving the ML-CSC model. This connection brings a fresh view to CNN, as we are able to attribute to this architecture theoretical claims such as uniqueness of the representations throughout the network, and their stable estimation, all guaranteed under simple local sparsity conditions. Lastly, identifying the weaknesses in the above scheme, we propose an alternative to the forward-pass algorithm, which is both tightly connected to deconvolutional and recurrent neural networks, and has better theoretical guarantees.
Image denoising is the most fundamental problem in image enhancement, and it is largely solved: It has reached impressive heights in performance and quality — almost as good as it can ever get. But interestingly, it turns out that we can solve many other problems using the image denoising “engine”. I will describe the Regularization by Denoising (RED) framework: using the denoising engine in defining the regularization of any inverse problem. The idea is to define an explicit image-adaptive regularization functional directly using a high performance denoiser. Surprisingly, the resulting regularizer is guaranteed to be convex, and the overall objective functional is explicit, clear and well-defined. With complete flexibility to choose the iterative optimization procedure for minimizing this functional, RED is capable of incorporating any image denoising algorithm as a regularizer, treat general inverse problems very effectively, and is guaranteed to converge to the globally optimal result.
Style-transfer is a process of migrating a style from a given image to the content of another, synthesizing a new image which is an artistic mixture of the two. Recent work on this problem adopting Convolutional Neural-networks (CNN) ignited a renewed interest in this field, due to the very impressive results obtained. There exists an alternative path towards handling the style-transfer task, via generalization of texture-synthesis algorithms. I will present a novel such style-transfer algorithm that extends the texture-synthesis work of Kwatra et. al. (2005), while aiming to get stylized images that get closer in quality to the CNN ones.
2016
Image Processing is a fascinating scientific field, offering ways to handle visual data by computers. How can an image be brought to be stored and processed by a computer? What kind of such processing could be done which are worthwhile? In the first part of this talk we shall describe the core ideas behind the field of image processing by answering these two questions. In the second part of the talk we shall turn to describe the recent research activity in Elad’s group in the Computer-Science department at the Technion, emphasizing the vast work done on harnessing sparse and redundant representation modeling to image processing needs.
2015
Recent work in image processing repeatedly shows highly efficient reconstruction algorithms that lean on modeling of small overlapping patches. Such methods impose a local model in order to regularize a global inverse problem. Why does this work so well? Does this leave room for improvements? What does a local model imply globally on the unknown signal? In this talk we will start from algorithmic attempts that aim to understand this dichotomy in order to narrow the global-local gap. Gradually, we will turn the discussion to a theoretical point of view that provides a deeper understanding of such local models, and their global implications.
Compression of frontal facial images is an appealing and important application. Recent work has shown that specially tailored algorithms for this task can lead to performance far exceeding JPEG2000. This paper proposes a novel such compression algorithm, exploiting our recently developed redundant tree-based wavelet transform. Originally meant for functions defined on graphs and cloud of points, this new transform has been shown to be highly effective as an image adaptive redundant and multi-scale decomposition. The key concept behind this method is reordering of the image pixels so as to form a highly smooth 1D signal that can be sparsified by a regular wavelet. In this work we bring this image adaptive transform to the realm of compression of aligned frontal facial images. Given a training set of such images, the transform is designed to best sparsify the whole set using a common feature-ordering. Our compression scheme consists of sparse coding using the transform, followed by entropy coding of the obtained coefficients. The inverse transform and a post-processing stage are used to decode the compressed image. We demonstrate the performance of the proposed scheme and compare it to other competing algorithms.
In this survey talk I will walk you through a decade of fascinating research activity on “sparse and redundant representations”. We will start with a classic image processing task of noise removal and use it as a platform for the introduction of data models in general, and sparsity and redundancy as specific forces in such models. The emerging model will be shown to lead to a series of key theoretical and numerical questions, which we will handle next. A key problem with the use of sparse and redundant representation modeling is the need for a sparsifying dictionary – we will discuss ways to obtain such a dictionary by learning from examples, and introduce the K-SVD algorithm. Then we will show how all these merge into a coherent theory that can be deployed successfully to various image processing applications.
In this talk we present a generic recursive algorithm for improving image denoising methods. Given the initial denoised image, we suggest repeating the following procedure: (i) Strengthen the signal by adding the previous denoised image to the degraded input image, (ii) Operate the denoising method on the strengthened image, and (iii) Subtract the previous denoised image from the restored signals strengthened outcome. The convergence of this process is studied for the K-SVD image denoising and related algorithms. Furthermore, still in the context of K-SVD image denoising, we introduce an interesting interpretation of the SOS algorithm as a technique for closing the gap between the local patch-modeling and the global restoration task, thereby leading to improved performance. We demonstrate the SOS boosting algorithm for several leading denoising methods (KSVD, NLM, BM3D, and EPLL), showing tendency to further improve denoising performance.
2014
What if we take all the overlapping patches from a given image and organize them to create the shortest path by using their mutual Euclidean distances? This suggests a reordering of the image pixels in a way that creates a maximal 1D regularity. What could we do with such a construction? In this talk we consider a wider perspective of the above, and introduce a wavelet transform for graph-structured data. The proposed transform is based on a 1D wavelet decomposition coupled with a pre-reordering of the input so as to best sparsify the given data. We adopt this transform to image processing tasks by considering the image as a graph, where every patch is a node, and edges are obtained by Euclidean distances between corresponding patches. We show several ways to use the above ideas in practice, leading to state-of-the-art image denoising, deblurring, inpainting, and face-image compression results.
Images are 2D signals, and should be processed as such — this is the common belief in the image processing community. Is it truly the case? Around thirty years ago, some researchers suggested to convert images into 1D signals, so as to harness well-developed 1D tools such as adaptive-filtering and Kalman- estimation techniques. These attempts resulted with poorly performing algorithms, strengthening the above belief. Why should we force unnatural causality between spatially ordered pixels? Indeed, why? In this talk I will present a conversion of images into 1D signals that leads to state-of-the-art results in series of applications – denoising, inpainting, compression, and more. The core idea in our work is that there exists a permutation of the image pixels that carries in it most of the “spatial content”, and this ordering is within reach, even if the image is corrupted. We expose this permutation and use it in order to process the image as if it is a one-dimensional signal, treating successfully a series of image processing problems.
Images, video, audio, text documents, financial data, medical information, traffic info – all these and many others are data sources that can be effectively processed. Why? Is it obvious? In this talk we will start by discussing “modeling” of data as a way to enable their actual processing, putting emphasis on sparsity-based models. We will turn our attention to graph-structured data and propose a tailored sparsifying transform for its dimensionality reduction and subsequent processing. We shall conclude by showing how this new transform becomes relevant and powerful in revisiting … classical image processing tasks..
2013
What if we take all the overlapping patches from a given image and organize them to create the shortest path by using their mutual distances? This suggests a reordering of the image pixels in a way that creates a maximal 1D regularity. What could we do with such a construction? In this talk we consider a wider perspective of the above, and introduce a wavelet transform for graph-structured data. The proposed transform is based on a 1D wavelet decomposition coupled with a pre-reordering of the input so as to best sparsify the given data. We adopt this transform to image processing tasks by considering the image as a graph, where every patch is a node, and edges are obtained by Euclidean distances between corresponding patches. We show several ways to use the above ideas in practice, leading to state-of-the-art image denoising, deblurring, and inpainting results.
Images, video, audio, text documents, financial data, medical information, traffic info — all these and many others are data sources that can be effectively processed. Why? Is it obvious? In this talk we will start by discussing “modeling” of data as a way to enable their actual processing, putting emphasis on sparsity-based models. We will turn our attention to graph-structured data and propose a tailored sparsifying transform for its dimensionality reduction and subsequent processing. We shall conclude by showing how this new transform becomes relevant and powerful in revisiting … classical image processing tasks.
In this talk we describe the co-sparse analysis model, with emphasis on pursuit algorithms and dictionary learning for it. We present two of our recent activities on this subject: (i) A theoretical study of the Analysis-Thresholding algorithm, exposing measures of goodness for the dictionary that govern the pursuit performance; and (ii) The development of an analysis K-SVD algorithm that trains a dictionary from signal examples and its use for image denoising.
2012
In this talk we describe the co-sparse analysis model, with emphasis on pursuit algorithms and dictionary learning for it. We present two of our recent activities on this subject: (i) A theoretical study of the Analysis-Thresholding algorithm, exposing measures of goodness for the dictionary that govern the pursuit performance; and (ii) The development of an analysis K-SVD algorithm that trains a dictionary from signal examples and its use for image denoising.
What if we take all the overlapping patches from a given image and organize them to create the shortest path by using their mutual distances? This suggests a reordering of the image pixels in a way that creates a maximal 1D regularity. Could we repeat this process in several scales? What could we do with such a construction? In this talk we consider a wider perspective of the above line of questions: We introduce a wavelet transform that is meant for data organized as a connected-graph or as a cloud of high-dimensional points. The proposed transform constructs a tree that applies a 1D wavelet decomposition filters, coupled with a pre-reordering of the input, so as to best sparsify the given data. We adopt this transform to image processing tasks by considering the image as a graph, where every patch is a node, and vertices are obtained by Euclidean distances between corresponding patches. We show three ways to use the above ideas in practice – adopt only the patch-reordering, use the obtained wavelet transform as a sparsifying process, and a third approach were this transform is used as a regularizer. State-of-the-art image denoising, deblurring, and inpainting results are obtained with the proposed schemes.
What if we take all the overlapping patches from a given image and organize them to create the shortest path by using their mutual distances? This suggests a reordering of the image pixels in a way that creates a maximal 1D regularity Could we repeat this process in several “scales” ? What could we do with such a construction? In this talk we consider a wider perspective of the above line of questions: We introduce a wavelet transform that is meant for data organized as a connected-graph or as a cloud of high dimensional points. The proposed transform constructs a tree that applies a 1D wavelet decomposition filters, coupled with a pre-reordering of the input, so as to best sparsify the given data. We adopt this transform to image processing tasks by considering the image as a graph, where every patch is a node, and vertices are obtained by Euclidean distances between corresponding patches. State of- the-art image denoising results are obtained with the proposed scheme.
In this survey talk I will walk you through a decade of fascinating research activity on “sparse and redundant representations”. We will start with a classic image processing task of noise removal and use it as a platform for the introduction of data models in general, and sparsity and redundancy as specific forces in such models. The emerging model will be shown to lead to a series of key theoretical and numerical questions, which we will handle next. A key problem with the use of sparse and redundant representation modeling is the need for a sparsifying dictionary — we will discuss ways to obtain such a dictionary by learning from examples, and introduce the K-SVD algorithm. Then we will show how all these merge into a coherent theory that can be deployed successfully to various image processing applications.
The synthesis-based sparse representation model for signals has drawn a considerable interest in the past decade. Such a model assumes that the signal of interest can be decomposed as a linear combination of a few atoms from a given dictionary. In this talk we concentrate on an alternative, analysis-based model, where an analysis operator — hereafter referred to as the “Analysis Dictionary” – multiplies the signal, leading to a sparse outcome. While the two alternative models seem to be very close and similar, they are in fact very different. In this talk we define clearly the analysis model and describe how to generate signals from it. We discuss the pursuit denoising problem that seeks the zeros of the signal with respect to the analysis dictionary given noisy measurements. Finally, we explore ideas for learning the analysis dictionary from a set of signal examples. We demonstrate this model’s effectiveness in several experiments, treating synthetic data and real images, showing a successful and meaningful recovery of the analysis dictionary.
In this survey talk I will walk you through a decade of fascinating research activity on “sparse and redundant representations”. We will start with a classic image processing task of noise removal and use it as a platform for the introduction of data models in general, and sparsity and redundancy as specific forces in such models. The emerging model will be shown to lead to a series of key theoretical and numerical questions, which we will handle next. A key problem with the use of sparse and redundant representation modeling is the need for a sparsifying dictionary — we will discuss ways to obtain such a dictionary by learning from examples, and introduce the K-SVD algorithm. Then we will show how all these merge into a coherent theory that can be deployed successfully to various image processing applications.
The synthesis-based sparse representation model for signals has drawn a considerable interest in the past decade. Such a model assumes that the signal of interest can be decomposed as a linear combination of a few atoms from a given dictionary. In this talk we concentrate on an alternative, analysis-based model, where an analysis operator — hereafter referred to as the “Analysis Dictionary” – multiplies the signal, leading to a sparse outcome. While the two alternative models seem to be very close and similar, they are in fact very different. In this talk we define clearly the analysis model and describe how to generate signals from it. We discuss the pursuit denoising problem that seeks the zeros of the signal with respect to the analysis dictionary given noisy measurements. Finally, we explore ideas for learning the analysis dictionary from a set of signal examples. We demonstrate this model’s effectiveness in several experiments, treating synthetic data and real images, showing a successful and meaningful recovery of the analysis dictionary.
2011
Over the past decade there has been a great interest in a synthesis-based model for signals, based on sparse and redundant representations. Such a model assumes that the signal of interest can be decomposed as a linear combination of few columns from a given matrix (the dictionary). An alternative, analysis-based, model can be envisioned, where an analysis operator multiplies the signal, leading to a sparse outcome. In this work we propose a simple but effective analysis operator learning algorithm, where analysis “atoms” are learned sequentially by identifying directions that are orthogonal to a subset of the training data. We demonstrate the effectiveness of the algorithm in several experiments, treating synthetic data and real images, showing a successful and meaningful recovery of the analysis operator.
Multi-channel TV broadcast, Internet video and You-Tube, home DVD movies, video conference calls, cellular video calls and more – there is no doubt that videos are abundant and in everyday use. In many cases, the quality of the available video is poor, something commonly referred to as “low-resolution”. As an example, High-definition (HD) TV’s are commonly sold these days to customers that hope to enjoy a better viewing experience. Nevertheless, most TV broadcast today is still done in standard-definition (SD), leading to poor image quality on these screens. The field of Super-Resolution deals with ways to improve video content to increase optical resolution. The core idea: fusion of the visual content in several images can be performed and this can lead to a better resolution outcome. For years it has been assumed that such fusion requires knowing the exact motion the objects undergo within the scene. Since this motion may be quite complex in general, this stood as a major obstacle for industrial applications. Three years ago a break-through has been made in this field, allowing to bypass the need for exact motion estimation. In this lecture we shall survey the work in this field from its early days (25 years ago) and till very recently, and show the evolution of ideas and results obtained. No prior knowledge in image processing is required.
The synthesis-based sparse representation model for signals has drawn a considerable interest in the past decade. Such a model assumes that the signal of interest can be decomposed as a linear combination of a few atoms from a given dictionary. In this talk we concentrate on an alternative, analysis-based model, where an analysis operator — hereafter referred to as the “Analysis Dictionary” – multiplies the signal, leading to a sparse outcome. Our goal is to learn the analysis dictionary from a set of signal examples, and the approach taken is parallel and similar to the one adopted by the K-SVD algorithm that serves the corresponding problem in the synthesis model. We present the development of the algorithm steps, which include a tailored pursuit algorithm termed “Backward Greedy” algorithm and a penalty function for the dictionary update stage. We demonstrate its effectiveness in several experiments, treating synthetic data and real images, showing a successful and meaningful recovery of the analysis dictionary.
In the commonly used sparse representation modeling, the atoms are assumed to be independent of each other when forming the signal. In this talk we shall introduce a statistical model called Boltzman Machine (BM) that enables such dependencies to be taken into account. Adopting a Bayesian point of view, we first treat the pursuit problem – given a signal, and assuming that the model parameters and the dictionary are known, find its sparse representation. We derive the exact MAP estimation, and show that just like in the independent case, this leads to an exponential search problem. We derive two algorithms for its evaluation: a greedy approximation approach for the general case, and an exact estimation that corresponds to a unitary dictionary and banded interaction matrix. We also consider the estimation of the model parameters, learning these parameters directly from training data. We show that given the signals’ representations, this problem can be posed as a convex optimization task by using the Maximum Pseudo-Likelihood (MPL).
2010
This course (4 lectures and one tutorial) brings the core ideas and achievements made in the field of sparse and redundant representation modeling, with emphasis on the impact of this field to image processing applications. The five lectures (given as PPTX and PDF) are organized as follows:
Lecture 1: The core sparse approximation problem and pursuit algorithms that aim to approximate its solution.
Lecture 2: The theory on the uniqueness of the sparsest solution of a linear system, the notion of stability for the noisy case, guarantees for the performance of pursuit algorithms using the mutual coherence and the RIP.
Lecture 3: Signal (and image) models and their importance, the Sparseland model and its use, analysis versus synthesis modeling, a Bayesian estimation point of view, dictionary learning with the MOD and the K-SVD, global and local image denoising, local image inpainting.
Lecture 4: Sparse representations in image processing – image deblurring, global image separation and image inpainting. using dictionary learning for image and video denoising and inpainting, image scale-up using a pair of learned dictionaries, facial image compression with the K-SVD.
This course (5 lectures) brings the core ideas and achievements made in the field of sparse and redundant representation modeling, with emphasis on the impact of this field to image processing applications. The five lectures (given as PPTX and PDF) are organized as follows:
Lecture 1: The core sparse approximation problem and pursuit algorithms that aim to approximate its solution.
Lecture 2: The theory on the uniqueness of the sparsest solution of a linear system, the notion of stability for the noisy case, guarantees for the performance of pursuit algorithms using the mutual coherence and the RIP.
Lecture 3: Signal (and image) models and their importance, the Sparseland model and its use, analysis versus synthesis modeling, a Bayesian estimation point of view.
Lecture 4: First steps in image processing with the Sparseland model – image deblurring, image denoising, image separation, and image inpainting. Global versus local processing of images. Dictionary learning with the MOD and the K-SVD.
Lecture 5: Advanced image processing: Using dictionary learning for image and video denoising and inpainting, image scale-up using a pair of learned dictionaries, Facial image compression with the K-SVD.
This survey talk focuses on the use of sparse and redundant representations and learned dictionaries for image denoising and other related problems. We discuss the the K-SVD algorithm for learning a dictionary that describes the image content efficiently. We then show how to harness this algorithm for image denoising, by working on small patches and forcing sparsity over the trained dictionary. The above is extended to color image denoising and inpainting, video denoising, and facial image compression, leading in all these cases to state of the art results. We conclude with more recent results on the use of several sparse representations for getting better denoising performance. An algorithm to generate such set of representations is developed, and our analysis shows that by this we approximate the minimum-mean-squared-error (MMSE) estimator, thus getting better results.
Among the many ways to model signals, a recent approach that draws considerable attention is sparse representation modeling. In this model, the signal is assumed to be generated as a random linear combination of a few atoms from a pre-specified dictionary. In this talk we describe our recent work that analyzed two Bayesian denoising algorithms — the Maximum-Aposteriori Probability (MAP) and the Minimum-Mean-Squared-Error (MMSE) estimators, under the assumption that the dictionary is unitary. It is well known that both these estimators lead to a scalar shrinkage on the transformed coefficients, albeit with a different response curve. In this talk we start by deriving closed-form expressions for these shrinkage curves and then analyze their performance. Upper bounds on the MAP and the MMSE estimation errors are derived. We tie these to the error obtained by a so-called oracle estimator, where the support is given, establishing a worst-case gain-factor between the MAP/MMSE estimation errors and the oracle’s performance.
Scaling up a single image while preserving is sharpness and visual-quality is a difficult and highly ill-posed inverse problem. A series of algorithms have been proposed over the years for its solution, with varying degrees of success. In CVPR 2008, Yang, Wright, Huang and Ma proposed a solution to this problem based on sparse representation modeling and dictionary learning. In this talk I present a variant of their method with several important differences. In particular, the proposed algorithm does not need a separate training phase, as the dictionaries are learned directly from the image to be scaled-up. Furthermore, the high-resolution dictionary is learned differently, by forcing its alignment with the low-resolution one. We show the benefit these modifications bring in terms of simplicity of the overall algorithm, and its output quality.
פירוק אטומי, הטבלה המחזורית, הרכבה של מולקולה … כל זה נשמע כמו תחילתה של הרצאה בכימיה. אבל לא! נושאים אלו יעלו בהרצאה שתדון בעיבוד תמונות. עיבוד תמונות הינו תחום מרכזי בחיינו – מהטלוויזיה בבתינו, המצלמה הדיגיטאלית שבכיסנו (ולאחרונה כחלק מהטלפון הסלולרי), דרך צפייה בסרטי די-וי-די, בקרת איכות בפסי ייצור, מערכות עקיבה ואבטחה, ועד צילומי אולטראסאונד, טומוגרפיה ותהודה מגנטית בבתי-חולים. בכל אלה ובעוד מוצרים רבים, עיבוד תמונות מהווה טכנולוגיה שאי-אפשר בלעדיה. תחום זה תוסס ופורח הן בתעשיה והן באקדמיה, עם עשרות אלפי מהנדסים ומדענים בכל רחבי העולם העוסקים בו יום יום ושעה שעה. אז מה זה עיבוד תמונות? זהו הנושא שבו נדון בהרצאה זו. עיבוד תמונות מתייחס לטיפול בתמונות ע”י מחשב. אנו נעסוק בשאלות כגון כיצד תמונה עושה את דרכה אל המחשב, כיצד היא אגורה שם, מה ניתן לעשות בה משכבר היא שם, ועוד. אחת המטרות המרכזיות בהרצאה זו היא הצגת חזית הידע בתחום, ובפרט העשייה המחקרית בטכניון בזירה זו. הדיון הכללי הנ”ל על עיבוד תמונות ישמש אותנו כמצע עליו נבנה כדי להציג את עבודתנו מהעת האחרונה, בה אנו עוסקים במודלים לתמונות המשתמשים ברעיונות “כימיקליים”, בעזרתם אנו מטפלים במידע בכלל ובתמונות בפרט. אנו נתאר כיצד ניתן לפרק תמונה לאטומים, לבנות טבלה מחזורית של יסודות לתיאור תמונות, וכיצד אנו רותמים את כל אלה כדי לפתור בעיות מעשיות בתחום, כגון שיפור תמונות וסרטים ותיקונם מקלקולים שונים, השלמת חלקים חסרים בתמונות, דחיסה, ועוד
2009
Cleaning of noise from signals is a classical and long-studied problem in signal processing. Algorithms for this task necessarily rely on an a-priori knowledge about the signal characteristics, along with information about the noise properties. For signals that admit sparse representations over a known dictionary, a commonly used denoising technique is to seek the sparsest representation that synthesizes a signal close enough to the corrupted one. As this problem is too complex in general, approximation methods, such as greedy pursuit algorithms, are often employed. In this line of reasoning, we are led to believe that detection of the sparsest representation is key in the success of the denoising goal. Does this means that other competitive and slightly inferior sparse representations are meaningless? Suppose we are served with a group of competing sparse representations, each claiming to explain the signal differently. Can those be fused somehow to lead to a better result? Surprisingly, the answer to this question is positive; merging these representations can form a more accurate, yet dense, estimate of the original signal even when the latter is known to be sparse. In this talk we demonstrate this behavior, propose a practical way to generate such a collection of representations by randomizing the Orthogonal Matching Pursuit (OMP) algorithm, and produce a clear analytical justification for the superiority of the associated Randomized OMP (RandOMP) algorithm. We show that while the Maximum a-posterior Probability (MAP) estimator aims to find and use the sparsest representation, the Minimum Mean-Squared-Error (MMSE) estimator leads to a fusion of representations to form its result. Thus, working with an appropriate mixture of candidate representations, we are surpassing the MAP and tending towards the MMSE estimate, and thereby getting a far more accurate estimation, especially at medium and low SNR. Another topic covered in thistalk concerns the case of a unitary dictionary. In such a case it is well-known that the MAP estimators has a closed-form and exact solution, and OMP is accurately computing it. Can a similar result be derived for MMSE? We show that this is indeed possible, obtaining a recursive formula that computes the MMSE simply and exactly.
2008
In this talk we describe applications such as image denoising and beyond using sparse and redundant representations. Our focus is on ways to perform these tasks with trained dictionaries using the K-SVD algorithm. As trained dictionaries are limited in handling small image patches, we deploy these within a Bayesian reconstruction procedure by forming an image prior that forces every patch in the resulting image to have a sparse representation.
Super-resolution reconstruction proposes a fusion of several low quality images into one higher quality result with better optical resolution. Classic super resolution techniques strongly rely on the availability of accurate motion estimation for this fusion task. When the motion is estimated inaccurately, as often happens for non-global motion fields, annoying artifacts appear in the super-resolved outcome. Encouraged by recent developments on the video denoising problem, where state-of-the-art algorithms are formed with no explicit motion estimation, we seek a super-resolution algorithm of similar nature that will allow processing sequences with general motion patterns. In this talk we base our solution on the Non-Local-Means (NLM) algorithm. We show how this denoising method is generalized to become a relatively simple super-resolution algorithm with no explicit motion estimation. Results on several test movies show that the proposed method is very successful in providing super-resolution on general sequences.
Cleaning of noise from signals is a classical and long-studied problem in signal processing. Algorithms for this task necessarily rely on an a-priori knowledge about the signal characteristics, along with information about the noise properties. For signals that admit sparse representations over a known dictionary, a commonly used denoising technique is to seek the sparsest representation that synthesizes a signal close enough to the corrupted one. As this problem is too complex in general, approximation methods, such as greedy pursuit algorithms, are often employed. In this line of reasoning, we are led to believe that detection of the sparsest representation is key in the success of the denoising goal. Does this means that other competitive and slightly inferior sparse representations are meaningless? Suppose we are served with a group of competing sparse representations, each claiming to explain the signal differently. Can those be fused somehow to lead to a better result? Surprisingly, the answer to this question is positive; merging these representations can form a more accurate, yet dense, estimate of the original signal even when the latter is known to be sparse. In this talk we demonstrate this behavior, propose a practical way to generate such a collection of representations by randomizing the Orthogonal Matching Pursuit (OMP) algorithm, and produce a clear analytical justification for the superiority of the associated Randomized OMP (RandOMP) algorithm. We show that while the Maximum a-posterior Probability (MAP) estimator aims to nd and use the sparsest representation, the Minimum Mean-Squared-Error (MMSE) estimator leads to a fusion of representations to form its result. Thus, working with an appropriate mixture of candidate representations, we are surpassing the MAP and tending towards the MMSE estimate, and thereby getting a far more accurate estimation, especially at medium and low SNR.
In this survey talk we focus on the use of sparse and redundant representations and learned dictionaries for image denoising and other related problems. We discuss the the K-SVD algorithm for learning a dictionary that describes the image content effectively. We then show how to harness this algorithm for image denoising, by working on small patches and forcing sparsity over the trained dictionary. The above is extended to color image denoising and inpainitng, video denoising, and facial image compression, leading in all these cases to state of the art results. We conclude with very recent results on the use of several sparse representations for getting better denoising performance. An algorithm to generate such set of representations is developed, and our analysis shows that by this method we approximate the minimum-mean-squared-error (MMSE) estimator, thus getting better results.
An under-determined linear system of equations Ax = b with non-negativity constraint is considered. It is shown that for matrices A with a row-span intersecting the positive orthant, if this problem admits a sufficiently sparse solution, it is necessarily unique. The bound on the required sparsity depends on a coherence property of the matrix A. It is further shown that A undergoes a conditioning stage with some degrees of freedom, which may be used to improve the coherence measure and strengthen the claimed result.
2007
Modeling of signals or images by a sparse and redundant representation is shown in recent years to be very effective, often leading to stat-of-the-art results in many applications. Applications leaning on this model can be cast as energy minimization problems, where the unknown is a high-dimensional and very sparse vector. Surprisingly, traditional tools in optimization, including very recently developed interior-point algorithms, tend to perform very poorly on these problems. A recently emerging alternative is a family of techniques, known as “iterated-shrinkage” methods. There are various and different such algorithms, but common to them all is the fact that each of their iterations require a simple forward and inverse transform (e.g. wavelet), and a scalar shrinkage look-up-table (LUT) step. In this talk we shall explain the need for such algorithms, present some of them, and show how they perform on a classic image deblurring problem.
Compressed-Sensing (CS) offers a joint compression- and sensing-processes, based on the existence of a sparse representation of the treated signal and a set of projected measurements. Work on CS thus far typically assumes that the projections are drawn at random. In this talk we describe ways to optimize these projections. Two methods are considered: (i) A direct optimization with respect to the CS performance, targeting CS applied with the Orthogonal Matching Pursuit (OMP) or Basis Pursuit (BP), and (ii) A simpler method that targets an average measure of the mutual-coherence of the effective dictionary. As the first approach leads to a complex bi-level optimization task that is hard to handle, we demonstrate the second one and show that it leads to better CS reconstruction performance.
In this very brief talk I describe the need to model images in general, and then briefly present the Sparse-Land model. The talk includes a demonstration of a sequence of applications in image processing where this model has been deployed successfully, including denoising of still, color and video images, inpainting, and compression. The moral to take home is: “The Sparse-Land model is a new and promising model that can adapt to many types of data sources. Its potential for medical imaging is an important opportunity that should be explored”.
2006
In this talk we consider several inverse problems in image processing, using sparse and redundant representations over trained dictionaries. Using the K-SVD algorithm, we obtain a dictionary that describes the image content effectively. Two training options are considered: using the corrupted image itself, or training on a corpus of high-quality image database. Since the K-SVD is limited in handling small image patches, we extend its deployment to arbitrary image sizes by defining a global image prior that forces sparsity over patches in every location in the image. We show how such Bayesian treatment leads to a simple and effective denoising algorithm for gray-level images with state-of-the-art denoising performance. We then extend these results to color images, handling their denoising, inpainting, and demosaicing. Following the above ideas, with necessary modifications to avoid color artifacts and over-fitting, we present stat-of-the art results in each of these applications. Another extension considered is video denoising — we demonstrate how the above method can be extended to work with 3D patches, propagate the dictionary from one frame to another, and get both improved denoising performance while also reducing substantially the computational load per pixel.
The super-resolution reconstruction problem addresses the fusion of several low quality images into one higher-resolution outcome. A typical scenario for such a process could be the fusion of several video fields into a higher resolution output that can lead to high quality printout. The super-resolution result provides TRUE resolution, as opposed to the typically used interpolation techniques. The core idea behind this ability is the fact that higher-frequencies exist in the measurements, although in an aliased form, and those can be recovered due to the motion between the frames. Ever since the pioneering work by Tsai and Huang (1984), who demonstrated the core ability to get super-resolution, much work has been devoted by various research groups to this problem and ways to solve it. In this talk I intend to present the core ideas behind the super-resolution (SR) problem, and our very recent results in this field. Starting form the problem modeling, and posing the super-resolution task as a general inverse problem interpretation, we shall see how the SR problem can be addressed effectively using ML and later MAP estimation methods. This talk also show various ingredients that are added to the reconstruction process to make it robust and efficient. Many results will accompany these descriptions, so as to show the strengths of the methods.
In signal and image processing, we often use transforms in order to simplify operations or to enable better treatment to the given data. A recent trend in these fields is the use of over complete linear transforms that lead to a sparse description of signals. This new breed of methods is more difficult to use, often requiring more computations. Still, they are much more effective in applications such as signal compression and inverse problems. In fact, much of the success attributed to the wavelet transform in recent years, is directly related to the above-mentioned trend. In this talk we will present a survey of this recent path of research, and its main results. We will discuss both the theoretic and the application sides to this field. No previous knowledge is assumed (… just common sense, and little bit of linear algebra).
We address the image denoising problem, where zero mean white and homogeneous Gaussian additive noise should be removed from a given image. The approach taken is based on sparse and redundant representations over a trained dictionary. The proposed algorithm denoises the image, while simultaneously training a dictionary on its (corrupted) content using the K-SVD algorithm. As the dictionary training algorithm is limited in handling small image patches, we extend its deployment to arbitrary image sizes by defining a global image prior that forces sparsity over patches in every location in the image. We show how such Bayesian treatment leads to a simple and effective denoising algorithm, with state-of-the-art performance, equivalent and sometimes surpassing recently published leading alternative denoising methods.
In this talk we present a novel method for separating images into texture and piece-wise smooth parts, and show how this formulation can also lead to image inpainting. Our separation and inpainting processes are based on sparse and redundant representations of the two contents – cartoon and texture – over different dictionaries. Using the Basis Pursuit Denoising (BPDN) to formulate the overall penalty function, we achieve a separation of the image, denoising, and inpainting. In fact, with a small modification, the damn thing can make coffee.
Recently, three independent works suggested iterated shrinkage algorithms that generalizes the classic work by Donoho and Johnston. Daubechies, Defrise and De-Mol developed such an algorithm for deblurring, using the concept of surrogate functions. Figueirido and Nowak used the EM algorithm to construct such algorithm, and later extended their work by using the bound-optimization technique. Elad developed such an algorithm based on a parallel coordinate descent (PCD) point of view. In this talk we describe these methods with an emphasis on the later, and demonstrate how it can be of importance for the minimization of a general basis pursuit penalty function. As such, the proposed algorithms form a new pursuit technique, falling in between the basis pursuit and the matching pursuit.
2005
Shrinkage is a well known and appealing denoising technique. The use of shrinkage is known to be optimal for Gaussian white noise, provided that the sparsity on the signal’s representation is enforced using a unitary transform. Still, shrinkage is also practiced successfully with non-unitary, and even redundant representations. In this lecture we shed some light on this behavior. We show that simple shrinkage could be interpreted as the first iteration of an algorithm that solves the basis pursuit denoising (BPDN) problem. Thus, this work leads to a sequential shrinkage algorithm that can be considered as a novel and effective pursuit method. We demonstrate this algorithm, both synthetically, and for the image denoising problem, showing in both cases its superiority over several popular alternatives..
In our field, when composing a super-resolved image, two ingredients contribute to the ability to get a leap in resolution: (i) the existence of many and diverse measurements, and (ii) the availability of a model to reliably describe the image to be produced. This second part, also known as the regularization or the prior, is of generic importance, and could be deployed to any inverse problem, and used by many other applications (compression, image synthesis, and more). The well-known recent work by Baker and Kanade (’02) and the work that followed (Lin and Shum ’04, Robinson and Milanfar ’05) all suggest that while the measurements are limited in gaining a resolution increase, the prior could be used to break this barrier. Clearly, the better the prior used, the higher the quality we can expect from the overall reconstruction procedure. Indeed, recent work on super-resolution (and other inverse problems) departs from the regular Tikhonov method, and tends to the robust counterparts, such as TV or the bilateral prior (see Farsiu et. al. ’04).
A recent trend with a growing popularity is the use of examples in defining the prior. Indeed, Baker and Kanade were the first to introduce this notion to the super-resolution task. There are several ways to use examples in shaping the prior to become better. The work by Mumford and Zhu (’99) and the follow-up contribution by Haber and Tenorio (02′) suggest a parametric approach. Baker and Kanade (’02), Freeman et. al. (several contributions ’01), Nakagaki and Katzaggelos (’03) all use the examples to directly learn the reconstruction function, by observing low-res. versus high-res. pairs.
In this talk we survey this line of work and show how it can be extended in several important ways. We show a general framework that builds an example-based prior that is independent of the inverse problem at hand, and we demonstrate it on several such problems, with promising results.
In recent years there is a growing interest in the study of sparse representation for signals. Using an over-complete dictionary that contains prototype signal-atoms, signals are described as sparse linear combinations of these atoms. Recent activity in this field concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit the above model can be done by either selecting pre-specified transforms, or by adapting the dictionary to a set of training signals. Both these techniques have been considered in recent years, however this topic is largely still open. In this presentation we address the latter problem of designing dictionaries, and introduce the K-SVD algorithm for this task. We show how this algorithm could be interpreted as a generalization of the K-Means clustering process, and demonstrate its behavior in both synthetic tests and in applications on real data. The accompanying paper also describes its generalization to non-negative matrix factorization problem that suits signals generated under an additive model with positive atoms.
A recent trend in signal, image, and data analysis is the use of overcomplete linear transforms that lead to a sparse description of the processed data. This new breed of methods is more difficult to use, but they are much more effective in applications such as data compression and solution of inverse problems. In this talk we present a wide angle view to this recent path of research.
Retinex theory deals with the removal of unfavorable illumination effects from images. This ill-posed inverse problem is typically regularized by forcing spatial smoothness on the recoverable illumination. Recent work in this field suggested exploiting the knowledge that the illumination image bounds the image from above, and the fact that the reflectance is also expected to be smooth.
In this lecture we show how the above model can be improved to provide a non-iterative retinex algorithm that handles better edges in the illumination, and suppresses noise in dark areas. This algorithm uses two specially tailored bilateral filters — the first evaluates the illumination and the other is used for the computation of the reflectance. This result stands as a theoretic justification and refinement for the recently proposed heuristic use of the bilateral filter for retinex by Durand and Dorsey. In line with their appealing way of speeding up the bilateral filter, we show that similar speedup methods apply to our algorithm.
In signal and image processing, we often use transforms in order to simplify operations or to enable better treatment to the given data. A recent trend in these fields is the use of overcomplete linear transforms that lead to a sparse description of signals. This new breed of methods is more difficult to use, often requiring more computations. Still, they are much more effective in applications such as signal compression and inverse problems. In fact, much of the success attributed to the wavelet transform in recent years, is directly related to the above-mentioned trend. In this talk we will present a survey of this recent path of research, its main results, and the involved players and their contributions. We will discuss both the theoretic and the application sides to this field. No previous knowledge is assumed.
2004
Transforming signals is typically done in order to simplify their representations. Among the many ways to do so, the use of linear combinations taken over a redundant dictionary is appealing due to both its simplicity and its diversity. Choosing the sparsest of all solutions aligns well with our desire for a simple signal description, and this also leads to uniqueness. Since the search for the sparsest representation is NP-hard, methods such as the Basis-Pursuit (BP) and the Matching Pursuit (MP) have been proposed in the mid 90’s to approximate the desired sparse solution.
The pioneering work by Donoho and Huo (’99) started a sequence of research efforts, all aiming to theoretically understand the quality of approximations obtained by the pursuit algorithms, and the limits to their success. A careful study established that both BP and MP algorithms are expected to lead to the sparsest of all representations if indeed such solution is sparse enough. Later work generalized these results to the case where error is allowed in the representation. Very recent results addressed the same analysis from a probabilistic point of view, finding bounds on the average performance, and showing a close resemblance to empirical evidence.
All these results lead to the ability to use the pursuit algorithms with clear understanding of their expected behavior, in what Stanley Osher would have called “emotionally uninvolved” manner. This paves the way for future transforms that will be based on (i) overcomplete (redundant) representations, (ii) linear in constructing signals, and non-linear in their decomposition, and (iii) sparsity as their core force. Furthermore, as signal transforms, signal compression, and inverse problems, are all tangled together, we are now armed with new and effective tools when addressing many problems in signal and image processing.
In this talk we present a survey of this recent path of research, its main results, and the involved players and their contributions. We will discuss both the theoretic and the application sides to this field. No previous knowledge is assumed.
In this talk we present a novel method for separating images into texture and piecewise smooth parts, and show how this formulation can also lead to image inpainting. Our separation and inpainting process exploits both the variational and the sparsity mechanisms, by combining the Basis Pursuit Denoising (BPDN) algorithm and the Total-Variation (TV) regularization scheme.
The basic idea in this work is the use of two appropriate dictionaries, one for the representation of textures, and the other for the natural scene parts, assumed to be piece-wise-smooth. Both dictionaries are chosen such that they lead to sparse representations over one type of image-content (either texture or piecewise smooth). The use of the BPDN with the two augmented dictionaries leads to the desired separation, along with noise removal as a by-product. As the need to choose a proper dictionary for natural scene is very hard, a TV regularization is employed to better direct the separation process.
This concept of separation via sparse and over-complete representation of the image is shown to have a direct and natural extension to image inpainting. When some of the pixels in known locations in the image are missing, the same separation formulation can be changed to fit the problem of decomposing the image while filling in the holes. Thus, as a by-product of the separation we achieve inpainting. This approach should be compared to a recently published inpainting system by Bertalmio, Vese, Sapiro, and Osher. We will present several experimental results that validate the algorithm’s performance.
The separation of image content into semantic parts plays a vital role in applications such as compression, enhancement, restoration, and more. In recent years several pioneering works suggested such separation based on variational formulation, and others using independent component analysis and sparsity. In this talk we present a novel method for separating images into texture and piecewise smooth parts, exploiting both the variational and the sparsity mechanisms, by combining the Basis Pursuit Denoising (BPDN) algorithm and the Total-Variation (TV) regularization scheme.
The basic idea in our work is the use of two appropriate dictionaries, one for the representation of textures, and the other for the natural scene parts, assumed to be piece-wise-smooth. Both dictionaries are chosen such that they lead to sparse representations over one type of image-content (either texture or piecewise smooth). The use of the BPDN with the two augmented dictionaries leads to the desired separation, along with noise removal as a by-product. As the need to choose a proper dictionary for natural scene is very hard, a TV regularization is employed to better direct the separation process. We will present several experimental results that validate the algorithm’s performance.
2003
One of the most fundamental problems in the treatment of high-dimensional data is classification of a cloud of points in R^D into several sub-classes based on training data. An important such task is the pattern detection problem in images, which requires a separation between ‘Target’ and ‘Clutter’ classes, where every instance of a pattern in each of these classes appears as a sequence of D pixels. In most cases, the following properties hold true for the target detection task: (i) the probability of the ‘Target’ class is substantially smaller compared to that of the ‘Clutter’ ; (ii) the volume occupied by the target class in R^D is far smaller than that held by the clutter set ; and (iii) The target set is either convex or can be divided to several sub-sets each being convex.
In this talk we describe a new classifier that exploits these properties, yielding a low complexity yet effective target detection algorithm. This algorithm, called the Maximal Rejection Classifier (MRC), is based on successive rejection operations. Each such rejection stage is performed using a linear projection followed by thresholding. The projection direction is designed to maximize the number of rejected ‘Clutter’ points from further consideration. An application of detecting frontal and vertical faces in images is demonstrated using the MRC with encouraging results.
When over-complete dictionary is used, the search for the sparsest of all representations amounts to the need to solve a highly complicated combinatorial problem, with complexity growing exponenentially with the number of atoms in the dictionary. The Basis Pursuit algorithm (BP) (by Chen, Donoho, and Saundrers, 1995) was shown to be a highly effective tool for approximating the solution for the above problem.
In this talk we describe a fascinating analysis that explains the surprising empirical success of the PB. We start by presenting Donoho and Huo’s results, and then show how these results could be further improved. We first discuss the case of pair of ortho-bases and show tighter bounds on the required sparsity of the signal representation that guarantees BP-optimality. We then turn to extend previous results for arbitrary dictionaries, showing that all previous work falls as a special case of the new theory. Finally, we show how the obtained analysis results can be used to solve the problem of separating a volume of voxels into a sparse set of points, lines, and planes.
2002
Transforming a signal is typically done in order to simplify its representation. Among the many ways to represent signals, the use of a linear combination taken from an over-complete dictionary is appealing due to its ability to cover a diversity of signal behaviors in one transform. However, with this benefit comes the problem of non-unique representation. Choosing the sparsest of all representations aligns well with our desire for simple signal description, but searching such representation becomes a hard to solve non-convex and highly non-smooth optimization problem. The “Basis-Pursuit” (BP) algorithm (Chen, Donoho, and Saunders – 1995) is a stable approximate solver for the above task, replacing the non-convex L_0 minimization with a L_1 norm. A later work by Donoho and Huo (1998) theoretically proved that for specific dictionaries built from pair of ortho-bases and for sparse enough representations, the BP algorithm is indeed optimal.
In this talk we start by describing Donoho and Huo’s work on the BP algorithm, and then show how these results could be further improved. We first discuss the case of pair of ortho-bases and show tighter bounds on the required sparsity of the signal representation that guarantees BP-optimality. We then turn to extend previous results for arbitrary dictionaries, showing that all previous work falls as a special case of the new theory. Finally, we show work in progress that relates the Basis Pursuit algorithm used as a non-linear filtering to the Bayesian approach. We show how TV, Wavelet denoising, general Bayesian priors, and specifically the bilateral filter are all special cases of the Basis Pursuit for difference choices of dictionaries.
In a wide range of applied problems of 2D and 3D imaging – including Tomography, image rotation, image registration and more – a common continuum formulation of the problem places great emphasis on obtaining and manipulating the Fourier transform in polar coordinates. However, the translation of continuum ideas into practical work with modern digital data sampled on a Cartesian grid is problematic. It is widely believed that “there is no Polar Fast Fourier Transform” and that for practical work, continuum ideas are at best a source of inspiration rather than a practical tool.
In this talk we describe a fast and highly accurate Polar-FFT. A special feature of the proposed approach is that it involves only 1D equi-spaced FFT’s. In particular, there is no role for notions like “re-gridding” and “scattered-data-interpolation”, as might have been supposed from the existing literature in related applications fields such as Tomography. A central tool in our approach is the Pseudo-Polar FFT, an FFT where the evaluation frequencies lie in an oversampled set of non-angularly equispaced points. In the talk we describe the concept of Pseudo-Polar domain (proposed originally by Averbuch, Coifman, Donoho, Israeli and Walden at 1998), including fast forward and inverse transform, a quasi-Parseval relation, and provide empirical and theoretical analysis of the Gram operator of the Pseudo-Polar FFT. For those interested primarily in Polar FFT’s, Pseudo-Polar FFT’s play the role of a halfway point – a nearly polar system from which conversion to Polar Coordinates uses processes relying purely on 1D operations. We illustrate a number of applications of our Polar and Pseudo-Polar FFT’s.
This talk discusses the problem of recovering a planar polygon from its measured moments. The moments correspond to an indicator function defined over the polygon’s support. Previous work on this problem gave necessary and sufficient conditions for such successful recovery process and focused mainly on the case of exact measurements being given. In this talk we describe an extension of these results treating the same problem in the case where a longer than necessary series of noise corrupted moments is given.
Leaning on similar problems in array processing, system identification, and signal processing, we discuss a set of possible estimation procedures which are based on the Prony and the Pencil methods, relate them one to the other, and compare them through simulations. We then present an improvement over these methods based on the direct use of the Maximum-Likelihood estimator, exploiting the above methods as initialization. Finally, we show how regularization, and thus Maximum A-posteriori Probability estimator could be applied to reflect prior knowledge about the recovered polygon.
Additive noise removal from a given signal (also known as de-noising) is an important stage in many applications in signal processing. Various approaches have been proposed throughout the years. This talk focuses on Bayesian smoothing and edge-preserving methods. Classical algorithms in this family are typically based on Weighted Least Squares (WLS), Robust Estimation (RE), and Anisotropic Diffusion (AD). These methods share common features such as adaptivity to the data, formation as optimization problems, and the need for iterative-based restoration. In 1998 Tomasi and Manduchi (CS, Stanford) proposed an alternative heuristic non-iterative filter for noise removal called the bilateral filter. It was shown to give similar and possibly better results compared to the abovementioned iterative approaches.
However, the bilateral filter was proposed as an intuitive tool without theoretical connection to the classical approaches. In this talk the various noise-removal techniques discussed here (WLS, RE, AD, and the bilateral filter) are presented and related theoretically to each other. In particular, it is shown that RE (and AD) could be interpreted as WLS with weights replaced after each iteration. Also, it is shown that the bilateral filter emerges from the Bayesian approach as a single iteration of the Jacobi iterative algorithm for a properly posed smoothness penalty. Based on this observation, it is shown how this new filter can be improved and extended to treat more general reconstruction problems.
The super-resolution reconstruction process deals with the fusion of several low quality and low-resolution images into one higher-resolution and possibly better final image. We start by showing that from theoretic point of view, this fusion process is based on generalized sampling theorems due to Yen (1956) and Papulis (1977). When more realistic scenario is considered with blur, arbitrary motion, and additive noise, an estimation approach is considered instead.
We describe methods based on the Maximum-Likelihood (ML), Maximum-A-posteriori Probability (MAP), and the Projection onto Convex Sets POCS) as candidate tools to use. Underlying all these methods is the development of a model describing the relation between the measurements (low-quality images) and the desired output (high-resolution image). Through this path we presents the basic rational behind super-resolution, and then present the dichotomy between the static and the dynamic super-resolution process. We proposed treatment of both, and deal with several interesting special cases.
The target detection problem is defined as the need to separate targets from clutter instances. Among the many example-based techniques for the solution of this problem, the family of rejection-based classifiers are consistently exhibiting state-of-the-art accuracy while being the fastest. This rejection-based approach advocates the use of large set of weak-classifiers chained sequentially. After application of each such atom-block, a rejection of some of the clutter is performed while guaranteeing no loss of targets.
While intuitively appealing, theoretic background for this method was gathered only recently. Some roots of it can be traced to the boosting algorithm and the decision tree methods – two wide fields of research in machine learning that concentrate on using multiple weak-classifiers for the construction of a complicated overall machine. Rejection as a concept was proposed and analyzed by Nayar and Baker, with emphasis on the multi-class problems. More recently Elad, Hel-Or, and Keshet proposed the Maximal-rejection-Classifier (MRC), and employed it to the face detection problem. To conclude this list of works on the rejection-based idea, we should mention the work of Viola and Johns on the face detection problem using sub-linear weak-classifiers joined via boosting. In this talk I’ll survey the various contributions to the rejection idea and its efficient implementation on face detection problem.
Pattern detection problems require a separation between two classes, ‘Target’ and ‘Clutter’, where the probability of the former is substantially smaller than that of the latter. We describe a new classifier that exploits this property, yielding a low complexity yet effective target detection algorithm. This Maximal Rejection Classifier (MRC) algorithm is based on successive rejection operations. Each rejection stage is performed using a linear projection followed by thresholding. The projection direction is designed to maximize the number of ‘Clutter’ points rejected from further consideration. An application of detecting frontal and vertical faces in images is demonstrated using the MRC, with encouraging results.