Image Denoising with Autoencoder

Author

Amir Fawwaz

Published

September 17, 2023

1 Introduction

Artificial Neural Networks (ANNs) are a class of machine learning algorithms that learn from data and specialize in pattern recognition.1 Deep neural networks are now used in many fields and have succeeded in a variety of artificial intelligence tasks such as computer vision, natural language processing and even computational finance.

Deep learning can be viewed as many simple classifiers working together, each computing a linear function followed by an activation function. Its basis is the same as the traditional statistical linear regression \(W^{T}X+b\) approach. The difference is that deep learning uses many such nodes, whereas traditional linear regression corresponds to a single node. Together these nodes form a neural network, and a single node is known as a neural unit or perceptron. Another point of contrast is that in deep learning there are many layers between the input and the output. A layer can have hundreds or even thousands of neural units. The layers between the input and the output are known as hidden layers, and their nodes are known as hidden nodes.2

Figure 1: Depiction of ANN3

The basic building block of ANNs is the feedforward deep network, or multilayer perceptron (MLP). A multilayer perceptron is just a mathematical function mapping some set of input values to output values. The function is formed by composing many simpler functions. We can think of each application of a different mathematical function as providing a new representation of the input.4
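To make the "composition of simpler functions" concrete, here is a minimal NumPy sketch of a two-layer forward pass. The layer sizes, the ReLU activation and the helper names `relu` and `dense` are illustrative assumptions, not something taken from the references.

```python
import numpy as np

def relu(z):
    """Element-wise rectified linear activation."""
    return np.maximum(z, 0.0)

def dense(x, W, b, activation=None):
    """One layer: the affine map W^T x + b, followed by an optional activation."""
    z = x @ W + b
    return activation(z) if activation else z

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # 4 examples, 8 input features

# Hypothetical layer sizes: 8 inputs -> 16 hidden units -> 3 outputs
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

h = dense(x, W1, b1, relu)             # hidden layer: a new representation of x
y = dense(h, W2, b2)                   # output layer: a function of that representation

print(h.shape, y.shape)
```

Each call to `dense` is one of the simpler functions; the network output is their composition `dense(dense(x))`.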

Machine learning algorithms can be broadly categorized as unsupervised or supervised according to the kind of “experience” they are allowed to have during the learning process. In this case, the “experience” is a dataset, whose elements are sometimes called data points. Unsupervised learning algorithms experience a dataset containing many features, then learn useful properties of the structure of this dataset. Supervised learning algorithms experience a dataset containing features, but each example is also associated with a label or target.

2 Image Denoising

Image denoising aims to remove noise from a noisy image so as to restore the true image. However, since noise, edges and texture are all high-frequency components, it is difficult to distinguish between them during denoising, and the denoised image may inevitably lose some detail.5

The purpose of noise reduction is to decrease the noise in natural images while minimizing the loss of original features and improving the signal-to-noise ratio (SNR). The major challenges for image denoising are as follows:

  • flat areas should be smooth,
  • edges should be protected without blurring,
  • textures should be preserved, and
  • new artifacts should not be generated.

2.1 Classical denoising methods

  • Spatial domain filtering: aims to remove noise by calculating the gray value of each pixel based on the correlation between pixels or image patches in the original image. This is usually done by applying linear or non-linear filters (e.g. mean filtering, median filtering). Spatial filters normally eliminate noise to a reasonable extent, but at the cost of image blurring, which in turn loses sharp edges.5
  • Transform domain filtering: since the characteristics of image information and noise differ in the “transform space”, the noisy image is first transformed to another domain, and a denoising procedure is then applied to the transformed image according to the different characteristics of the image and its noise.
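As a rough illustration of spatial domain filtering, the sketch below applies a 3×3 mean filter and a 3×3 median filter to a flat image containing a single bright impulse. The helper name `filter3x3`, the edge-replication padding and the test image are assumptions made for this example only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter3x3(img, reducer):
    """Apply `reducer` (e.g. np.mean or np.median) over every 3x3 neighbourhood."""
    padded = np.pad(img, 1, mode="edge")             # replicate border pixels
    windows = sliding_window_view(padded, (3, 3))    # shape (H, W, 3, 3)
    return reducer(windows, axis=(-2, -1))

img = np.full((5, 5), 100.0)   # flat gray image
img[2, 2] = 255.0              # one noisy pixel (impulse)

mean_out = filter3x3(img, np.mean)       # linear filter: smears the impulse over its neighbours
median_out = filter3x3(img, np.median)   # non-linear filter: removes the impulse entirely

print(mean_out[2, 2], median_out[2, 2])
```

The mean filter spreads the outlier into the surrounding pixels, which is the blurring effect mentioned above, while the median filter discards it in this simple case.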

2.2 Machine learning methods

Denoising methods in Machine Learning (ML) are usually based on convolutional neural networks (CNNs). A loss function is used to measure the proximity between the denoised image \(\hat{x}\) and the ground truth \(x\). Deep neural networks have become the tool of choice for image denoising owing to their ability to learn natural image priors from image datasets.

2.3 Additive White Gaussian Noise

Additive white Gaussian noise is one of the most common types of noise. In the image denoising literature, noise is often assumed to be zero-mean additive white Gaussian noise (AWGN). We simply add a random number to each pixel. The random number has a mean \(\mu\) of zero and a certain standard deviation \(\sigma\).
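A minimal sketch of how such noise can be simulated with NumPy. The function name `add_awgn`, the 8-bit intensity range and the choice \(\sigma = 25\) are assumptions for illustration.

```python
import numpy as np

def add_awgn(image, sigma, rng=None):
    """Return image + n, where n ~ N(0, sigma^2) is drawn independently per pixel.

    The result is clipped to the 8-bit range [0, 255]; note that clipping makes
    the effective noise only approximately Gaussian near the range limits.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=sigma, size=image.shape)
    noisy = image.astype(np.float64) + noise
    return np.clip(noisy, 0.0, 255.0)

rng = np.random.default_rng(0)
clean = np.full((256, 256), 128.0)            # synthetic mid-gray image
noisy = add_awgn(clean, sigma=25.0, rng=rng)

residual = noisy - clean
print(residual.mean(), residual.std())        # close to 0 and to sigma respectively
```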

2.4 Denoising performance

To evaluate the performance of image denoising methods, the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are used as representative quantitative measurements.

Given a ground truth image \(x\) with \(N\) pixels and 8-bit intensities (peak value 255), the PSNR of a denoised image \(\hat{x}\) is defined by:

\[ PSNR(x,\hat{x}) = 10 \cdot \log_{10}\left(\frac{255^{2}}{\frac{1}{N}\|x - \hat{x}\|_{2}^{2}}\right) \]
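The PSNR can be computed in a few lines. This sketch uses the standard mean-squared-error form, in which the squared error is averaged over all pixels; the function name `psnr` and the `peak` parameter are assumptions for illustration.

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference x and an estimate x_hat."""
    diff = x.astype(np.float64) - x_hat.astype(np.float64)
    mse = np.mean(diff ** 2)               # squared error averaged over all pixels
    if mse == 0:
        return float("inf")                # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

x = np.zeros((8, 8))
x_hat = x + 25.5                           # constant error of 10% of the peak value

print(psnr(x, x_hat))                      # 255^2 / 25.5^2 = 100, i.e. 20 dB
```

A higher PSNR means the denoised image is closer to the ground truth in the mean-squared-error sense.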

Because quantitative measurements cannot reflect visual quality perfectly, visual quality comparisons on a set of images are also necessary. Besides the noise removal effect, edge and texture preservation is vital for evaluating a denoising method.

3 Autoencoder

An autoencoder is a neural network that is trained to attempt to copy its input to its output.4 This type of ANN is becoming increasingly popular due to its ability to learn complex representations of data. Its main purpose is to learn, in an unsupervised manner, an “informative” representation of the data.

The autoencoder first encodes the data into a lower-dimensional representation, then reconstructs it back to its original form. Autoencoders can be used for a variety of tasks, such as denoising, anomaly detection and feature extraction. Because they can learn features from unlabeled data, they have become popular for unsupervised learning tasks.

Figure 2: Autoencoder3

How does the decoder know the original data in the first place?

Figure 3: Autoassociative multilayer perceptron6

Consider first a multilayer perceptron of the form shown in Figure 3, having \(D\) inputs, \(D\) output units, and \(M\) hidden units with \(M < D\). The targets used to train the network are simply the input vectors themselves, so that the network is attempting to map each input vector onto itself. Such a network is said to form an autoassociative mapping. Since the number of hidden units is smaller than the number of inputs, a perfect reconstruction of all input vectors is not in general possible. The reconstruction error can be reduced by increasing the number of hidden units, which lets the network fit more patterns.

During training, the network parameters \(w\) are therefore chosen to minimize a reconstruction error (error function) that captures the degree of mismatch between the input vectors and their reconstructions:

\[ E(w) = \frac{1}{2} \sum_{n=1}^{N} ||y(x_{n},w)-x_{n}||^{2} \]

If the hidden units have linear activation functions, then at the minimum of this error the network performs a projection onto the \(M\)-dimensional subspace spanned by the first \(M\) principal components of the data.7,8 Thus, the weight vectors leading into the hidden units in Figure 3 form a basis for this subspace, which plays the role of a “latent space representation”. This projection retains as much information as possible when encoding and therefore gives the minimum reconstruction error when decoding. In this sense, an autoencoder can be seen as a generalization of Principal Component Analysis (PCA).
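The link to PCA can be checked numerically. The sketch below does not train a network; it builds the optimal linear "encoder" and "decoder" directly from the first \(M\) principal directions and evaluates the error \(E(w)\) on centred data. The synthetic data and the sizes `N`, `D`, `M` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 200, 10, 3                         # hypothetical sizes with M < D
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))
Xc = X - X.mean(axis=0)                      # centre the data

# Principal directions are the right singular vectors of the centred data.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:M].T                                 # D x M basis of the principal subspace

Z = Xc @ W                                   # "encoder": project onto M dimensions
X_rec = Z @ W.T                              # "decoder": map back to D dimensions

E = 0.5 * np.sum((X_rec - Xc) ** 2)          # reconstruction error E(w)
E_discarded = 0.5 * np.sum(s[M:] ** 2)       # energy in the discarded components

print(E, E_discarded)
```

The two printed values agree: the error left after projecting onto the first \(M\) principal components is exactly the energy carried by the remaining \(D - M\) components.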

4 Types of Autoencoder

Following Goodfellow et al.,4 there are several kinds of autoencoder:

  • undercomplete autoencoder
  • sparse autoencoder
  • denoising autoencoder
  • variational autoencoder

4.1 Undercomplete Autoencoder

An autoencoder whose bottleneck has a smaller dimension than its input is called undercomplete. In other words, the hidden layer of an undercomplete autoencoder has fewer units than the input layer.

Learning an undercomplete representation forces the autoencoder to capture the most notable features of the training data.

4.2 Sparse Autoencoder

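A sparse autoencoder adds a sparsity penalty on the hidden-layer activations to the reconstruction loss, so that only a few hidden units are active for any given input. This allows the hidden layer to have more nodes than the input layer without the network simply learning to copy its input. A more in-depth discussion of sparse autoencoders is presented by Goodfellow et al.4 and Andrew Ng.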

4.3 Denoising Autoencoder (DAE)

The denoising autoencoder (DAE) is an autoencoder that receives a corrupted data point \(\tilde{x}\) as input and is trained to recover the original, uncorrupted data point \(x\) as its output. A deeper discussion of denoising autoencoders is presented by Goodfellow et al.4
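The key point is that the input is corrupted while the training target stays clean. The toy sketch below illustrates this with a purely linear encoder and decoder trained by plain gradient descent in NumPy. The synthetic data, the noise level, the learning rate and all variable names are assumptions for illustration; a practical DAE would use non-linear (typically convolutional) layers and a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "clean" data lying on a 2-dimensional subspace of an 8-dimensional space.
N, D, M = 512, 8, 2
X = rng.normal(size=(N, M)) @ rng.normal(size=(M, D))
X /= X.std()

# Corrupt the inputs with zero-mean Gaussian noise; the targets stay clean.
X_noisy = X + 0.3 * rng.normal(size=X.shape)

# Linear encoder (D -> M) and decoder (M -> D) with small random initial weights.
W_enc = 0.1 * rng.normal(size=(D, M))
W_dec = 0.1 * rng.normal(size=(M, D))

def loss(W_enc, W_dec):
    """Half the mean squared error between the reconstruction of the noisy input and the clean target."""
    return 0.5 * np.mean(np.sum((X_noisy @ W_enc @ W_dec - X) ** 2, axis=1))

lr = 0.01
losses = [loss(W_enc, W_dec)]
for _ in range(500):
    H = X_noisy @ W_enc                       # latent code of the corrupted input
    R = H @ W_dec - X                         # residual against the CLEAN data
    grad_dec = H.T @ R / N
    grad_enc = X_noisy.T @ (R @ W_dec.T) / N
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec
    losses.append(loss(W_enc, W_dec))

print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```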

4.4 Variational Autoencoder (VAE)

The VAE is a form of autoencoder that works with a distribution over latent variables in the latent space. Instead of encoding an input as a single point, it encodes the input as a distribution over the latent space.

In a nutshell, a VAE is an autoencoder whose distribution of encodings is regularised during training (regularisation being a method to avoid overfitting), so that its latent space has good properties that allow new data to be generated from it.
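A minimal sketch of the two ingredients this implies, assuming the common choice of a diagonal Gaussian encoder and a standard normal prior (the text above does not fix these choices). The function names `sample_latent` and `kl_to_standard_normal` are hypothetical.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), where sigma = exp(0.5 * log_var)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)

# Encoder outputs for 2 inputs and a 4-dimensional latent space (made-up values).
mu = np.zeros((2, 4))
log_var = np.zeros((2, 4))

z = sample_latent(mu, log_var, rng)          # one latent sample per input
kl = kl_to_standard_normal(mu, log_var)      # zero when the encoding equals the prior

print(z.shape, kl)
```

The KL term is the regulariser: it is added to the reconstruction loss and pulls each encoded distribution towards the prior, which keeps the latent space well organised.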

4.5 Applications of Autoencoders

  • autoencoders as a generative model9,10
  • autoencoders for anomaly detection
  • autoencoders for classification
  • autoencoders for clustering
  • autoencoders for recommendation systems

5 Image Denoising with Autoencoder

5.1 Simple Convolutional Autoencoder

Figure 4 illustrates the architecture of the simple convolutional autoencoder (CAE) that will be used. A convolutional autoencoder extends the basic structure of the simple (vanilla) autoencoder by replacing the fully connected layers with convolutional layers.

Figure 4: Basic Autoencoder11

5.2 Deep Convolutional Autoencoder

Figure 5, on the other hand, shows a deeper CAE architecture from the medical domain12 that will be used for comparison.

Figure 5: Deeper Autoencoder

5.3 DnCNN

Figure 6: DnCNN architecture13

5.3.1 Overview

  • treats image denoising as a plain discriminative learning problem, i.e., separating the noise from a noisy image with a feed-forward CNN
  • uses a CNN because it is effective in increasing the capacity and flexibility for exploiting image characteristics
  • leverages batch normalization and residual learning to capture image features and to make training faster
  • a single trained network can handle three tasks: image denoising, single-image super-resolution, and JPEG deblocking

5.3.2 Methodology

  • The size of the convolutional filters is set to \(3×3\) and all pooling layers are removed. Therefore, the receptive field of DnCNN with depth \(d\) is \((2d+1)×(2d+1)\).
  • For Gaussian denoising with a certain noise level, the receptive field size of DnCNN is set to \(35×35\), which corresponds to a depth of \(17\). For other general image denoising tasks, a larger receptive field is adopted by setting the depth to \(20\).
  • A residual learning formulation is adopted: for a noisy observation \(y\), the network is trained to predict the residual (noise) \(R(y)\), and the clean image is recovered as \(x = y-R(y)\).
  • 3 types of layers:
    • Conv+ReLU: for the first layer, 64 filters of size \(3×3×c\) are used to generate 64 feature maps, with \(c = 1\) for gray images and \(c = 3\) for color images.
    • Conv+BN+ReLU: for layers 2 to \((d-1)\), 64 filters of size \(3×3×64\) are used, and batch normalization is added between convolution and ReLU.
    • Conv: for the last layer, \(c\) filters of size \(3×3×64\) are used to reconstruct the output.
  • Simple zero padding is used before each convolution so that the feature maps keep the size of the input; this was found not to result in any boundary artifacts.
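The depth and receptive field figures above can be reproduced with a few lines of arithmetic. This is only a bookkeeping sketch of the layer layout described in the list, not an implementation of the network; the function names `receptive_field` and `dncnn_layers` are hypothetical.

```python
def receptive_field(depth, kernel=3):
    """Side length of the receptive field of `depth` stacked stride-1 convolutions without pooling."""
    return depth * (kernel - 1) + 1

def dncnn_layers(depth, channels):
    """List the layers described above as (type, input channels, output channels)."""
    layers = [("Conv+ReLU", channels, 64)]                 # first layer
    layers += [("Conv+BN+ReLU", 64, 64)] * (depth - 2)     # layers 2 to depth-1
    layers.append(("Conv", 64, channels))                  # last layer
    return layers

print(receptive_field(17))   # 35, i.e. a 35x35 receptive field
print(receptive_field(20))   # 41

layers = dncnn_layers(depth=17, channels=1)                # grayscale configuration
print(len(layers), layers[0], layers[-1])
```

With \(3×3\) kernels each layer adds 2 pixels to the receptive field, which gives the \((2d+1)\) rule quoted above.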

5.4 FFDNet

Figure 7: FFDNet architecture14

5.4.1 Overview

  • fast and flexible denoising convolutional neural network (FFDNet)
  • premise: existing discriminative denoising methods (e.g. DnCNN) are limited in flexibility, and the learned model is usually tailored to a specific noise level
  • the noise level is modeled as an input and the tunable model parameters are invariant to noise level
  • removes the spatially variant noise by specifying a non-uniform noise level map
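The idea of feeding the noise level to the network as an input can be sketched as follows. This simplified example only stacks a noise level map onto the image as an extra channel; as far as I understand the FFDNet paper, the actual network does this on downsampled sub-images, which is omitted here. The function name `with_noise_level_map` and the example noise levels are assumptions.

```python
import numpy as np

def with_noise_level_map(noisy, sigma):
    """Stack a noise level map onto an (H, W, C) image as one extra channel.

    `sigma` is either a scalar (uniform noise level) or an (H, W) array
    (spatially variant noise level).
    """
    h, w, _ = noisy.shape
    sigma_map = np.broadcast_to(np.asarray(sigma, dtype=noisy.dtype), (h, w))
    return np.concatenate([noisy, sigma_map[..., None]], axis=-1)

noisy = np.zeros((4, 6, 1))                       # placeholder grayscale image in [0, 1]

# Uniform noise level: the same sigma everywhere.
uniform = with_noise_level_map(noisy, 25 / 255)

# Non-uniform map: weaker noise on the left half, stronger on the right half.
sigma_map = np.full((4, 6), 10 / 255)
sigma_map[:, 3:] = 50 / 255
variant = with_noise_level_map(noisy, sigma_map)

print(uniform.shape, variant[0, 0, 1], variant[0, 5, 1])
```

Because the noise level is an input rather than something baked into the weights, the same trained model can be steered to different noise levels at test time.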

5.4.2 Methodology

Figure 8: FFDNet pseudocode

5.5 BRDNet

Figure 9: BRDNet architecture15

5.5.1 Overview

  • batch-renormalization denoising network (BRDNet)
  • combines two networks to increase the width of the model and obtain more features for image denoising
  • uses batch renormalization to address the small mini-batch problem, and applies residual learning (RL) with a skip connection to obtain clean images
  • to reduce the computational cost, dilated convolutions are used to capture more features
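The reason dilation helps with cost can be seen from simple arithmetic: a dilated kernel covers a larger area with the same number of weights. The helper name `effective_kernel` and the dilation rates below are illustrative assumptions, not values taken from the BRDNet paper.

```python
def effective_kernel(kernel, dilation):
    """Side length of the area covered by a `kernel` x `kernel` filter with the given dilation."""
    return kernel + (kernel - 1) * (dilation - 1)

for dilation in (1, 2, 4):
    size = effective_kernel(3, dilation)
    # The number of weights stays 3 * 3 = 9 regardless of the dilation rate.
    print(f"dilation {dilation}: covers {size}x{size} pixels with 9 weights")
```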

5.5.2 Methodology

Figure 10 shows the implementation strategy of BRDNet.

Figure 10: BRDNet implementation15

5.6 RIDNet

The RIDNet model in Figure 11 is composed of three modules: feature extraction, feature learning with a residual-on-the-residual module, and reconstruction.

Figure 11: RIDNet architecture16

5.6.1 Overview

  • single-stage blind real image denoising network (RIDNet)
  • enhancement attention modules (EAM) are used to capture essential features through an attention mechanism

5.7 Zero-Shot Noise2Noise

5.7.1 Overview

  • Zero-Shot Noise2Noise (ZS-N2N)
  • motivation: preparing a dataset of clean-noisy image pairs is expensive and time-consuming
  • main idea: generate a pair of noisy images from a single noisy image (a dataset-free method) and train a small network only on this pair
  • extends Noise2Noise and Neighbour2Neighbour by enabling training on a single noisy image
  • zero-shot: only the noisy image is given
  • blind denoising: no information about the noise level is required

5.7.2 Methodology

  • decompose the noisy image into a pair of downsampled images
  • train a lightweight network with regularization to map one downsampled image to the other
  • the trained network is then applied to the noisy test image
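The first step can be sketched in NumPy. To my understanding of the ZS-N2N paper, the pair is obtained by averaging the two diagonals of every non-overlapping 2×2 block; treat that detail, the function name `pair_downsample` and the restriction to single-channel images with even height and width as assumptions of this sketch.

```python
import numpy as np

def pair_downsample(noisy):
    """Split an (H, W) image (even H and W) into two half-resolution images.

    Each output pixel averages one diagonal of a non-overlapping 2x2 block,
    so the two outputs never share an input pixel.
    """
    top_left = noisy[0::2, 0::2]
    top_right = noisy[0::2, 1::2]
    bottom_left = noisy[1::2, 0::2]
    bottom_right = noisy[1::2, 1::2]
    d1 = 0.5 * (top_right + bottom_left)    # anti-diagonal of each block
    d2 = 0.5 * (top_left + bottom_right)    # main diagonal of each block
    return d1, d2

block = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
noisy = np.tile(block, (2, 2))              # 4x4 test pattern

d1, d2 = pair_downsample(noisy)
print(d1.shape, d1[0, 0], d2[0, 0])
```

Since the two downsampled images show nearly the same content but are built from disjoint pixels, their noise is independent, which is what allows one to serve as the training target for the other.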

5.8 Selected SOTA architectures

Table 1 lists five widely cited denoising methods using autoencoders, including the most recent approach.

Table 1: Selected SOTA ML architectures
| No | Year | Architecture | Objective | Methodology | Result | Weakness | GitHub |
|---|---|---|---|---|---|---|---|
| 1 | 2018 | FFDNet | fast and flexible denoising CAE; deal with spatially variant noise | the noise level is modeled as an input and the tunable model parameters are invariant to noise level; Adam as optimizer; rotation- and flip-based data augmentation is also adopted during training | 17 | requires manual intervention to select high noise-level16 | FFDNet |
| 2 | 2020 | BRDNet | increases the width rather than the depth to enhance the learning ability of the denoising network | combines two networks to increase the width of BRDNet and obtain more features; uses batch renormalization to address the small mini-batch problem, and applies residual learning (RL) with a skip connection to obtain clean images; dilated convolutions are used to capture more features | 15 | TBD | BRDNet |
| 3 | 2017 | DnCNN | treat image denoising as a plain discriminative learning problem, i.e., separating the noise from a noisy image with a feed-forward CNN; leverage batch normalization and residual learning to capture image features | based on a modified VGG network; adopts the residual learning formulation and combines it with batch normalization for fast training and improved denoising performance | 13 | tailored to a specific noise level | DnCNN |
| 4 | 2019 | RIDNet | incorporate feature attention in denoising | modular network comprising three main modules: feature extraction, feature learning residual module, and reconstruction | 16 | TBD | RIDNet |
| 5 | 2023 | ZS-N2N | zero-shot learning with a simple network; denoise images without any training data, noise model or noise level as input | extends Noise2Noise and Neighbour2Neighbour by enabling training on a single noisy image | 18 | TBD | ZS-N2N |

References

1.
Adrian Rosebrock. Deep learning for computer vision with python: Starter bundle. (PyImageSearch.com, 2017).
2.
Shi Dong, Ping Wang, Khushnood Abbas. A survey on deep learning and its applications. (Elsevier, Computer Science Review, 2021).
3.
Judith E. Dayhoff and James M. DeLeo. Artificial neural networks: Opening the black box. (Cancer, American Cancer Society, 2001).
4.
Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep learning. (MIT Press, 2016).
5.
Linwei Fan, Fan Zhang, Hui Fan & Caiming Zhang. Brief review of image denoising techniques. (Springer, Vis. Comput. Ind. Biomed. Art, 2019).
6.
Christopher M. Bishop. Pattern recognition and machine learning. (Springer, 2006).
7.
H. Bourlard and Y. Kamp. Auto-association by multilayer perceptrons and singular value decomposition. (Springer, Biological Cybernetics, 1988).
8.
Pierre Baldi and Kurt Hornik. Neural networks and principal component analysis: Learning from examples without local minima. (Springer, Neural Networks, 1989).
9.
Dor Bank, Noam Koenigstein and Raja Giryes. Autoencoders. (arxiv, 2021).
10.
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Generative adversarial networks. (arxiv, 2014).
11.
Santiago L. Valdarrama. Convolutional autoencoder for image denoising. (Keras Team, 2021).
12.
Lovedeep Gondara. Medical image denoising using convolutional denoising autoencoders. (IEEE, 16th International Conference on Data Mining Workshops (ICDMW), 2016).
13.
Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising. (IEEE, Transactions on Image Processing, 2017).
14.
Ashly Roy, P Anju, Linnet Tomy and M. Rajeswari. Recent study on image denoising using deep CNN techniques. (IEEE, 7th International Conference on Advanced Computing and Communication Systems (ICACCS), 2021).
15.
Chunwei Tian, Yong Xu and Wangmeng Zuo. Image denoising using deep CNN with batch renormalization. (Elsevier, Neural Networks, 2020).
16.
Saeed Anwar, Nick Barnes. Real image denoising with feature attention. (IEEE/CVF International Conference on Computer Vision, 2019).
17.
Kai Zhang, Wangmeng Zuo and Lei Zhang. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. (IEEE, Transactions on Image Processing, 2018).
18.
Youssef Mansour, Reinhard Heckel. Zero-shot Noise2Noise: Efficient image denoising without any data. (arxiv, 2023).