A Hybrid Image Compression Technique Using Neural Network and Vector Quantization With DCT

Mohamed El Zorkany

Electronics Department, National Telecommunication Institute (NTI), Cairo, Egypt
[email protected]

Summary. Image and video transmission requires particularly large bandwidth and storage space, so image compression technology is essential to overcome these problems. Practical, efficient compression systems based on hybrid coding, which combines the advantages of different image coding methods, have been developed over the years. In this paper, different hybrid approaches to image compression are discussed. Hybrid coding of images, in this research, deals with combining three approaches to enhance the individual methods and achieve better-quality reconstructed images at a higher compression ratio. A new hybrid compression method combining a neural network, vector quantization, and the discrete cosine transform is presented. This scheme combines the high compression ratio of neural networks (NN) and vector quantization (VQ) with the good energy-compaction property of the discrete cosine transform (DCT). To increase the compression ratio while preserving decent reconstructed image quality, the image is first compressed using a neural network; the hidden-layer outputs are then re-compressed using vector quantization, and the DCT is applied to the codebook blocks. Simulation results show the effectiveness of the proposed method. Its performance is compared with the standard JPEG compression technique over a large number of images, showing good performance.

1 Introduction

The use of multimedia, images, and video is rapidly increasing in a variety of applications, and the technique used to store multimedia data matters: although storage is bigger than ever, it is still not enough. Hence data compression, and particularly image compression, plays a vital role. Image compression is a technique for reducing the image data rate to save storage space. In other words, the purpose of image compression is to reduce the amount of data and achieve a low-bit-rate digital representation without perceived loss of image quality. Since it has been an area

R.S. Choraś (ed.), Image Processing and Communications Challenges 5, Advances in Intelligent Systems and Computing 233, DOI: 10.1007/978-3-319-01622-1_28, © Springer International Publishing Switzerland 2014


of interest to many researchers, many techniques have been introduced. Different approaches to image compression have been developed in the literature. Most image compression techniques are based either on statistical approaches or on applied transforms such as the DCT and the wavelet transform: the JPEG standard is based on the DCT, while the JPEG2000 standard is based on the wavelet transform [1, 2]. Most image compression methods divide the image into a number of non-overlapping pixel blocks, which are fed as patterns for network training. Compression is achieved by encoding the pixel blocks into the trained weight set, which is transmitted to the receiving side for reconstruction of the image. In such cases, however, only a limited amount of compression is achieved, since only the correlation between pixels within each training pattern is exploited [3].

There are also two well-known major approaches to implementing image compression: neural networks (NN) and vector quantization (VQ). A neural network is a powerful data-modelling tool that is able to capture and represent complex input/output relationships. It can perform "intelligent" tasks similar to those performed by the human brain. Neural networks seem well suited to image compression due to their massively parallel, distributed architecture, noise suppression, and learning capabilities. NNs have the ability to pre-process input patterns to produce simpler patterns with fewer components; this compressed information preserves the full information obtained from the external environment. Neural-network-based techniques provide sufficient compression rates for the data fed to them, and security is easily maintained. Many different training algorithms have been used, such as the back-propagation algorithm [4, 5].

Another well-known image compression method is vector quantization. VQ is a classical quantization technique from signal processing which allows the modeling of probability density functions by the distribution of prototype vectors. It was originally used for data compression. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms. Vector quantization is used for lossy data compression, lossy data correction, pattern recognition, and density estimation [6, 7].

The discrete cosine transform (DCT) is another well-known image compression method. Here compression is accomplished by applying a linear transform to de-correlate the image data (source encoder), quantizing the resulting transform coefficients (quantizer), and entropy-coding the quantized values (entropy encoder) [8, 9]. This paper proposes a new image compression scheme that combines NN and VQ with the DCT to take advantage of all three.

This paper is organized as follows. A brief review of the NN structure is presented in Section 2 and of VQ in Section 3. The DCT scheme is described in Section 4. The proposed hybrid scheme for image compression is presented in Section 5. Simulation results and discussion are given in Section 6, and finally conclusions are drawn in Section 7.


2 Neural Network Structure

Neural network models are specified by a network topology and a learning algorithm. The network topology describes the way in which the neurons (basic processing units) are interconnected and the way in which they receive input and produce output. The learning algorithm specifies an initial set of weights and indicates how to adapt them during learning in order to improve network performance. A neural network can be defined as a "massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use". A number of simple computational units, called neurons, are interconnected to form a network which performs complex computational tasks. There are many different types of NNs; the most common type, employed here for image compression, is the multi-layer perceptron (MLP).

For basic image compression using an NN, the network structure is illustrated in Fig. 1. Three layers are designed: one input layer, one output layer, and one hidden layer. Both the input layer and the output layer are fully connected to the hidden layer. Compression is achieved by making K, the number of neurones in the hidden layer, less than the number of neurones in the input and output layers [10]. The input image is split up into blocks or vectors of 8×8, 4×4, or 16×16 pixels. With the input vector N-dimensional, where N equals the number of pixels in each block, all the coupling weights connected to each neurone of the hidden layer can be represented by {w_ji, j = 1, 2, ..., K and i = 1, 2, ..., N}, i.e. by a K×N matrix. From the hidden layer to the output layer, the connections are represented by another weight matrix, of size N×K.

Fig. 1. Basic structure of NN for image compression

Image compression is achieved by training the network in such a way that the coupling weights {w_ji} scale the input vector of dimension N into a narrow channel of dimension K (K < N) at the hidden layer and produce the optimum output value, i.e. the one that minimizes the quadratic error between input and output. In accordance with the neural network structure, the operation can be described as follows:


y_i^m = 1 / (1 + e^(−λ·net_i))   and   net_i = W_i^T · X     (1)

where λ ∈ [0, 1] is a weighting factor and W_i^T is the weight vector that links the ith neuron in layer m with its input X. The back-propagation (BP) algorithm is employed to optimize the parameters of the network. Its goal is to reduce the mean square error (MSE), which can be expressed as:

ε = (1 / (2·N_p)) · Σ_{p=1}^{N_p} Σ_{i=1}^{N_M} (d_i(p) − y_i^M(p))²     (2)

where N_p is the number of input patterns considered, N_M is the number of neurons in the output layer, and d_i(p) and y_i^M(p) are the desired and actual outputs of the ith neuron of the output layer for input pattern p, respectively.


With this structure, the encoding operation at the hidden layer and the decoding operation at the output layer can be written as:

h_j = Σ_{i=1}^{N} w_ji · x_i,   1 ≤ j ≤ K     (3)

for encoding and

x_i = Σ_{j=1}^{K} w_ji · h_j,   1 ≤ i ≤ N     (4)

for decoding, where x_i ∈ [0, 1] denotes the normalized pixel values for grey-scale images with grey levels in [0, 255]. Normalized pixel values are used because neural networks operate more efficiently when both their inputs and outputs are limited to the range [0, 1].
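The encode/decode pass of equations (3) and (4) is an ordinary pair of matrix-vector products. The following NumPy sketch is illustrative only (not the author's implementation): the weights here are random placeholders standing in for trained values, and the 64/16 sizes match the network described later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 64, 16                         # 8x8 pixel block -> 16 hidden neurons

W_enc = rng.normal(0.0, 0.1, (K, N))  # K x N hidden-layer weights {w_ji}
W_dec = rng.normal(0.0, 0.1, (N, K))  # N x K output-layer weights

x = rng.random(N)        # one normalized pixel block, values in [0, 1]
h = W_enc @ x            # eq. (3): compressed K-dimensional representation
x_rec = W_dec @ h        # eq. (4): reconstructed N-dimensional block
```

After training, h (16 values) is what is stored or transmitted in place of the 64-pixel block, giving the first-stage 4:1 compression.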


3 Vector Quantization (VQ)

Vector quantization (VQ) is a lossy data compression method based on the principle of block coding. It is a fixed-to-fixed-length algorithm. VQ compression is highly asymmetric in processing time: choosing an optimal codebook takes a huge amount of computation, but decompression is lightning fast, requiring only one table lookup per vector. This makes VQ an excellent choice for data which, once created, will never change.

The principle of VQ techniques is simple. A vector quantizer is composed of two operations: an encoder and a decoder. The encoder takes an input vector and outputs the index of the codeword that offers the lowest distortion. Here the lowest distortion is found by evaluating the Euclidean distance between the input vector and each codeword in the codebook. Once the closest codeword is found, the index of that codeword is sent through a channel (the channel could be computer storage, a communications channel, and so on). When the decoder receives the index of the codeword, it replaces the index with the associated codeword. Fig. 2 shows a block diagram of the operation of VQ.

The VQ coding procedure often needs several epochs of clustering to obtain a codebook. K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem: it generates the codebook from the cluster centroids and quantizes vectors by comparing them with the centroids in the codebook.

Fig. 2. Basic structure of vector quantization
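A minimal sketch of the VQ encoder/decoder pair and a plain k-means codebook trainer is shown below. This is illustrative code, not the author's implementation; the vector count, dimensionality, and codebook size are arbitrary example values.

```python
import numpy as np

def train_codebook(vectors, n_codes, iters=20, seed=0):
    """Plain k-means: returns a codebook of centroid vectors."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), n_codes, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest codeword (Euclidean distance)
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_codes):        # move each codeword to its cluster mean
            if np.any(labels == k):
                codebook[k] = vectors[labels == k].mean(axis=0)
    return codebook

def vq_encode(vectors, codebook):
    """Encoder: index of the lowest-distortion codeword per input vector."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)             # indices sent through the channel

def vq_decode(indices, codebook):
    """Decoder: one table lookup per received index."""
    return codebook[indices]

rng = np.random.default_rng(1)
vecs = rng.random((100, 16))            # e.g. 4x4 blocks flattened to 16-D
cb = train_codebook(vecs, n_codes=8)
idx = vq_encode(vecs, cb)
rec = vq_decode(idx, cb)
```

Note the asymmetry described above: training and encoding both scan the whole codebook per vector, while decoding is a single array lookup.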


4 Discrete Cosine Transform (DCT)

Transform coding became popular mainly due to the introduction of the discrete cosine transform, an efficient transform with high computational efficiency and compression performance. This fact has made the DCT favorable for still image and video coding.

The most widely used commercial product for still image coding is the JPEG baseline system, which has the advantage of simple computation but suffers from blocking artifacts due to coarse quantization of coefficients at high compression ratios.

The DCT is widely used for image compression because it provides a good energy-compaction property. The DCT is effectively a cut-down version of the fast Fourier transform (FFT) in which only the real part is used. The DCT generally results in the signal energy being distributed among only a small set of transform coefficients. Given an image consisting of N×N pixels f(x, y), its two-dimensional DCT produces the N×N array of numbers F(i, j) given by

F(i, j) = Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} 4·f(x, y) · cos((2x+1)·i·π / (2N)) · cos((2y+1)·j·π / (2N))     (5)

where 0 ≤ i, j ≤ N − 1.
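Equation (5) can be implemented directly, as in the following NumPy sketch (a naive O(N⁴) loop for clarity, not an optimized transform; production code would use a fast DCT). A flat test block shows the energy-compaction property: all of the energy lands in the DC coefficient F(0, 0).

```python
import numpy as np

def dct2(f):
    """2-D DCT of an N x N block, following eq. (5) (no normalization terms)."""
    N = f.shape[0]
    x = np.arange(N)
    F = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            ci = np.cos((2 * x + 1) * i * np.pi / (2 * N))  # cosine basis over x
            cj = np.cos((2 * x + 1) * j * np.pi / (2 * N))  # cosine basis over y
            F[i, j] = 4 * np.sum(f * np.outer(ci, cj))
    return F

block = np.ones((4, 4))   # a flat block: all energy should go to the DC term
F = dct2(block)
```

For this constant block, F(0, 0) = 4·N² = 64 and every other coefficient is zero, which is exactly the concentration of energy in few coefficients that makes the DCT useful for compression.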

5 The Proposed Scheme

This paper proposes a new image compression scheme that combines NN and VQ with the DCT to take advantage of all three. To increase the compression ratio while preserving decent reconstructed image quality, the image is compressed using the NN as a first stage of compression. Next, the output of the first stage (the compressed image) is coded using vector quantization, and then the codebooks are transformed from the spatial domain to the frequency domain using the DCT. So in the proposed technique the image is first compressed using the NN, then the VQ technique creates codebooks via k-means clustering, and finally DCT compression is used to compress the codebooks. Two different approaches were used to merge the DCT with the VQ compression stage: the first aggregates all codebooks into one big block and applies the DCT to that block; the second applies the DCT to each codebook alone. Fig. 3 shows the general block diagram of the proposed scheme, which can be divided into three stages: the NN stage, the VQ stage, and the DCT stage.

First Stage

In the proposed method, the image is compressed using an NN as the first stage of compression. For this purpose a three-layer feed-forward neural network had


Fig. 3. Block diagram of the proposed scheme

been used: an input layer with 64 neurons, a hidden layer with 16 neurons, and an output layer with 64 neurons. The back-propagation algorithm was employed for the training process. For training, input prototypes and target values must be introduced to the network so that its suitable behaviour can be learned. The idea behind supplying target values is that they enable us to calculate the difference between the output and target values and hence evaluate the performance function, which is the criterion of training. For training the network, the 256×256-pixel "standard" test images (a set of images found frequently in the literature: Lena, peppers, cameraman, lake, etc., all uncompressed and of the same size) were employed.

Also, for scaling purposes, each pixel value is divided by 255 to obtain numbers between 0 and 1. In image compression using an NN, compression is achieved by training the network with the image and then using the weights and the coefficients from the hidden layer as the data from which to recreate the image.

The following steps summarize the first stage of proposed scheme:

• Divide the original image into 8×8-pixel blocks and reshape each one into a 64×1 column vector.
• Arrange the column vectors into a matrix of 64×1024.
• Let the target matrix equal the matrix from step 2.
• Choose a suitable learning algorithm and parameters to start training.
• Simulate the network with the input matrix and the target matrix.
• Obtain the output matrices of the hidden layer and the output layer.
• Post-process them (using VQ and DCT, as shown in the following stages) to obtain the compressed image and the reconstructed image, respectively.
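The block-splitting step above (8×8 blocks of a 256×256 image stacked into a 64×1024 matrix) can be sketched as follows. This is an illustrative helper, not the author's code; `image_to_blocks` is a hypothetical name.

```python
import numpy as np

def image_to_blocks(img, b=8):
    """Split an image into non-overlapping b x b blocks and stack each
    flattened block as a column of a (b*b) x n_blocks matrix."""
    H, W = img.shape
    cols = []
    for r in range(0, H, b):
        for c in range(0, W, b):
            cols.append(img[r:r + b, c:c + b].reshape(-1))  # 64x1 column
    return np.stack(cols, axis=1)

img = np.zeros((256, 256))          # stand-in for a 256x256 grey image
X = image_to_blocks(img) / 255.0    # scale pixel values into [0, 1]
```

For a 256×256 image there are 32×32 = 1024 blocks, so X has shape 64×1024, matching the training matrix described in the steps.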

Second Stage

The second stage of the proposed method compresses the output of the first stage using VQ, as in the following steps:


• The neural-network output is split into square blocks of T×T values, for example 4×4 or 8×8; each block is considered as a vector in a 16- or 64-dimensional space, respectively, to create the code vectors.
• Optimize the number of required codebooks according to the required compression ratio and reconstructed image quality.
• Apply a clustering technique to create the codebooks; k-means clustering is used in our method.

Third Stage

After the codebooks are generated in the second stage using VQ, they are compressed using the DCT. Two different approaches were studied for DCT codebook compression.

1-Integrated Code Books

In this approach, all the codebooks generated by the VQ stage are aggregated into one big block, and the DCT is then applied to this big block as follows:

Apply the DCT to the sequence of values of the big codebook to construct a new book X. Then drop a portion of the high-order values from X to create X′. To drop these values, the DCT coefficients are rearranged beginning with the DC coefficient, which results in a column vector. Since the image information (energy) is usually concentrated in the low-frequency region, the codebooks can be shrunk by discarding some coefficients which represent high frequencies. This is equivalent to a low-pass filtering of the image; the cut-off frequency of the filter affects the compression ratio and the reconstructed image quality. The codeword indices into the compressed codebook are then transmitted.
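The coefficient-dropping step can be sketched as follows: keep only a leading fraction of a DC-first coefficient vector and zero the high-frequency tail. This is an illustrative sketch (the function name and the 25% keep-ratio are example choices, not the paper's parameters).

```python
import numpy as np

def truncate_coefficients(coeffs, keep_ratio=0.25):
    """Keep the leading (low-frequency) fraction of a DC-first coefficient
    vector; zero the high-order tail (low-pass filtering of the codebook)."""
    m = max(1, int(len(coeffs) * keep_ratio))
    out = np.zeros_like(coeffs)
    out[:m] = coeffs[:m]          # discard everything past the cut-off
    return out, m

coeffs = np.arange(64, dtype=float)  # stand-in DCT coefficient vector, DC first
truncated, kept = truncate_coefficients(coeffs, keep_ratio=0.25)
```

Raising `keep_ratio` lowers the cut-off frequency's cost: more coefficients survive, so the reconstruction improves but the compression ratio drops, which is exactly the trade-off described above.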

2-Discrete Code Books Compression Approach

In this approach the DCT is applied to each codebook separately, using the same technique as in the first approach. Finally, a lossless coding technique such as Huffman coding or run-length encoding (RLE) is applied before transmitting the image, which increases the compression ratio by a factor of about 1.5 to 2.

Decompression

For decompression or decoding the compressed image:

• Read the compressed image.
• Decode the RLE or Huffman-coded data.


• Compute the two-dimensional inverse DCT (2-D IDCT) to recover the codebooks, then decode the indices using inverse VQ to regenerate the compressed image, which represents the output of the hidden layer of the NN.

• The resulting image is decoded using the second (decoder) part of the NN.

The overall compression ratio equations of proposed technique are:

Compression ratio = compression using NN × (original image size) / (new codebooks size + no. of indexes)     (6)

New codebooks size = (codebooks size from VQ) × (compression percentage from DCT)     (7)

Codebooks size = (number of codebooks) × (size of each codebook)     (8)
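Equations (6)-(8) chain together as a single arithmetic formula. The sketch below shows that chain with purely hypothetical numbers (a 4:1 NN stage, a 256×256 image, 16 codebooks of 64 values, a 25% DCT keep-fraction, 1024 indexes); none of these values come from the paper.

```python
def compression_ratio(nn_cr, original_size, n_codebooks, codebook_size,
                      dct_keep_fraction, n_indexes):
    """Overall compression ratio following eqs. (6)-(8)."""
    codebooks_size = n_codebooks * codebook_size              # eq. (8)
    new_codebooks_size = codebooks_size * dct_keep_fraction   # eq. (7)
    return nn_cr * original_size / (new_codebooks_size + n_indexes)  # eq. (6)

cr = compression_ratio(nn_cr=4, original_size=256 * 256,
                       n_codebooks=16, codebook_size=64,
                       dct_keep_fraction=0.25, n_indexes=1024)
```

With these example numbers the denominator is 16·64·0.25 + 1024 = 1280, illustrating how shrinking the codebooks with the DCT directly raises the overall ratio.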

6 Simulation Results

The mean square error (MSE) and the peak signal-to-noise ratio (PSNR) are used to determine the reconstructed image quality. The MSE is computed using the following equation:

MSE = (1 / (N·M)) · Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} [f′(x, y) − f(x, y)]²     (9)

where f(x, y) and f′(x, y) represent the original and the reconstructed images, respectively, and M and N represent the image size. The peak signal-to-noise ratio (PSNR) in dB is calculated using the following equation:

PSNR = 10 · log10(255² / MSE) dB     (10)

The size of the compressed image (SOCI) and the Compression Ratio (CR)are calculated as in equations (6) to (8).

The proposed scheme was tested with different standard test images (Lena, pepper, cameraman), both gray and color, with different sizes. Text, medical, pre-compressed, and color images were also tested. It is evident from the PSNR results that the proposed technique works quite well for different types of images, and the images are reconstructed nearly perfectly.

For a gray image, Fig. 4a shows the original Lena image of size 256×256, Fig. 4b shows the reconstructed image at a compression ratio of 8 with a PSNR of 36.4 dB, and Fig. 4c the reconstruction at a compression ratio of 64 with a PSNR of 30 dB. Table 1 presents the results of testing the new image compression procedure on the standard Lena image at different compression ratios. From Table 1 it is evident that a high compression ratio can be achieved with our scheme, reaching 64 with a PSNR near 30 dB.


Table 1. Compression ratio of Lena image

Compression Ratio   PSNR in dB
  8                 36.40
 16                 36.00
 32                 34.50
 48                 32.20
 64                 30.00
 96                 29.60
128                 27.50

(a) Original Lena image   (b) Compressed image (CR = 8, PSNR = 36.4 dB)   (c) Compressed image (CR = 64, PSNR = 30 dB)

Fig. 4. Gray Image Compression using proposed method

For color images, Fig. 5a shows the original pepper image of size 256×256×3 and Fig. 5b shows the reconstructed image at a compression ratio of 32, with a PSNR still above 30 dB. Comparing the performance of the proposed scheme with standard JPEG compression on the Lena image in Table 2 and Fig. 6, the compression ratio and PSNR obtained with the proposed scheme are better than those of JPEG.

Table 2. Comparing the obtained results with JPEG For Lena image

CR   PSNR for proposed scheme   PSNR in JPEG
 8   36.4                       35.88
16   36.0                       33.81
24   34.5                       31.84
32   32.2                       29.56
64   30.0                       24.11


(a) Original color image   (b) Compressed image (CR = 32)

Fig. 5. Color Image Compression using proposed method

Fig. 6. Comparing the obtained results with JPEG

7 Conclusion

This paper has proposed a new scheme for image coding at very low bit rates. The scheme is a hybrid method that combines the high compression ratio of neural networks (NN) and vector quantization (VQ) with the good energy-compaction property of the discrete cosine transform (DCT). Combining NN and VQ with the DCT exploits the advantages of all three. To increase the compression ratio while preserving decent reconstructed image quality, the image is compressed using the NN, the hidden-layer outputs are then re-compressed using vector quantization, and the DCT is applied to the codebook blocks. The main aim of the proposed scheme is to achieve a high compression ratio without much compromise in image quality. I have compared the performance of the proposed scheme with image compression using traditional methods such as JPEG and NN, and with some other schemes based on VQ and DCT. I have presented results


showing that the proposed method produces a better compression ratio than these schemes. Moreover, I have presented results showing that the proposed algorithm produces better image quality than JPEG.

References

1. Fiorucci, F., Baruffa, G., Frescura, F.: Objective and subjective quality assessment between JPEG XR with overlap and JPEG 2000. Journal of Visual Communication and Image Representation 23(6), 835–844 (2012)

2. Au, K.M., Law, N.F., Siu, W.C.: Unified feature analysis in JPEG and JPEG 2000-compressed domains. Journal of Pattern Recognition 40, 2049–2062 (2007)

3. Li, Drew: Fundamentals of Multimedia. In: Image Compression Standards, ch. 9. Prentice Hall (2003)

4. Jiang, J.: Image compression with neural networks – A survey. Signal Processing: Image Communication 14(9), 737–760 (1999)

5. Dokur, Z.: A unified framework for image compression and segmentation by using an incremental neural network. Expert Systems with Applications 34(1), 611–619 (2008)

6. Sasazaki, K., Saga, S., Maeda, J., Suzuki, Y.: Vector quantization of images with variable block size. Applied Soft Computing 8, 634–645 (2008)

7. Esakkirajan, S., Veerakumar, T., Murugan, V.S., Navaneethan, P.: Image Compression Using Hybrid Vector Quantization. International Journal of Signal Processing 4 (2008)

8. Tseng, H., Chang, C.: A Very Low Bit Rate Image Compressor Using Transformed Classified Vector Quantization. Informatica 29, 335–341 (2005)

9. Robinson, J., Kecman, V.: Combining Support Vector Machine Learning With the Discrete Cosine Transform in Image Compression. IEEE Transactions on Neural Networks 14(4) (2003)

10. Jiang, J.: Image compression with neural networks – A survey. Signal Processing: Image Communication 14(9), 737–760 (1999)